From cbd44f3a67d6506595c458433d2855ac5716e507 Mon Sep 17 00:00:00 2001
From: Jacob
Date: Sun, 28 Jul 2013 10:57:18 +0100
Subject: full readme

---
 README.md | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 98d9c44..4446d1a 100644
--- a/README.md
+++ b/README.md
@@ -12,4 +12,47 @@ For a description of the system (it's really not complicated), see:
 - J Andreas, A Vlachos and S Clark. "Semantic Parsing as Machine Translation".
   To appear in ACL-SHORT 2013.
 
-### How do I
+### Getting started
+
+Edit `dependencies.yaml` to reflect the configuration of your system.
+`smt_semparse` should be set to the location of the repository root, the
+`moses`, `srilm`, etc. entries to the roots of the corresponding external
+dependencies, and `srilm_arch` to your machine architecture.
+
+### Reproducing the ACL13 paper
+
+Edit settings.yaml to choose a language and translation model for the particular
+experiment you want to run. Use the following additional settings:
+
+lang=en -> stem=true, symm=srctotgt
+lang=de -> stem=true, symm=tgttosrc
+lang=el -> stem=false, symm=tgttosrc
+lang=th -> stem=false, symm=tgttosrc
+
+Note that due to random MERT initialization your exact accuracy and F1 values
+may differ slightly from those in the paper.
+
+### Experimental things
+
+Additional settings also allow you to do the following:
+
+- Rebuild the phrase table after running MERT to squeeze a few more translation
+  rules out of the training data. (Should give a nearly-imperceptible
+  improvement in accuracy.)
+
+- Filter rules which correspond to multi-rooted forests from the phrase table.
+  (Should decrease accuracy.)
+
+- Do full-supervised training on only a fraction of the dataset, and use the
+  remaining monolingual data to reweight rules. (Mostly garbage---this data set
+  is already too small to permit experiments which require holding out even more
+  data.)
+
+### Not implemented
+
+MRL-to-NL à la Lu & Ng 2011.
+
+### Using a new dataset
+
+Update `extractor.py` to create appropriately-formatted files in the working
+directory. See the existing GeoQuery extractor for an example.
-- 
cgit v1.2.3