author Jacob <andqso@gmail.com> 2013-07-28 10:57:18 +0100
committer Jacob <andqso@gmail.com> 2013-07-28 10:57:18 +0100
commit cbd44f3a67d6506595c458433d2855ac5716e507
tree 3ffa4a72ca028ad4a75b2fca2defdcb987fbb8a0
parent 816e1e26ccd67698459678ac00f793144d25e06e
full readme
-rw-r--r-- README.md 45
1 files changed, 44 insertions, 1 deletions
diff --git a/README.md b/README.md
index 98d9c44..4446d1a 100644
--- a/README.md
+++ b/README.md
@@ -12,4 +12,47 @@ For a description of the system (it's really not complicated), see:
- J Andreas, A Vlachos and S Clark. "Semantic Parsing as Machine
Translation". To appear in ACL-SHORT 2013.
-### How do I
+### Getting started
+
+Edit `dependencies.yaml` to reflect the configuration of your system.
+`smt_semparse` should be set to the location of the repository root, the
+`moses`, `srilm`, etc. entries to the roots of the corresponding external
+dependencies, and `srilm_arch` to your machine architecture.
+
+### Reproducing the ACL13 paper
+
+Edit `settings.yaml` to choose a language and translation model for the particular
+experiment you want to run. Use the following additional settings:
+
+- `lang=en`: `stem=true`, `symm=srctotgt`
+- `lang=de`: `stem=true`, `symm=tgttosrc`
+- `lang=el`: `stem=false`, `symm=tgttosrc`
+- `lang=th`: `stem=false`, `symm=tgttosrc`
+
+Note that due to random MERT initialization your exact accuracy and F1 values
+may differ slightly from those in the paper.
+
+### Experimental things
+
+Additional settings also allow you to do the following:
+
+- Rebuild the phrase table after running MERT to squeeze a few more translation
+ rules out of the training data. (Should give a nearly-imperceptible
+ improvement in accuracy.)
+
+- Filter rules which correspond to multi-rooted forests from the phrase table.
+ (Should decrease accuracy.)
+
+- Do fully supervised training on only a fraction of the dataset, and use the
+ remaining monolingual data to reweight rules. (Mostly garbage---this data set
+ is already too small to permit experiments which require holding out even more
+ data.)
+
+### Not implemented
+
+MRL-to-NL à la Lu & Ng 2011.
+
+### Using a new dataset
+
+Update `extractor.py` to create appropriately-formatted files in the working
+directory. See the existing GeoQuery extractor for an example.
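
The per-language settings listed under "Reproducing the ACL13 paper" can be written down as a small lookup table. The following is only a sketch: the values are copied from the README text in this patch, but the names `PAPER_SETTINGS` and `paper_settings` are hypothetical, and the actual layout of `settings.yaml` is not shown in this commit, so the key names simply follow the README's shorthand.

```python
# Illustrative only: values are taken from the README hunk above;
# the dict and function names are hypothetical, not part of the repository.
PAPER_SETTINGS = {
    "en": {"stem": True, "symm": "srctotgt"},
    "de": {"stem": True, "symm": "tgttosrc"},
    "el": {"stem": False, "symm": "tgttosrc"},
    "th": {"stem": False, "symm": "tgttosrc"},
}


def paper_settings(lang):
    """Return the additional settings the README lists for `lang`."""
    if lang not in PAPER_SETTINGS:
        known = ", ".join(sorted(PAPER_SETTINGS))
        raise ValueError(f"no paper settings for {lang!r}; expected one of: {known}")
    # Copy so callers cannot mutate the shared table.
    return {"lang": lang, **PAPER_SETTINGS[lang]}
```

Calling `paper_settings("de")` yields the `stem` and `symm` values to copy into `settings.yaml` by hand; an unlisted language code raises `ValueError` rather than silently falling back to a default.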