full readme

author: Jacob <andqso@gmail.com> 2013-07-28 10:57:18 +0100
committer: Jacob <andqso@gmail.com> 2013-07-28 10:57:18 +0100
commit: cbd44f3a67d6506595c458433d2855ac5716e507 (patch)
tree: 3ffa4a72ca028ad4a75b2fca2defdcb987fbb8a0
parent: 816e1e26ccd67698459678ac00f793144d25e06e (diff)
1 files changed, 44 insertions, 1 deletions
diff --git a/README.md b/README.md
index 98d9c44..4446d1a 100644
--- a/README.md
+++ b/README.md
@@ -12,4 +12,47 @@ For a description of the system (it's really not complicated), see:
 - J Andreas, A Vlachos and S Clark. "Semantic Parsing as Machine
   Translation". To appear in ACL-SHORT 2013.
 
-### How do I 
+### Getting started
+
+Edit `dependencies.yaml` to reflect the configuration of your system.
+`smt_semparse` should be set to the location of the repository root, the
+`moses`, `srilm`, etc. entries to the roots of the corresponding external
+dependencies, and `srilm_arch` to your machine architecture.
+
+### Reproducing the ACL13 paper
+
+Edit `settings.yaml` to choose a language and translation model for the particular
+experiment you want to run. Use the following additional settings:
+
+- `lang=en`: `stem=true`, `symm=srctotgt`
+- `lang=de`: `stem=true`, `symm=tgttosrc`
+- `lang=el`: `stem=false`, `symm=tgttosrc`
+- `lang=th`: `stem=false`, `symm=tgttosrc`
+
+Note that due to random MERT initialization your exact accuracy and F1 values
+may differ slightly from those in the paper.
+
+### Experimental things
+
+Additional settings also allow you to do the following:
+
+- Rebuild the phrase table after running MERT to squeeze a few more translation
+  rules out of the training data. (Should give a nearly-imperceptible
+  improvement in accuracy.)
+
+- Filter rules which correspond to multi-rooted forests from the phrase table.
+  (Should decrease accuracy.)
+
+- Do fully supervised training on only a fraction of the dataset, and use the
+  remaining monolingual data to reweight rules. (Mostly garbage: this dataset
+  is already too small to permit experiments which require holding out even more
+  data.)
+
+### Not implemented
+
+MRL-to-NL &agrave; la Lu &amp; Ng 2011.
+
+### Using a new dataset
+
+Update `extractor.py` to create appropriately-formatted files in the working
+directory. See the existing GeoQuery extractor for an example.
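
The per-language settings listed under "Reproducing the ACL13 paper" can be captured in a small lookup, which may help when scripting several runs. This is only a sketch: the keys `lang`, `stem`, and `symm` and their values come from the README text in this commit, while the helper name `paper_settings` and the dictionary layout are hypothetical and not part of the repository.

```python
# Per-language settings as listed in the README added by this commit.
# The table layout and the helper below are illustrative assumptions,
# not code from the smt-semparse repository.
PAPER_SETTINGS = {
    "en": {"stem": True, "symm": "srctotgt"},
    "de": {"stem": True, "symm": "tgttosrc"},
    "el": {"stem": False, "symm": "tgttosrc"},
    "th": {"stem": False, "symm": "tgttosrc"},
}


def paper_settings(lang):
    """Return the README's settings for `lang`, including the `lang` key itself."""
    if lang not in PAPER_SETTINGS:
        raise ValueError("no paper settings listed for language: %r" % (lang,))
    settings = dict(PAPER_SETTINGS[lang])  # copy so callers cannot mutate the table
    settings["lang"] = lang
    return settings


if __name__ == "__main__":
    for code in sorted(PAPER_SETTINGS):
        print(paper_settings(code))
```

The resulting values would still need to be written into `settings.yaml` by hand or by a script of your own; how that file is laid out beyond these three keys is not shown in this commit.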