# Semantic parsing as machine translation

Most work on semantic parsing, even in variable-free formulations, has focused
on developing task- and formalism-specific models, often with expensive
training and decoding procedures. Can we use standard machine translation tools
to perform the same task? 

Yes. 

For a description of the system (it's really not complicated), see:

- J. Andreas, A. Vlachos, and S. Clark. "Semantic Parsing as Machine
  Translation". In Proceedings of ACL 2013 (short papers).

### Getting started

Edit `dependencies.yaml` to reflect the configuration of your system.
`smt_semparse` should be set to the location of the repository root, the
`moses`, `srilm`, etc. entries to the roots of the corresponding external
dependencies, and `srilm_arch` to your machine architecture.
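
For concreteness, a minimal `dependencies.yaml` might look like the sketch
below. All paths are examples for illustration, not defaults; point each entry
at your own installation. (`i686-m64` is one of SRILM's standard machine
types.)

    # example only -- substitute the paths for your own system
    smt_semparse: /home/you/src/smt_semparse
    moses: /opt/moses
    srilm: /opt/srilm
    srilm_arch: i686-m64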

### Reproducing the ACL13 paper

Edit `settings.yaml` to choose a language and translation model for the
particular experiment you want to run. Use the following additional settings:

    lang=en -> stem=true,  symm=srctotgt
    lang=de -> stem=true,  symm=tgttosrc
    lang=el -> stem=false, symm=tgttosrc
    lang=th -> stem=false, symm=tgttosrc
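
As a sketch, the corresponding `settings.yaml` entries for the English
experiment might look as follows. The exact key names and any additional
required settings are assumptions here, so check the file's own comments:

    # hypothetical settings.yaml fragment for the English experiment
    lang: en
    model: phrase   # assumed key; pick the translation model you are evaluating
    stem: true
    symm: srctotgt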

Note that, because MERT initialization is random, your exact accuracy and F1
values may differ slightly from those in the paper.

### Experimental things

Additional settings also allow you to do the following:

- Rebuild the phrase table after running MERT to squeeze a few more translation
  rules out of the training data. (Should give a nearly imperceptible
  improvement in accuracy.)

- Filter rules that correspond to multi-rooted forests from the phrase table.
  (Should decrease accuracy; see the sketch after this list.)

- Do fully supervised training on only a fraction of the dataset, and use the
  remaining monolingual data to reweight rules. (Mostly garbage; this dataset
  is already too small to permit experiments that require holding out even
  more data.)
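
For intuition about the forest filter: in a variable-free MRL written in
prefix order, a sequence of n tokens whose arities sum to e describes a forest
with n nodes and e edges, and hence n - e roots; a rule is single-rooted
exactly when this count is 1. A minimal sketch (the arity table and function
names below are made up for illustration, not taken from the project's code):

    # illustration only -- not the project's actual filtering code
    ARITIES = {'answer': 1, 'state': 1, 'loc_2': 1, 'texas': 0}

    def num_roots(tokens):
        """Roots of a prefix-order forest = nodes minus edges."""
        return len(tokens) - sum(ARITIES[t] for t in tokens)

    print(num_roots(['answer', 'state', 'loc_2', 'texas']))  # 1: a single tree
    print(num_roots(['state', 'texas', 'texas']))            # 2: a forest; filter it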

### Not implemented

MRL-to-NL à la Lu & Ng 2011.

### Using a new dataset

Update `extractor.py` to create appropriately formatted files in the working
directory. See the existing GeoQuery extractor for an example.
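
As a rough sketch of the shape an extractor takes (the corpus format and the
output file names here are assumptions; mimic the GeoQuery extractor for the
real conventions), it just needs to write aligned natural-language and MRL
files into the working directory:

    # sketch only -- input/output formats are assumed, not the real interface
    import os

    def extract(corpus_path, work_dir):
        """Split tab-separated (sentence, MRL) pairs into parallel files."""
        with open(corpus_path) as f, \
             open(os.path.join(work_dir, 'corpus.nl'), 'w') as nl, \
             open(os.path.join(work_dir, 'corpus.mrl'), 'w') as mrl:
            for line in f:
                sentence, mrl_form = line.rstrip('\n').split('\t')
                nl.write(sentence + '\n')
                mrl.write(mrl_form + '\n')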