author Patrick Simianer <p@simianer.de> 2014-10-13 19:03:48 +0100
committer Patrick Simianer <p@simianer.de> 2014-10-13 19:03:48 +0100
commit cb9fb7088dde35881516c088db402abe747d49fa
tree a91e4935a7941f1b261f76d88ab41fa3078a1891 /extractor
parent 0a00e57e921c8eca8e02364db7d2e6607bfdcebc
parent b1ed81ef3216b212295afa76c5d20a56fb647204
Merge remote-tracking branch 'upstream/master'
Diffstat (limited to 'extractor')
-rw-r--r-- extractor/README.md | 4
-rw-r--r-- extractor/sacompile.cc | 1
2 files changed, 3 insertions, 2 deletions
diff --git a/extractor/README.md b/extractor/README.md
index 642fbd1d..b83ff900 100644
--- a/extractor/README.md
+++ b/extractor/README.md
@@ -1,10 +1,10 @@
-C++ implementation of the online grammar extractor originally developed by [Adam Lopez](http://www.cs.jhu.edu/~alopez/).
+A simple and fast C++ implementation of a SCFG grammar extractor using suffix arrays. The implementation is described in this [paper](https://ufal.mff.cuni.cz/pbml/102/art-baltescu-blunsom.pdf). The original cython extractor is described in [Adam Lopez](http://www.cs.jhu.edu/~alopez/)'s PhD [thesis](http://www.cs.jhu.edu/~alopez/papers/adam.lopez.dissertation.pdf).
The grammar extraction takes place in two steps: (a) precomputing a number of data structures and (b) actually extracting the grammars. All the flags below have the same meaning as in the cython implementation.
To compile the data structures you need to run:
- cdec/extractor/compile -a <alignment> -b <parallel_corpus> -c <compile_config_file> -o <compile_directory>
+ cdec/extractor/sacompile -a <alignment> -b <parallel_corpus> -c <compile_config_file> -o <compile_directory>
To extract the grammars you need to run:
diff --git a/extractor/sacompile.cc b/extractor/sacompile.cc
index 3ee668ce..d80ab64d 100644
--- a/extractor/sacompile.cc
+++ b/extractor/sacompile.cc
@@ -114,6 +114,7 @@ int main(int argc, char** argv) {
stop_write = Clock::now();
write_duration += GetDuration(start_write, stop_write);
+ stop_time = Clock::now();
cerr << "Constructing suffix array took "
<< GetDuration(start_time, stop_time) << " seconds" << endl;
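The sacompile.cc hunk adds the missing `stop_time = Clock::now();` before the elapsed time is printed. Without that assignment, `stop_time` would presumably still hold an earlier (or default) value when the duration is reported. A minimal sketch of the pattern follows. The definitions of `Clock` and `GetDuration` are not shown in this diff, so the versions below are assumptions for illustration, and `TimeSeconds` is a hypothetical helper that is not part of the extractor.

```cpp
#include <cassert>
#include <chrono>

// Assumption: the real Clock alias in sacompile.cc is not shown in the diff;
// a monotonic clock is used here for illustration.
using Clock = std::chrono::steady_clock;

// Assumed shape of GetDuration: elapsed time between two time points,
// returned in seconds as a double.
double GetDuration(const Clock::time_point& start,
                   const Clock::time_point& stop) {
  return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: runs `work` and returns how long it took in seconds.
template <typename Work>
double TimeSeconds(Work&& work) {
  Clock::time_point start_time = Clock::now();
  work();
  // The stop time must be sampled after the work finishes; this is the
  // step the commit adds before the duration is printed.
  Clock::time_point stop_time = Clock::now();
  return GetDuration(start_time, stop_time);
}
```

Sampling the stop time immediately before reporting keeps the printed figure tied to the interval that was actually measured.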