author Wu, Ke <wuke@cs.umd.edu> 2014-12-17 16:11:38 -0500
committer Wu, Ke <wuke@cs.umd.edu> 2014-12-17 16:11:38 -0500
commit 7468e8d85e99b4619442c7afaf4a0d92870111bb
tree a6f17da7c69048c8900260b5490bb9d8611be3bb /extractor
parent b6dd5a683db9dda2d634dd2fdb76606819594901
parent 1a79175f9a101d46cf27ca921213d5dd9300518f
Merge with upstream
Diffstat (limited to 'extractor')
-rw-r--r-- extractor/README.md 4
-rw-r--r-- extractor/sacompile.cc 1
2 files changed, 3 insertions, 2 deletions
diff --git a/extractor/README.md b/extractor/README.md
index 642fbd1d..b83ff900 100644
--- a/extractor/README.md
+++ b/extractor/README.md
@@ -1,10 +1,10 @@
-C++ implementation of the online grammar extractor originally developed by [Adam Lopez](http://www.cs.jhu.edu/~alopez/).
+A simple and fast C++ implementation of a SCFG grammar extractor using suffix arrays. The implementation is described in this [paper](https://ufal.mff.cuni.cz/pbml/102/art-baltescu-blunsom.pdf). The original cython extractor is described in [Adam Lopez](http://www.cs.jhu.edu/~alopez/)'s PhD [thesis](http://www.cs.jhu.edu/~alopez/papers/adam.lopez.dissertation.pdf).
The grammar extraction takes place in two steps: (a) precomputing a number of data structures and (b) actually extracting the grammars. All the flags below have the same meaning as in the cython implementation.
To compile the data structures you need to run:
- cdec/extractor/compile -a <alignment> -b <parallel_corpus> -c <compile_config_file> -o <compile_directory>
+ cdec/extractor/sacompile -a <alignment> -b <parallel_corpus> -c <compile_config_file> -o <compile_directory>
To extract the grammars you need to run:
diff --git a/extractor/sacompile.cc b/extractor/sacompile.cc
index 3ee668ce..d80ab64d 100644
--- a/extractor/sacompile.cc
+++ b/extractor/sacompile.cc
@@ -114,6 +114,7 @@ int main(int argc, char** argv) {
stop_write = Clock::now();
write_duration += GetDuration(start_write, stop_write);
+ stop_time = Clock::now();
cerr << "Constructing suffix array took "
<< GetDuration(start_time, stop_time) << " seconds" << endl;
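The `sacompile.cc` hunk adds the missing `stop_time = Clock::now();` so that the reported duration is measured up to the point of reporting. A minimal sketch of that timing pattern follows. It assumes `Clock` is an alias for `std::chrono::steady_clock` and that `GetDuration` returns seconds as a `double`; neither definition appears in this diff, so both are stand-ins, and `TimeSeconds` is a hypothetical helper added only for illustration.

```cpp
#include <cassert>
#include <chrono>

// Assumption: the real Clock typedef in sacompile.cc may differ.
using Clock = std::chrono::steady_clock;

// Stand-in for the project's GetDuration: elapsed seconds between two points.
double GetDuration(const Clock::time_point& start,
                   const Clock::time_point& stop) {
  return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: runs `work` and returns how long it took, in seconds.
template <typename F>
double TimeSeconds(F&& work) {
  const Clock::time_point start_time = Clock::now();
  work();
  // Capture the end point before computing the duration; this is the
  // assignment the hunk above adds.
  const Clock::time_point stop_time = Clock::now();
  assert(stop_time >= start_time);  // steady_clock never goes backwards
  return GetDuration(start_time, stop_time);
}
```

Calling `TimeSeconds([] { /* build the suffix array */ })` would then give the value to print in the "Constructing suffix array took ... seconds" message.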