author     Wu, Ke <wuke@cs.umd.edu>  2014-12-17 16:11:38 -0500
committer  Wu, Ke <wuke@cs.umd.edu>  2014-12-17 16:11:38 -0500
commit     1613f1fc44ca67820afd7e7b21eb54b316c8ce55
tree       e02b77084f28a18df6b854f87a986124db44d717 /extractor
parent     bd9308e22b5434aa220cc57d82ee867464a011f1
parent     796768086a687d3f1856fef6489c34fe4d373642
Merge with upstream
Diffstat (limited to 'extractor')
-rw-r--r--  extractor/README.md     4
-rw-r--r--  extractor/sacompile.cc  1
2 files changed, 3 insertions, 2 deletions
diff --git a/extractor/README.md b/extractor/README.md
index 642fbd1d..b83ff900 100644
--- a/extractor/README.md
+++ b/extractor/README.md
@@ -1,10 +1,10 @@
-C++ implementation of the online grammar extractor originally developed by [Adam Lopez](http://www.cs.jhu.edu/~alopez/).
+A simple and fast C++ implementation of a SCFG grammar extractor using suffix arrays. The implementation is described in this [paper](https://ufal.mff.cuni.cz/pbml/102/art-baltescu-blunsom.pdf). The original cython extractor is described in [Adam Lopez](http://www.cs.jhu.edu/~alopez/)'s PhD [thesis](http://www.cs.jhu.edu/~alopez/papers/adam.lopez.dissertation.pdf).
The grammar extraction takes place in two steps: (a) precomputing a number of data structures and (b) actually extracting the grammars. All the flags below have the same meaning as in the cython implementation.
To compile the data structures you need to run:
- cdec/extractor/compile -a <alignment> -b <parallel_corpus> -c <compile_config_file> -o <compile_directory>
+ cdec/extractor/sacompile -a <alignment> -b <parallel_corpus> -c <compile_config_file> -o <compile_directory>
To extract the grammars you need to run:
diff --git a/extractor/sacompile.cc b/extractor/sacompile.cc
index 3ee668ce..d80ab64d 100644
--- a/extractor/sacompile.cc
+++ b/extractor/sacompile.cc
@@ -114,6 +114,7 @@ int main(int argc, char** argv) {
stop_write = Clock::now();
write_duration += GetDuration(start_write, stop_write);
+ stop_time = Clock::now();
cerr << "Constructing suffix array took "
<< GetDuration(start_time, stop_time) << " seconds" << endl;