author | Chris Dyer <prguest11@taipan.cs> | 2012-02-02 06:29:50 +0000 |
---|---|---|
committer | Chris Dyer <prguest11@taipan.cs> | 2012-02-02 06:29:50 +0000 |
commit | 8e5fad9bcbadf36bbab3c1c5b053e3c8f7dddbce (patch) | |
tree | 9c812b3f267aa1975cdf8b7af928c4b20eb36f93 /sa-extract/README | |
parent | ff496d3089e84846c8562c574155d8df1e4d911c (diff) |
lopez suffix array extractor with copyrighted david chiang code excised
Diffstat (limited to 'sa-extract/README')
-rw-r--r-- | sa-extract/README | 50 |
1 files changed, 50 insertions, 0 deletions
diff --git a/sa-extract/README b/sa-extract/README
new file mode 100644
index 00000000..f43e58cc
--- /dev/null
+++ b/sa-extract/README
@@ -0,0 +1,50 @@
+SUFFIX-ARRAY-EXTRACT README
+  Feb 1, 2012
+
+Written by Adam Lopez, repackaged by Chris Dyer.
+
+Originally based on parts of Hiero, by David Chiang, but these dependencies
+have been removed or rewritten.
+
+
+BUILD INSTRUCTIONS
+==============================================================================
+
+Requirements:
+  Python 2.7 or later (http://www.python.org)
+  Cython 0.14.1 or later (http://cython.org/)
+
+- Edit Makefile to set the location of Python/Cython then do:
+
+  make
+
+
+COMPILING A PARALLEL CORPUS AND WORD ALIGNMENT
+==============================================================================
+- Run sa-compile.pl to compile the training data and generate an extract.ini
+  file (which is written to STDOUT):
+
+  sa-compile.pl -b bitext_name=source.fr,target.en \
+    -a alignment_name=alignment.txt > extract.ini
+
+
+EXTRACTION OF PER-SENTENCE GRAMMARS
+==============================================================================
+- Example:
+  cat test.fr | extractor.py -c extract.ini
+
+
+EXTRACTION OF COMPLETE TEST-SET GRAMMARS
+==============================================================================
+Edit the generated extract.ini file and change per_sentence_grammar
+to False. Then, run extraction as normal.
+
+Note: extracting a single grammar for an entire test set will consume more
+memory during extraction and (probably) during decoding.
+
+
+EXAMPLE
+==============================================================================
+- See example/ and the README therein.
+
+
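
The README's step for complete test-set grammars (set per_sentence_grammar to False in the generated extract.ini) could be scripted rather than done by hand. The following is a minimal sketch only. It assumes the setting appears on its own line in the form `per_sentence_grammar = True`, which the README does not confirm, and the helper name `set_per_sentence_grammar` is hypothetical, not part of sa-extract.

```python
import re


def set_per_sentence_grammar(ini_text, enabled):
    """Return ini_text with the per_sentence_grammar value replaced.

    Assumption (not confirmed by the README): the setting is written as
    `per_sentence_grammar = <value>` on a single line.
    """
    pattern = re.compile(
        r"^([ \t]*per_sentence_grammar[ \t]*=[ \t]*)\S+", re.MULTILINE
    )
    # Keep the original key and spacing; swap only the value.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(bool(enabled)), ini_text
    )
    if count == 0:
        # Fail loudly rather than silently leaving the file unchanged.
        raise ValueError("per_sentence_grammar setting not found")
    return new_text
```

One way to use it would be to read extract.ini, pass its text through the function with `enabled=False`, write the result back, and then run extractor.py as the README describes.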