author | Victor Chahuneau <vchahune@cs.cmu.edu> | 2012-07-27 01:16:03 -0400 |
---|---|---|
committer | Victor Chahuneau <vchahune@cs.cmu.edu> | 2012-07-27 01:16:03 -0400 |
commit | b2a8bccb2bd713d9ec081cf3dad0162c2cb492d8 | |
tree | c661044fd2a3943cf2ad12109b916fd7b56a519e /python/README.md | |
parent | 148b1168c2b07abf0c7757a31141377c28ec3d91 |
[python] Fork of the suffix-array extractor with surface improvements
Available as the cdec.sa module, with command-line helpers:
python -m cdec.sa.compile -f ... -e ... -a ... -o sa-out/ -c extract.ini
python -m cdec.sa.extract -c extract.ini -g grammars-out/ < input.txt > input.sgml
+ renamed cdec.scfg -> cdec.sa
+ Python README
Diffstat (limited to 'python/README.md')
-rw-r--r-- | python/README.md | 33 |
1 files changed, 33 insertions, 0 deletions
diff --git a/python/README.md b/python/README.md
new file mode 100644
index 00000000..1ddb61a9
--- /dev/null
+++ b/python/README.md
@@ -0,0 +1,33 @@
+pycdec is a Python interface to cdec
+
+## Installation
+
+pycdec depends on the configobj module:
+
+    pip install configobj
+
+Build and install pycdec:
+
+    python setup.py install
+
+## Grammar extractor
+
+Compile a parallel corpus and a word alignment into a suffix array representation:
+
+    python -m cdec.sa.compile -f f.txt -e e.txt -a a.txt -o output/ -c extract.ini
+
+Extract grammar rules from the compiled corpus:
+
+    cat input.txt | python -m cdec.sa.extract -c extract.ini -g grammars/
+
+This will create per-sentence grammar files in the `grammars` directory and output annotated input suitable for translation with cdec.
+
+## Library usage
+
+A basic demo of pycdec's features is available in `test.py`
+
+More documentation will come as the API becomes stable.
+
+---
+
+pycdec was contributed by [Victor Chahuneau](http://victor.chahuneau.fr)
\ No newline at end of file
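
The two-step workflow in the README (compile the corpus, then extract grammars) could be scripted from Python roughly as follows. This is a minimal sketch, not part of pycdec: the helper names `compile_command`, `extract_command`, and `run_pipeline` are hypothetical, and it assumes only the flags that appear in the commit message and README (`-f`, `-e`, `-a`, `-o`, `-c`, `-g`), with the extractor reading input on stdin and writing annotated output to stdout.

```python
import subprocess
import sys


def compile_command(f_path, e_path, a_path, output_dir, config_path):
    """Build the argv for `python -m cdec.sa.compile` (hypothetical helper).

    Flags mirror the README example; their exact semantics are assumed.
    """
    return [sys.executable, "-m", "cdec.sa.compile",
            "-f", f_path, "-e", e_path, "-a", a_path,
            "-o", output_dir, "-c", config_path]


def extract_command(config_path, grammar_dir):
    """Build the argv for `python -m cdec.sa.extract` (hypothetical helper)."""
    return [sys.executable, "-m", "cdec.sa.extract",
            "-c", config_path, "-g", grammar_dir]


def run_pipeline(f_path, e_path, a_path, output_dir, config_path,
                 grammar_dir, input_path, annotated_path):
    """Run both steps in order, stopping if either command fails."""
    subprocess.run(
        compile_command(f_path, e_path, a_path, output_dir, config_path),
        check=True)
    # The extractor is assumed to read sentences on stdin and to write
    # the annotated input on stdout, as in the README example.
    with open(input_path) as fin, open(annotated_path, "w") as fout:
        subprocess.run(extract_command(config_path, grammar_dir),
                       stdin=fin, stdout=fout, check=True)
```

Keeping the argv construction separate from execution makes it easy to inspect or log the commands before running them, for example `print(compile_command("f.txt", "e.txt", "a.txt", "output/", "extract.ini"))`.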