author Victor Chahuneau <vchahune@cs.cmu.edu> 2012-07-27 01:16:03 -0400
committer Victor Chahuneau <vchahune@cs.cmu.edu> 2012-07-27 01:16:03 -0400
commit b2a8bccb2bd713d9ec081cf3dad0162c2cb492d8
tree c661044fd2a3943cf2ad12109b916fd7b56a519e /python/README.md
parent 148b1168c2b07abf0c7757a31141377c28ec3d91
[python] Fork of the suffix-array extractor with surface improvements

Available as the cdec.sa module, with command-line helpers:

    python -m cdec.sa.compile -f ... -e ... -a ... -o sa-out/ -c extract.ini
    python -m cdec.sa.extract -c extract.ini -g grammars-out/ < input.txt > input.sgml

+ renamed cdec.scfg -> cdec.sa
+ Python README

Diffstat (limited to 'python/README.md')
-rw-r--r-- python/README.md 33
1 files changed, 33 insertions, 0 deletions
diff --git a/python/README.md b/python/README.md
new file mode 100644
index 00000000..1ddb61a9
--- /dev/null
+++ b/python/README.md
@@ -0,0 +1,33 @@
+pycdec is a Python interface to cdec
+
+## Installation
+
+pycdec depends on the configobj module:
+
+    pip install configobj
+
+Build and install pycdec:
+
+    python setup.py install
+
+## Grammar extractor
+
+Compile a parallel corpus and a word alignment into a suffix array representation:
+
+    python -m cdec.sa.compile -f f.txt -e e.txt -a a.txt -o output/ -c extract.ini
+
+Extract grammar rules from the compiled corpus:
+
+    cat input.txt | python -m cdec.sa.extract -c extract.ini -g grammars/
+
+This will create per-sentence grammar files in the `grammars` directory and output annotated input suitable for translation with cdec.
+
+## Library usage
+
+A basic demo of pycdec's features is available in `test.py`.
+
+More documentation will come as the API becomes stable.
+
+---
+
+pycdec was contributed by [Victor Chahuneau](http://victor.chahuneau.fr)
\ No newline at end of file
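
The two-step workflow the README describes (compile the corpus, then extract per-sentence grammars) could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper names `compile_command`, `extract_command`, and `run_pipeline` are hypothetical, the file names are the README's placeholders, and only the flags shown in the README (`-f`, `-e`, `-a`, `-o`, `-c`, `-g`) are used. It assumes pycdec is installed for the interpreter running the script.

```python
import subprocess
import sys


def compile_command(source, target, alignment, out_dir, config):
    """Build the argv for the README's compile step (hypothetical helper)."""
    return [sys.executable, "-m", "cdec.sa.compile",
            "-f", source, "-e", target, "-a", alignment,
            "-o", out_dir, "-c", config]


def extract_command(config, grammar_dir):
    """Build the argv for the README's extract step (hypothetical helper)."""
    return [sys.executable, "-m", "cdec.sa.extract",
            "-c", config, "-g", grammar_dir]


def run_pipeline(source, target, alignment, out_dir, config,
                 grammar_dir, input_path, output_path):
    """Run both steps in order, stopping if either command fails."""
    # Step 1: compile the parallel corpus and alignment into a suffix array.
    subprocess.check_call(
        compile_command(source, target, alignment, out_dir, config))
    # Step 2: the extractor reads sentences on stdin and writes the
    # annotated input on stdout, so redirect both to files.
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        subprocess.check_call(extract_command(config, grammar_dir),
                              stdin=fin, stdout=fout)
```

Calling `run_pipeline("f.txt", "e.txt", "a.txt", "output/", "extract.ini", "grammars/", "input.txt", "input.sgml")` would mirror the two commands in the README, with the redirection from the commit message replacing the `cat` pipe.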