Merge remote-tracking branch 'upstream/master'

author: Patrick Simianer <simianer@cl.uni-heidelberg.de> 2012-08-01 17:32:37 +0200
committer: Patrick Simianer <simianer@cl.uni-heidelberg.de> 2012-08-01 17:32:37 +0200
commit: eb3ea4fd5dff1c94b237af792c9f7bf421d79d96 (patch)
tree: 2acd7674f36e6dc6e815c5856519fdea1a2d6bf8 /sa-extract/README
parent: e816274e337a066df1b1e86ef00136a021a17caf (diff)
parent: 193d137056c3c4f73d66f8db84691d63307de894 (diff)
1 files changed, 0 insertions, 62 deletions
diff --git a/sa-extract/README b/sa-extract/README
deleted file mode 100644
index e4022c7e..00000000
--- a/sa-extract/README
+++ /dev/null
@@ -1,62 +0,0 @@
-SUFFIX-ARRAY-EXTRACT README
-  Feb 1, 2012
-
-Written by Adam Lopez, repackaged by Chris Dyer.
-
-Originally based on parts of Hiero, by David Chiang, but these dependencies
-have been removed or rewritten.
-
-
-BUILD INSTRUCTIONS
-==============================================================================
-
-Requirements:
-  Python 2.7 or later (http://www.python.org)
-  Cython 0.14.1 or later (http://cython.org/)
-
-- Edit Makefile to set the location of Python/Cython then do:
-
-  make
-
-
-COMPILING A PARALLEL CORPUS AND WORD ALIGNMENT
-==============================================================================
-- Run sa-compile.pl to compile the training data and generate an extract.ini
-  file (which is written to STDOUT):
-
-  sa-compile.pl -b bitext_name=source.fr,target.en \
-                -a alignment_name=alignment.txt > extract.ini
-
-
-  The training data should be in two parallel text files (source.fr,target.en)
-  and the alignments are expected in "0-0 1-2 2-1 ..." format produced by
-  most alignment toolkits. The text files should NOT be escaped for non-XML
-  characters.
-
-
-EXTRACTION OF PER-SENTENCE GRAMMARS
-==============================================================================
-The most common use-case we support is extraction of "per-sentence" grammars
-for each segment in a test set. You may run the extractor on a test set, but it
-will try to interpret tags as SGML markup, so we provide a script that does
-escaping: ./escape-testset.pl.
-
-- Example:
-
-  cat test.fr | ./escape-testset.pl | ./extractor.py -c extract.ini
-
-
-EXTRACTION OF COMPLETE TEST-SET GRAMMARS
-==============================================================================
-Edit the generated extract.ini file and change per_sentence_grammar
-to False. Then, run extraction as normal.
-
-Note: extracting a single grammar for an entire test set will consume more
-memory during extraction and (probably) during decoding.
-
-
-EXAMPLE
-==============================================================================
-- See example/ and the README therein.
-
-