SUFFIX-ARRAY-EXTRACT README
Feb 1, 2012
Written by Adam Lopez, repackaged by Chris Dyer.
Originally based on parts of Hiero, by David Chiang, but these dependencies
have been removed or rewritten.
BUILD INSTRUCTIONS
==============================================================================
Requirements:
Python 2.7 or later (http://www.python.org)
Cython 0.14.1 or later (http://cython.org/)
- Edit the Makefile to set the location of Python/Cython, then run:
make
COMPILING A PARALLEL CORPUS AND WORD ALIGNMENT
==============================================================================
- Run sa-compile.pl to compile the training data and generate an extract.ini
file (which is written to STDOUT):
sa-compile.pl -b bitext_name=source.fr,target.en \
-a alignment_name=alignment.txt > extract.ini
EXTRACTION OF PER-SENTENCE GRAMMARS
==============================================================================
- Example:
cat test.fr | extractor.py -c extract.ini
EXTRACTION OF COMPLETE TEST-SET GRAMMARS
==============================================================================
Edit the generated extract.ini file and change per_sentence_grammar
to False. Then run extraction as normal.
Note: extracting a single grammar for an entire test set will consume more
memory during extraction and (probably) during decoding.
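The per_sentence_grammar edit described above can also be scripted. The
following is a minimal sketch, not part of this package: the helper name
set_per_sentence_grammar is hypothetical, and it assumes the setting appears
on its own line in the form "per_sentence_grammar = True" (the exact layout
of a generated extract.ini may differ).

```python
import re

# Hypothetical helper, not shipped with this package.
# Assumption: the setting is written on its own line as
#   per_sentence_grammar = True   (or False)
SETTING = re.compile(
    r"^([ \t]*per_sentence_grammar[ \t]*=[ \t]*)(True|False)\b",
    re.MULTILINE,
)

def set_per_sentence_grammar(ini_text, enabled):
    """Return ini_text with per_sentence_grammar set to True or False."""
    new_text, count = SETTING.subn(
        lambda m: m.group(1) + str(bool(enabled)), ini_text
    )
    if count == 0:
        # Fail loudly rather than silently leaving the file unchanged.
        raise ValueError("per_sentence_grammar setting not found")
    return new_text

# Illustrative input only; a real extract.ini contains more settings.
sample = "max_len = 5\nper_sentence_grammar = True\n"
updated = set_per_sentence_grammar(sample, False)
```

To apply this to a real file, read the contents of extract.ini into a string,
pass it through the helper, and write the result back. Keeping a copy of the
original file makes it easy to switch back to per-sentence grammars.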
EXAMPLE
==============================================================================
- See example/ and the README therein.