cdec is a fast decoder.
SPEED COMPARISON
------------------------------------------------------------------------------
Here is a comparison with a couple of other decoders:
   Decoder   Lang.    BLEU    Run-Time         Memory
   cdec      C++      31.47    0.37 sec/sent   1.0-1.1GB
   Joshua    Java     31.55    2.34 sec/sent   4.0-4.8GB
   Hiero     Python   31.22   27.2  sec/sent   1.7-1.9GB
Settings for this comparison: at most k=30 pops from the candidate heap at
each node, no other pruning, a 3-gram LM, and a Chinese-English translation
task.
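
To put the run-time column in relative terms, the short Python sketch below
divides each decoder's time per sentence by cdec's. The numbers are copied
from the table above; the names RESULTS and slowdown_vs are made up for this
illustration and are not part of cdec.

```python
# Figures copied from the comparison table: (seconds per sentence, BLEU).
RESULTS = {
    "cdec": (0.37, 31.47),
    "Joshua": (2.34, 31.55),
    "Hiero": (27.2, 31.22),
}


def slowdown_vs(baseline, results=RESULTS):
    """Return each decoder's run-time as a multiple of the baseline's."""
    base_time = results[baseline][0]
    return {name: secs / base_time for name, (secs, _bleu) in results.items()}


if __name__ == "__main__":
    for name, factor in slowdown_vs("cdec").items():
        print(f"{name:8s} {factor:6.1f}x the run-time of cdec")
```

On these figures Joshua takes roughly 6.3 times as long per sentence as cdec
and Hiero roughly 73.5 times as long, while the three BLEU scores lie within
0.33 points of each other.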
GETTING STARTED
------------------------------------------------------------------------------
See the BUILDING file for instructions on how to build the software. The best
way to explore the decoder's features is to look at cdec's command-line
options or at the test cases in the tests/system_tests/ directory. Each test
case can be run with a command like

    ./cdec -c cdec.ini -i input.txt -w weights

The files should be self-explanatory.
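
For scripting such runs, a minimal Python sketch follows. It only assembles
the invocation quoted above (the -c, -i and -w options and the file names
cdec.ini, input.txt and weights) and hands it to the standard library. The
helper names build_cdec_command and run_cdec are hypothetical, and the sketch
assumes that the cdec binary and the three files sit in the current directory.

```python
import shlex
import subprocess


def build_cdec_command(config="cdec.ini", input_file="input.txt",
                       weights="weights", binary="./cdec"):
    """Assemble the argument list for the invocation quoted in this README."""
    return [binary, "-c", config, "-i", input_file, "-w", weights]


def run_cdec(command):
    """Run the decoder in the current directory (assumed to hold the files).

    check=True raises CalledProcessError if cdec exits with a non-zero status.
    """
    return subprocess.run(command, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without a build.
    print(shlex.join(build_cdec_command()))
```

Passing the arguments as a list rather than as one shell string avoids
quoting problems if a file name ever contains spaces.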
EXTRACTING A SYNCHRONOUS GRAMMAR / PHRASE TABLE
------------------------------------------------------------------------------
cdec does not include code for generating grammars. To build these, you will
need to write your own software or use an existing package like Joshua, Hiero,
or Moses.
OPTIMIZING / TRAINING MODELS
------------------------------------------------------------------------------
cdec does include code for optimizing models according to a number of
training criteria, including training models as CRFs (with latent derivation
variables) and MERT (over hypergraphs) to optimize BLEU, TER, etc.
Eventually, I will provide documentation for this.
ALIGNMENT / SYNCHRONOUS PARSING / CONSTRAINED DECODING
------------------------------------------------------------------------------
cdec can be used as an aligner. For examples, see the test cases.
COPYRIGHT AND LICENSE
------------------------------------------------------------------------------
Copyright (c) 2009 by Chris Dyer <redpony@gmail.com>
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The LBFGS implementation contains code from the Computational
Crystallography Toolbox which is copyright (c) 2006 by The Regents of the
University of California, through Lawrence Berkeley National Laboratory.
For more information on their license, refer to http://cctbx.sourceforge.net/