Diffstat (limited to 'dtrain/README.md')
-rw-r--r-- | dtrain/README.md | 40 |
1 files changed, 40 insertions, 0 deletions
diff --git a/dtrain/README.md b/dtrain/README.md
new file mode 100644
index 00000000..c39d94d2
--- /dev/null
+++ b/dtrain/README.md
@@ -0,0 +1,40 @@
+This is a really fast (parallelizable) tuning method for cdec as used here:
+ "Joint Feature Selection in Distributed Stochastic
+  Learning for Large-Scale Discriminative Training in
+  SMT" Simianer, Riezler, Dyer
+ ACL 2012
+
+
+Building
+--------
+builds when building cdec, see ../BUILDING
+
+Running
+-------
+To run this on a dev set locally (default):
+<code>
+#define DTRAIN_LOCAL
+</code>
+otherwise remove that line or undef. You need a single grammar file
+or per-sentence-grammars (psg) as you would use with cdec.
+Additionally you need to give dtrain a file with
+references (--refs).
+
+The input for use with hadoop streaming looks like this:
+<code>
+<id>\t<source>\t<ref>\t<grammar rules separated by tab>
+</code>
+To convert a psg to this format you need to replace all "\n"
+by "\t". Make sure there are no tabs in your data.
+
+For an example of local usage (with 'distributed' format)
+see test/example/. This expects dtrain to be built without
+DTRAIN_LOCAL param.
+
+Legal stuff
+-----------
+Copyright (c) 2012 by Patrick Simianer <p@simianer.de>
+
+See the file ../LICENSE.txt for the licensing terms that this software is
+released under.
+
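The psg-to-streaming conversion described in the README (replace every "\n" with "\t", and make sure the data contains no tabs) could be sketched roughly as below. This is only an illustration, not part of dtrain: the function name `to_streaming_line` and the sample grammar rules are hypothetical, and it assumes a per-sentence grammar is plain text with one rule per line.

```python
def to_streaming_line(sent_id, source, ref, grammar_text):
    """Build one record: <id>\\t<source>\\t<ref>\\t<rules separated by tab>.

    Hypothetical helper; assumes one grammar rule per line in grammar_text.
    """
    fields = [str(sent_id), source, ref]
    # Drop empty lines so a trailing newline does not create an empty field.
    rules = [line for line in grammar_text.splitlines() if line.strip()]
    for field in fields + rules:
        # A stray tab or newline would shift or split the fields of the record.
        if "\t" in field or "\n" in field:
            raise ValueError(f"tab or newline inside a field: {field!r}")
    # Joining the rules with tabs is the "\n" -> "\t" replacement.
    return "\t".join(fields + rules)


if __name__ == "__main__":
    # Illustrative rules only; real grammars come from your cdec setup.
    psg = "[X] ||| das ||| the ||| F=1\n[X] ||| haus ||| house ||| F=1\n"
    print(to_streaming_line(0, "das haus", "the house", psg))
```

Rejecting fields that already contain a tab, rather than silently rewriting them, follows the README's warning: once the tab is the field separator, an embedded tab cannot be told apart from one.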