Diffstat (limited to 'dtrain/README.md')
-rw-r--r-- | dtrain/README.md | 29 |
1 files changed, 18 insertions, 11 deletions
diff --git a/dtrain/README.md b/dtrain/README.md
index f4e1abed..2a24ec22 100644
--- a/dtrain/README.md
+++ b/dtrain/README.md
@@ -1,13 +1,20 @@
-This is a simple (but parallelizable) tuning method for cdec, as used here:
+This is a simple (and parallelizable) tuning method for cdec
+which is able to train the weights of very many (sparse) features.
+It was used here:
  "Joint Feature Selection in Distributed Stochastic
  Learning for Large-Scale Discriminative Training in
- SMT" Simianer, Riezler, Dyer
- ACL 2012
+ SMT" Simianer, Riezler, Dyer; ACL 2012
 
 
 Building
 --------
-builds when building cdec, see ../BUILDING
+Builds when building cdec, see ../BUILDING .
+To build only parts needed for dtrain do
+```
+ autoreconf -ifv
+ ./configure [--disable-test]
+ cd dtrain/; make
+```
 
 Running
 -------
@@ -15,10 +22,10 @@ To run this on a dev set locally:
 ```
  #define DTRAIN_LOCAL
 ```
-otherwise remove that line or undef. You need a single grammar file
-or per-sentence-grammars (psg) as you would use with cdec.
-Additionally you need to give dtrain a file with
-references (--refs).
+otherwise remove that line or undef, then recompile. You need a single
+grammar file or input annotated with per-sentence grammars (psg) as you
+would use with cdec. Additionally you need to give dtrain a file with
+references (--refs) when running locally.
 
 The input for use with hadoop streaming looks like this:
 ```
@@ -27,12 +34,12 @@ The input for use with hadoop streaming looks like this:
 To convert a psg to this format you need to replace all "\n" by "\t".
 Make sure there are no tabs in your data.
 
-For an example of local usage (with 'distributed' format)
+For an example of local usage (with the 'distributed' format)
 the see test/example/ . This expects dtrain to be built without
 DTRAIN_LOCAL.
 
-Legal stuff
------------
+Legal
+-----
 Copyright (c) 2012 by Patrick Simianer <p@simianer.de>
 
 See the file ../LICENSE.txt for the licensing terms that this software is