Diffstat (limited to 'dtrain/README.md')
-rw-r--r-- dtrain/README.md | 29
1 files changed, 18 insertions, 11 deletions
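The README changes below describe the hadoop streaming input. A per-sentence grammar (psg) record becomes one line by replacing every "\n" with "\t", which only works if the data contains no tabs of its own. Here is a minimal Python sketch of that conversion. The function name `psg_to_streaming_line` is hypothetical. Dropping a single trailing newline before the replacement is also an assumption, not something the README specifies.

```python
def psg_to_streaming_line(psg_text: str) -> str:
    # Hypothetical helper: joins one psg record into a single line for
    # hadoop streaming by replacing newlines with tabs, as the README describes.
    if "\t" in psg_text:
        # A pre-existing tab could not be told apart from a converted newline.
        raise ValueError("input contains a tab character")
    # Assumption: strip one trailing newline so the line does not end in a tab.
    if psg_text.endswith("\n"):
        psg_text = psg_text[:-1]
    return psg_text.replace("\n", "\t")


if __name__ == "__main__":
    record = "line one\nline two\n"
    print(repr(psg_to_streaming_line(record)))  # → 'line one\tline two'
```

The tab check raises an error instead of escaping tabs. Escaping would make the output format differ from the plain "\n" to "\t" substitution the README asks for.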
diff --git a/dtrain/README.md b/dtrain/README.md
index f4e1abed..2a24ec22 100644
--- a/dtrain/README.md
+++ b/dtrain/README.md
@@ -1,13 +1,20 @@
-This is a simple (but parallelizable) tuning method for cdec, as used here:
+This is a simple (and parallelizable) tuning method for cdec
+which is able to train the weights of very many (sparse) features.
+It was used here:
"Joint Feature Selection in Distributed Stochastic
Learning for Large-Scale Discriminative Training in
- SMT" Simianer, Riezler, Dyer
- ACL 2012
+ SMT" Simianer, Riezler, Dyer; ACL 2012
Building
--------
-builds when building cdec, see ../BUILDING
+Builds when building cdec, see ../BUILDING .
+To build only parts needed for dtrain do
+```
+ autoreconf -ifv
+ ./configure [--disable-test]
+ cd dtrain/; make
+```
Running
-------
@@ -15,10 +22,10 @@ To run this on a dev set locally:
```
#define DTRAIN_LOCAL
```
-otherwise remove that line or undef. You need a single grammar file
-or per-sentence-grammars (psg) as you would use with cdec.
-Additionally you need to give dtrain a file with
-references (--refs).
+otherwise remove that line or undef, then recompile. You need a single
+grammar file or input annotated with per-sentence grammars (psg) as you
+would use with cdec. Additionally you need to give dtrain a file with
+references (--refs) when running locally.
The input for use with hadoop streaming looks like this:
```
@@ -27,12 +34,12 @@ The input for use with hadoop streaming looks like this:
To convert a psg to this format you need to replace all "\n"
by "\t". Make sure there are no tabs in your data.
-For an example of local usage (with 'distributed' format)
+For an example of local usage (with the 'distributed' format)
the see test/example/ . This expects dtrain to be built without
DTRAIN_LOCAL.
-Legal stuff
------------
+Legal
+-----
Copyright (c) 2012 by Patrick Simianer <p@simianer.de>
See the file ../LICENSE.txt for the licensing terms that this software is