Diffstat (limited to 'dtrain/README.md')
-rw-r--r-- dtrain/README.md | 34
1 files changed, 12 insertions, 22 deletions
diff --git a/dtrain/README.md b/dtrain/README.md
index 71641bd8..d6699cb4 100644
--- a/dtrain/README.md
+++ b/dtrain/README.md
@@ -3,34 +3,24 @@ dtrain
Ideas
-----
-* *MULTIPARTITE* ranking (108010, 1 vs all, cluster modelscore;score)
-* what about RESCORING?
-* REMEMBER kbest (merge) weights?
-* SELECT iteration with highest (real) BLEU?
-* GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
-* CACHING (ngrams for scoring)
-* hadoop PIPES imlementation
-* SHARED LM (kenlm actually does this!)?
-* ITERATION variants
- * once -> average
- * shuffle resulting weights
-* weights AVERAGING in reducer (global Ngram counts)
-* BATCH implementation (no update after each Kbest list)
-* set REFERENCE for cdec (rescoring)?
-* MORE THAN ONE reference for BLEU?
-* kbest NICER (do not iterate twice)!? -> shared_ptr?
-* DO NOT USE Decoder::Decode (input caching as WordID)!?
-* sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
-* reactivate DTEST and tests
-* non deterministic, high variance, RANDOM RESTARTS
-* use separate TEST SET
+* *MULTIPARTITE* ranking (1 vs all, cluster model/score)
+* *REMEMBER* sampled translations (merge)
+* *SELECT* iteration with highest (_real_) BLEU?
+* *GENERATED* data? (perfect translation in kbest)
+* *CACHING* (ngrams for scoring)
+* hadoop *PIPES* imlementation
+* *ITERATION* variants (shuffle resulting weights, re-iterate)
+* *MORE THAN ONE* reference for BLEU?
+* *RANDOM RESTARTS*
+* use separate TEST SET for each shard
Uncertain, known bugs, problems
-------------------------------
-* cdec kbest vs 1best (no -k param), rescoring? => ok(?)
+* cdec kbest vs 1best (no -k param), rescoring (ref?)? => ok(?)
* no sparse vector in decoder => ok/fixed
* PhraseModel_* features (0..99 seem to be generated, why 99?)
* flex scanner jams on malicious input, we could skip that
+* input/grammar caching (strings, files)
FIXME
-----