 dtrain/README.md | 54 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 26 deletions(-)
diff --git a/dtrain/README.md b/dtrain/README.md
index dc980faf..66168b6a 100644
--- a/dtrain/README.md
+++ b/dtrain/README.md
@@ -1,38 +1,40 @@
+dtrain
+======
+
IDEAS
-=====
- MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
- what about RESCORING?
- REMEMBER kbest (merge) weights?
- SELECT iteration with highest (real) BLEU?
- GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
- CACHING (ngrams for scoring)
- hadoop PIPES imlementation
- SHARED LM (kenlm actually does this!)?
- ITERATION variants
- once -> average
- shuffle resulting weights
- weights AVERAGING in reducer (global Ngram counts)
- BATCH implementation (no update after each Kbest list)
- set REFERENCE for cdec (rescoring)?
- MORE THAN ONE reference for BLEU?
- kbest NICER (do not iterate twice)!? -> shared_ptr?
- DO NOT USE Decoder::Decode (input caching as WordID)!?
- sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
- reactivate DTEST and tests
- non deterministic, high variance, RANDOM RESTARTS
- use separate TEST SET
+-----
+* MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
+* what about RESCORING?
+* REMEMBER kbest (merge) weights?
+* SELECT iteration with highest (real) BLEU?
+* GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
+* CACHING (ngrams for scoring)
+* hadoop PIPES implementation
+* SHARED LM (kenlm actually does this!)?
+* ITERATION variants
+ * once -> average
+ * shuffle resulting weights
+* weights AVERAGING in reducer (global Ngram counts)
+* BATCH implementation (no update after each Kbest list)
+* set REFERENCE for cdec (rescoring)?
+* MORE THAN ONE reference for BLEU?
+* kbest NICER (do not iterate twice)!? -> shared_ptr?
+* DO NOT USE Decoder::Decode (input caching as WordID)!?
+* sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
+* reactivate DTEST and tests
+* non-deterministic, high variance, RANDOM RESTARTS
+* use separate TEST SET

Uncertain, known bugs, problems
-===============================
+-------------------------------
* cdec kbest vs 1best (no -k param), rescoring? => ok(?)
* no sparse vector in decoder => ok/fixed
* PhraseModel_* features (0..99 seem to be generated, why 99?)
* flex scanner jams on malicious input, we could skip that

FIXME
-=====
-* merge
-* ep data
+-----
+* merge with cdec master

Data
====
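
A note on the MULTIPARTITE ranking item in the IDEAS list above: one way to read "1 vs all, cluster modelscore;score" is to bucket the k-best entries into score tiers and only build training pairs across tiers, rather than over all pairwise combinations. A minimal sketch, assuming a made-up KBestEntry type and a simple equal-size tiering (this is not dtrain's actual data structure or pair-sampling code):

```cpp
// Hypothetical sketch: bucket k-best entries into tiers by their real score
// (e.g. per-sentence BLEU) and only generate training pairs across tiers,
// instead of using every pairwise combination. Types and names are made up.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

struct KBestEntry {     // stand-in for a k-best hypothesis
  double model_score;   // score under the current weights
  double real_score;    // e.g. per-sentence BLEU against the reference
};

// Returns index pairs (better, worse) whose members lie in different tiers.
std::vector<std::pair<std::size_t, std::size_t>>
MultipartitePairs(const std::vector<KBestEntry>& kbest, std::size_t tiers = 3) {
  std::vector<std::size_t> order(kbest.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return kbest[a].real_score > kbest[b].real_score;  // best first
  });
  const std::size_t tier_size = (order.size() + tiers - 1) / tiers;
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  for (std::size_t i = 0; i < order.size(); ++i)
    for (std::size_t j = i + 1; j < order.size(); ++j)
      if (i / tier_size != j / tier_size)        // skip pairs within one tier
        pairs.emplace_back(order[i], order[j]);  // order[i] outranks order[j]
  return pairs;
}

int main() {
  std::vector<KBestEntry> kbest = {{-1.0, 0.30}, {-1.2, 0.42}, {-0.8, 0.10},
                                   {-1.5, 0.38}, {-0.9, 0.05}, {-1.1, 0.22}};
  for (const auto& p : MultipartitePairs(kbest))
    std::cout << p.first << " > " << p.second << "\n";
}
```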
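
On the CACHING (ngrams for scoring) item: the reference side of the BLEU computation is fixed for a whole k-best list, so its n-gram counts only need to be extracted once per source sentence instead of once per hypothesis. A rough sketch using plain std::string tokens (the list above mentions WordID-based input; strings are used here only to keep the example self-contained):

```cpp
// Hypothetical sketch: count the reference's n-grams once per source sentence
// and reuse the cached counts for every hypothesis in the k-best list.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using NgramCounts = std::map<std::vector<std::string>, int>;

NgramCounts CountNgrams(const std::vector<std::string>& toks, int max_n = 4) {
  NgramCounts counts;
  for (int n = 1; n <= max_n; ++n)
    for (std::size_t i = 0; i + n <= toks.size(); ++i)
      ++counts[std::vector<std::string>(toks.begin() + i, toks.begin() + i + n)];
  return counts;
}

// Clipped n-gram matches of a hypothesis against the cached reference counts.
int ClippedMatches(const NgramCounts& hyp, const NgramCounts& cached_ref) {
  int matches = 0;
  for (const auto& kv : hyp) {
    const auto it = cached_ref.find(kv.first);
    if (it != cached_ref.end()) matches += std::min(kv.second, it->second);
  }
  return matches;
}

int main() {
  const NgramCounts ref = CountNgrams({"the", "cat", "sat"});  // cached once
  const std::vector<std::vector<std::string>> kbest = {
      {"the", "cat"}, {"a", "cat", "sat"}};
  for (const auto& hyp : kbest)                // reuse the cached counts
    std::cout << ClippedMatches(CountNgrams(hyp), ref) << "\n";
}
```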
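
On weights AVERAGING in reducer (and the hadoop PIPES item): in a streaming-style setup each shard could emit its final weights as feature/value lines and a reduce step then averages them per feature. A hedged sketch, not dtrain's actual reduce step; the avg_weights name and the NUM_SHARDS argument are invented for illustration:

```cpp
// Hypothetical sketch of a streaming-style reduce step: mappers emit their
// final weights as "feature<TAB>weight" lines, the framework sorts by feature,
// and this reducer averages each feature over the total number of shards
// (features a shard never fired count as 0, hence the fixed divisor).
#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
  const double num_shards = argc > 1 ? std::atof(argv[1]) : 1.0;
  std::string feat, cur;
  double w = 0.0, sum = 0.0;
  bool any = false;
  while (std::cin >> feat >> w) {
    if (any && feat != cur) {                  // previous feature group done
      std::cout << cur << '\t' << sum / num_shards << '\n';
      sum = 0.0;
    }
    cur = feat;
    sum += w;
    any = true;
  }
  if (any) std::cout << cur << '\t' << sum / num_shards << '\n';
}
```

It would be run on feature-sorted mapper output, e.g. `sort -k1,1 weights.* | ./avg_weights 16`.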