test

author: Patrick Simianer <p@simianer.de> 2011-10-14 15:42:05 +0200
committer: Patrick Simianer <p@simianer.de> 2011-10-14 15:42:05 +0200
commit: 422a99631a08b04b12233205e4373ffc20c06600 (patch)
tree: bd9331cde1e8c238316d6282a69092e0c8503f87 /dtrain
parent: 0b091f3f3f792cc6cbe26e68568aeced79d50064 (diff)
1 files changed, 28 insertions, 26 deletions
diff --git a/dtrain/README.md b/dtrain/README.md
index dc980faf..66168b6a 100644
--- a/dtrain/README.md
+++ b/dtrain/README.md
@@ -1,38 +1,40 @@
+dtrain
+======
+
 IDEAS
-=====
- MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
- what about RESCORING?
- REMEMBER kbest (merge) weights?
- SELECT iteration with highest (real) BLEU?
- GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
- CACHING (ngrams for scoring)
- hadoop PIPES imlementation
- SHARED LM (kenlm actually does this!)?
- ITERATION variants
-  once -> average
-  shuffle resulting weights
- weights AVERAGING in reducer (global Ngram counts)
- BATCH implementation (no update after each Kbest list)
- set REFERENCE for cdec (rescoring)?
- MORE THAN ONE reference for BLEU?
- kbest NICER (do not iterate twice)!? -> shared_ptr?
- DO NOT USE Decoder::Decode (input caching as WordID)!?
-  sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
- reactivate DTEST and tests
- non deterministic, high variance, RANDOM RESTARTS
- use separate TEST SET
+-----
+* MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
+* what about RESCORING?
+* REMEMBER kbest (merge) weights?
+* SELECT iteration with highest (real) BLEU?
+* GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
+* CACHING (ngrams for scoring)
+* hadoop PIPES imlementation
+* SHARED LM (kenlm actually does this!)?
+* ITERATION variants
+ * once -> average
+ * shuffle resulting weights
+* weights AVERAGING in reducer (global Ngram counts)
+* BATCH implementation (no update after each Kbest list)
+* set REFERENCE for cdec (rescoring)?
+* MORE THAN ONE reference for BLEU?
+* kbest NICER (do not iterate twice)!? -> shared_ptr?
+* DO NOT USE Decoder::Decode (input caching as WordID)!?
+*  sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
+* reactivate DTEST and tests
+* non deterministic, high variance, RANDOM RESTARTS
+* use separate TEST SET
 
 Uncertain, known bugs, problems
-===============================
+-------------------------------
 * cdec kbest vs 1best (no -k param), rescoring? => ok(?)
 * no sparse vector in decoder => ok/fixed
 * PhraseModel_* features (0..99 seem to be generated, why 99?)
 * flex scanner jams on malicious input, we could skip that
 
 FIXME
-=====
-* merge
-* ep data
+-----
+* merge with cdec master
 
 Data
 ====