author Patrick Simianer <p@simianer.de> 2011-10-14 15:40:23 +0200
committer Patrick Simianer <p@simianer.de> 2011-10-14 15:40:23 +0200
commit 628c4ecb641096c6526c7e6062460e627433f8fa
tree 5cacd40d1bf27a18799eb2ab4c8b95cbfc986e58 /dtrain/README
parent 73c406958feb6598382aee3b3043195e506a56f8
test
Diffstat (limited to 'dtrain/README')
-rw-r--r-- dtrain/README 36
1 file changed, 0 insertions, 36 deletions
diff --git a/dtrain/README b/dtrain/README
deleted file mode 100644
index 997c5ff3..00000000
--- a/dtrain/README
+++ /dev/null
@@ -1,36 +0,0 @@
-TODO
- MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
- what about RESCORING?
- REMEMBER kbest (merge) weights?
- SELECT iteration with highest (real) BLEU?
- GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
- CACHING (ngrams for scoring)
- hadoop PIPES implementation
- SHARED LM (kenlm actually does this!)?
- ITERATION variants
- once -> average
- shuffle resulting weights
- weights AVERAGING in reducer (global Ngram counts)
- BATCH implementation (no update after each Kbest list)
- set REFERENCE for cdec (rescoring)?
- MORE THAN ONE reference for BLEU?
- kbest NICER (do not iterate twice)!? -> shared_ptr?
- DO NOT USE Decoder::Decode (input caching as WordID)!?
- sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
- reactivate DTEST and tests
- non deterministic, high variance, RANDOM RESTARTS
- use separate TEST SET
-
-KNOWN BUGS, PROBLEMS
- doesn't select best iteration for weights
- if size of candidate < N => 0 score
- cdec kbest vs 1best (no -k param), rescoring? => ok(?)
- no sparse vector in decoder => ok
- ? ok
- sh: error while loading shared libraries: libreadline.so.6: cannot open shared object file: Error 24
- PhraseModel_* features (0..99 seem to be generated, why 99?)
- flex scanner jams on malicious input, we could skip that
-
-FIX
- merge
- ep data
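A minimal sketch of the "weights AVERAGING in reducer" item from the deleted TODO list. It assumes each mapper emits a sparse weight vector keyed by feature name and that the reducer takes their uniform mean. `SparseWeights` and `AverageWeights` are hypothetical names for illustration only; the README does not say how dtrain stores or combines its weights.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sparse weight vector: feature name -> weight.
// (Assumption: dtrain's real weight type is not described in the README.)
using SparseWeights = std::map<std::string, double>;

// Uniformly average the weight vectors produced by several shards,
// e.g. one per mapper, as a reducer might. A feature that is absent
// from a shard contributes 0 to the sum for that shard.
SparseWeights AverageWeights(const std::vector<SparseWeights>& shards) {
  assert(!shards.empty());  // averaging zero shards is undefined

  // Sum each feature's weight over all shards.
  SparseWeights avg;
  for (const SparseWeights& w : shards) {
    for (const auto& [feature, value] : w) {
      avg[feature] += value;  // operator[] inserts 0.0 on first use
    }
  }

  // Divide every summed weight by the number of shards.
  const double n = static_cast<double>(shards.size());
  for (auto& entry : avg) {
    entry.second /= n;
  }
  return avg;
}
```

For example, averaging {f1: 1.0, f2: -2.0} with {f1: 3.0} yields {f1: 2.0, f2: -1.0}. Weighting each shard by the number of training examples it processed would be an alternative design; the TODO entry does not specify which one was intended.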