author | Patrick Simianer <p@simianer.de> | 2011-10-14 15:40:23 +0200 |
---|---|---|
committer | Patrick Simianer <p@simianer.de> | 2011-10-14 15:40:23 +0200 |
commit | 628c4ecb641096c6526c7e6062460e627433f8fa | |
tree | 5cacd40d1bf27a18799eb2ab4c8b95cbfc986e58 /dtrain/README | |
parent | 73c406958feb6598382aee3b3043195e506a56f8 |
test
Diffstat (limited to 'dtrain/README')
-rw-r--r-- | dtrain/README | 36 |
1 files changed, 0 insertions, 36 deletions
diff --git a/dtrain/README b/dtrain/README
deleted file mode 100644
index 997c5ff3..00000000
--- a/dtrain/README
+++ /dev/null
@@ -1,36 +0,0 @@
-TODO
- MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
- what about RESCORING?
- REMEMBER kbest (merge) weights?
- SELECT iteration with highest (real) BLEU?
- GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
- CACHING (ngrams for scoring)
- hadoop PIPES imlementation
- SHARED LM (kenlm actually does this!)?
- ITERATION variants
- once -> average
- shuffle resulting weights
- weights AVERAGING in reducer (global Ngram counts)
- BATCH implementation (no update after each Kbest list)
- set REFERENCE for cdec (rescoring)?
- MORE THAN ONE reference for BLEU?
- kbest NICER (do not iterate twice)!? -> shared_ptr?
- DO NOT USE Decoder::Decode (input caching as WordID)!?
- sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
- reactivate DTEST and tests
- non deterministic, high variance, RANDOM RESTARTS
- use separate TEST SET
-
-KNOWN BUGS, PROBLEMS
- doesn't select best iteration for weigts
- if size of candidate < N => 0 score
- cdec kbest vs 1best (no -k param), rescoring? => ok(?)
- no sparse vector in decoder => ok
- ? ok
- sh: error while loading shared libraries: libreadline.so.6: cannot open shared object file: Error 24
- PhraseModel_* features (0..99 seem to be generated, why 99?)
- flex scanner jams on malicious input, we could skip that
-
-FIX
- merge
- ep data