author    | Patrick Simianer <p@simianer.de> | 2011-10-14 15:42:05 +0200
committer | Patrick Simianer <p@simianer.de> | 2011-10-14 15:42:05 +0200
commit    | bc0d6dc3d0d58982add01077e7af1e8ec273666f
tree      | d3a9e32a47a4373646902b4646bd6012b02c132c /dtrain
parent    | 628c4ecb641096c6526c7e6062460e627433f8fa
test
Diffstat (limited to 'dtrain')
-rw-r--r-- | dtrain/README.md | 54
1 file changed, 28 insertions, 26 deletions
diff --git a/dtrain/README.md b/dtrain/README.md
index dc980faf..66168b6a 100644
--- a/dtrain/README.md
+++ b/dtrain/README.md
@@ -1,38 +1,40 @@
+dtrain
+======
+
 IDEAS
-=====
- MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
- what about RESCORING?
- REMEMBER kbest (merge) weights?
- SELECT iteration with highest (real) BLEU?
- GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
- CACHING (ngrams for scoring)
- hadoop PIPES imlementation
- SHARED LM (kenlm actually does this!)?
- ITERATION variants
-  once -> average
-  shuffle resulting weights
- weights AVERAGING in reducer (global Ngram counts)
- BATCH implementation (no update after each Kbest list)
- set REFERENCE for cdec (rescoring)?
- MORE THAN ONE reference for BLEU?
- kbest NICER (do not iterate twice)!? -> shared_ptr?
- DO NOT USE Decoder::Decode (input caching as WordID)!?
- sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
- reactivate DTEST and tests
- non deterministic, high variance, RANDOM RESTARTS
- use separate TEST SET
+-----
+* MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
+* what about RESCORING?
+* REMEMBER kbest (merge) weights?
+* SELECT iteration with highest (real) BLEU?
+* GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
+* CACHING (ngrams for scoring)
+* hadoop PIPES imlementation
+* SHARED LM (kenlm actually does this!)?
+* ITERATION variants
+ * once -> average
+ * shuffle resulting weights
+* weights AVERAGING in reducer (global Ngram counts)
+* BATCH implementation (no update after each Kbest list)
+* set REFERENCE for cdec (rescoring)?
+* MORE THAN ONE reference for BLEU?
+* kbest NICER (do not iterate twice)!? -> shared_ptr?
+* DO NOT USE Decoder::Decode (input caching as WordID)!?
+* sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
+* reactivate DTEST and tests
+* non deterministic, high variance, RANDOM RESTARTS
+* use separate TEST SET
 
 Uncertain, known bugs, problems
-===============================
+-------------------------------
 * cdec kbest vs 1best (no -k param), rescoring? => ok(?)
 * no sparse vector in decoder => ok/fixed
 * PhraseModel_* features (0..99 seem to be generated, why 99?)
 * flex scanner jams on malicious input, we could skip that
 
 FIXME
-=====
-* merge
-* ep data
+-----
+* merge with cdec master
 
 Data
 ====
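
One item in the ideas list above, weights AVERAGING in a reducer, suggests a MapReduce-style setup in which each mapper trains on its own shard and a reducer averages the resulting weight vectors. The sketch below only illustrates that averaging step under assumed conventions (one `feature<TAB>weight` line per shard on stdin, shard count as the first argument); it is not dtrain's actual reducer or Hadoop Pipes code.

```cpp
// Sketch of a streaming-style reducer that averages per-shard feature weights.
// Assumed (hypothetical) input format: "feature\tweight", one line per feature
// per shard, on stdin; the number of shards is passed as argv[1].
#include <cstdlib>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " <num_shards>\n";
    return 1;
  }
  const double num_shards = std::atof(argv[1]);
  std::map<std::string, double> sum;  // feature name -> summed weight

  std::string line;
  while (std::getline(std::cin, line)) {
    std::istringstream ss(line);
    std::string feature;
    double weight;
    // Accumulate this shard's contribution for the feature.
    if (std::getline(ss, feature, '\t') && (ss >> weight))
      sum[feature] += weight;
  }

  // Emit the averaged weight for each feature.
  for (const auto& kv : sum)
    std::cout << kv.first << '\t' << kv.second / num_shards << '\n';
  return 0;
}
```

With Hadoop streaming the reducer's input arrives grouped by key, so a real reducer could emit each feature as soon as its key changes instead of holding the whole map in memory.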
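
Another item asks whether a sparse vector could replace `vector<double>` for the decoder weights. As a rough sketch of the trade-off (the type and function names below are invented for illustration and are not cdec's weight-setting API), the difference amounts to storing only the features that actually received a weight:

```cpp
// Hypothetical sketch: dense vs. sparse weight storage for scoring a
// hypothesis's feature vector. Not taken from cdec/dtrain.
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <vector>

using FeatureId = int;
using SparseVec = std::unordered_map<FeatureId, double>;  // id -> value

// Dense: every feature id indexes into the vector, zeros included.
double dot_dense(const std::vector<double>& w, const SparseVec& feats) {
  double score = 0.0;
  for (const auto& f : feats)
    if (f.first >= 0 && static_cast<std::size_t>(f.first) < w.size())
      score += w[f.first] * f.second;
  return score;
}

// Sparse: only features that ever got a weight are stored at all.
double dot_sparse(const SparseVec& w, const SparseVec& feats) {
  double score = 0.0;
  for (const auto& f : feats) {
    auto wi = w.find(f.first);
    if (wi != w.end()) score += wi->second * f.second;
  }
  return score;
}

int main() {
  std::vector<double> dense(1000, 0.0);  // mostly zeros
  dense[7] = 0.5;
  SparseVec sparse{{7, 0.5}};            // one stored entry
  SparseVec feats{{7, 2.0}, {42, 1.0}};  // features firing on a hypothesis
  std::cout << dot_dense(dense, feats) << " " << dot_sparse(sparse, feats) << "\n";
  return 0;
}
```

The trade-off is per-feature lookup cost versus memory: with many rarely-firing features the dense vector is mostly zeros, while the hash map only pays for what the model actually uses.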