NOTES
learner gets all used features (binary! and dense (logprob is sum of logprobs!))
weights: see decoder/decoder.cc line 548
40k sents, k=100: ~400M mem, 45min per iteration
utils/weights.cc: why wv_?
FD, Weights::wv_ grow too large; see utils/weights.cc, decoder/hg.h, decoder/scfg_translator.cc, utils/fdict.cc
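Sketch for the wv_ note above, assuming the weight vector is dense and indexed by feature id (so its length follows the largest id in the feature dictionary, not the number of non-zero weights). The function names and the ids below are made up for illustration and are not cdec code.

```python
def dense_weights(sparse, num_features):
    """Expand {feature_id: weight} into a list indexed by feature id."""
    dense = [0.0] * num_features
    for fid, w in sparse.items():
        dense[fid] = w
    return dense


def sparse_dot(weights, features):
    """Dot product of two {feature_id: value} dicts, iterating over the smaller one."""
    if len(features) > len(weights):
        weights, features = features, weights
    return sum(v * weights.get(fid, 0.0) for fid, v in features.items())


# Hypothetical example: two non-zero weights, but a very large maximum feature id.
weights = {3: 0.5, 1_000_000: -1.0}
feats = {3: 2.0, 7: 1.0, 1_000_000: 1.0}

score = sparse_dot(weights, feats)         # 0.5*2.0 + (-1.0)*1.0
dense = dense_weights(weights, 1_000_001)  # one slot per possible id

print(len(weights), len(dense), score)
```

The sparse form stores 2 entries where the dense form stores 1,000,001, which is the motivation one could read into the sparse-vector item in the TODO list.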
TODO
enable kbest FILTERING (nofilter vs unique)
MULTIPARTITE ranking (108010, 1 vs all, cluster modelscore;score)
what about RESCORING?
REMEMBER kbest (merge) weights?
SELECT iteration with highest (real) BLEU?
GENERATED data? (multi-task, ability to learn, perfect translation in nbest, at first all modelscore 1)
CACHING (ngrams for scoring)
hadoop PIPES implementation
SHARED LM?
ITERATION variants
  once -> average
  shuffle resulting weights
weights AVERAGING in reducer (global Ngram counts)
BATCH implementation (no update after each Kbest list)
SOFIA --eta_type explicit
set REFERENCE for cdec (rescoring)?
MORE THAN ONE reference for BLEU?
kbest NICER (do not iterate twice)!? -> shared_ptr?
DO NOT USE Decoder::Decode (input caching as WordID)!?
sparse vector instead of vector<double> for weights in Decoder(::SetWeights)?
reactivate DTEST and tests
non-deterministic, high variance, RANDOM RESTARTS
use separate TEST SET
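Sketches for three of the TODO items above (kbest FILTERING, MULTIPARTITE 108010, weights AVERAGING). All names are hypothetical, and each function is only one possible reading of the item: "unique" is taken to mean one entry per distinct translation string, "108010" is taken to mean a top 10% / middle 80% / bottom 10% split by metric score, and averaging treats a feature missing from a shard as 0. The global Ngram counts part is not covered.

```python
from collections import defaultdict


def unique_kbest(kbest):
    """Keep the first (highest-ranked) entry for each distinct translation string."""
    seen = set()
    out = []
    for hyp in kbest:
        if hyp["text"] not in seen:
            seen.add(hyp["text"])
            out.append(hyp)
    return out


def pairs_108010(kbest, frac=0.1):
    """Build (better, worse) pairs: top vs. rest, then middle vs. bottom."""
    ranked = sorted(kbest, key=lambda h: h["score"], reverse=True)
    n = len(ranked)
    if n < 2:
        return []
    sep = max(1, round(n * frac))  # size of the top and of the bottom part
    top, mid, bottom = ranked[:sep], ranked[sep:n - sep], ranked[n - sep:]
    pairs = [(hi, lo) for hi in top for lo in mid + bottom]
    pairs += [(hi, lo) for hi in mid for lo in bottom]
    return pairs


def average_weights(shards):
    """Average a list of {feature_name: weight} dicts; missing features count as 0."""
    total = defaultdict(float)
    for weights in shards:
        for name, value in weights.items():
            total[name] += value
    return {name: value / len(shards) for name, value in total.items()}


# Hypothetical k-best list: "text" is the translation, "score" the per-sentence metric.
kbest = [
    {"text": "a b", "score": 0.9},
    {"text": "a b", "score": 0.8},  # duplicate string, dropped by the filter
    {"text": "c", "score": 0.1},
]
print(len(unique_kbest(kbest)), len(pairs_108010(unique_kbest(kbest))))
print(average_weights([{"LanguageModel": 1.0, "PhraseModel_0": 2.0},
                       {"LanguageModel": 3.0}]))
```

Without tie handling, pairs with equal scores can show up in pairs_108010; a real implementation would probably skip or merge those.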
KNOWN BUGS, PROBLEMS
probably OVERFITS
cdec kbest vs 1best (no -k param) fishy!
sh: error while loading shared libraries: libreadline.so.6: cannot open shared object file: Error 24
PhraseModel_* features (0..99 seem to be generated, default?)
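On the libreadline message above: the trailing number looks like a raw errno value, so it can be looked up. This is a guess at the cause, not a confirmed diagnosis; the snippet only maps the number to its symbolic name on the current platform.

```python
import errno
import os

code = 24  # the number printed at the end of the loader message
name = errno.errorcode.get(code, "unknown")

# Print the symbolic name and the platform's description for this errno.
print(code, name, os.strerror(code))
```

If this prints EMFILE, the process that spawned sh had probably reached its open file descriptor limit, so comparing the number of open files against `ulimit -n` would be a sensible next check.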