author | graehl@gmail.com <graehl@gmail.com@ec762483-ff6d-05da-a07a-a48fb63a330f> | 2010-08-02 16:43:59 +0000 |
---|---|---|
committer | graehl@gmail.com <graehl@gmail.com@ec762483-ff6d-05da-a07a-a48fb63a330f> | 2010-08-02 16:43:59 +0000 |
commit | 2fd80bbceadd625b74f8cbd989c945ce24a60fcc | |
tree | 9fa78b3cf6595bd6d95d9576a58c6a0f2e440a6a /graehl/NOTES.beam | |
parent | 506cdc7562956b8bd2460f7dd55a307775eb68cb | |
notes
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@469 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'graehl/NOTES.beam')
-rwxr-xr-x | graehl/NOTES.beam | 20 |
1 file changed, 20 insertions, 0 deletions
diff --git a/graehl/NOTES.beam b/graehl/NOTES.beam
new file mode 100755
index 00000000..59314439
--- /dev/null
+++ b/graehl/NOTES.beam
@@ -0,0 +1,20 @@
+(graehl, comments on code)
+
+passive chart: completion of actual translation rules (X or S NT in Hiero), have
+rule features. Hyperedge inserted with copy of rule feature vector
+(non-sparse). Inefficient; should be postponed on intermediate parses with
+global pruning; just keep pointer to rules and models must provide an interface
+to build a (sparse) feat. vector on demand later for the stuff we keep.
+
+multithreading: none. list of hyperarcs for refinement would need to be
+segregated into subforest blocks and have own output lists for later merging.
+e.g. bottom up count number of tail-reachable nodes under each hypernode, then
+assign to workers.
+
+ngram caching: trie, no locks, for example. for threading, LRU hashing w/ locks per bucket is probably better, or per-thread caches. probably cache is reset per sentence?
+
+randlm worth using? guess not.
+
+actually get all 0-state models in 1st pass parse and prune passive edges per span.
+
+allocate cube pruning budget per prev pass
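The passive-chart note proposes keeping only a pointer to the rule on each hyperedge and building a sparse feature vector on demand, only for the edges that survive pruning. What follows is a minimal Python sketch of that idea. `Rule`, `Edge`, `feature_vector` and `kept_features` are hypothetical names, not the decoder's actual interface. The sketch also assumes a rule's features are stored as a dense tuple indexed by feature id.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """A translation rule whose features are stored densely (hypothetical layout)."""
    lhs: str
    rhs: Tuple[str, ...]
    dense_features: Tuple[float, ...]


@dataclass
class Edge:
    """A passive-chart hyperedge that stores only a reference to its rule."""
    rule: Rule
    tails: Tuple[int, ...]
    _features: Optional[Dict[int, float]] = field(default=None, init=False, repr=False)

    def feature_vector(self) -> Dict[int, float]:
        """Build the sparse {feature_id: value} map on first use and cache it."""
        if self._features is None:
            self._features = {
                i: v for i, v in enumerate(self.rule.dense_features) if v != 0.0
            }
        return self._features


def kept_features(edges: List[Edge], keep: Callable[[Edge], bool]) -> List[Dict[int, float]]:
    """Materialize feature vectors only for edges that survive pruning."""
    return [e.feature_vector() for e in edges if keep(e)]
```

Pruned edges never pay the cost of building a vector. If a kept edge is queried again, its cached map is reused.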
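The multithreading note suggests counting, bottom up, the tail-reachable nodes under each hypernode, then using those counts to hand subforest blocks to workers. The sketch below makes two assumptions that the notes do not specify. First, nodes are numbered in topological order, so every tail id is smaller than its head id. Second, load is balanced by greedy longest-first assignment to the least-loaded worker.

```python
import heapq
from typing import List, Sequence, Set, Tuple


def reachable_counts(in_edges: Sequence[Sequence[Tuple[int, ...]]]) -> List[int]:
    """in_edges[v] lists the tail tuples of the hyperedges entering node v.

    Nodes are assumed to be topologically sorted (tails before heads). Returns,
    for every node, the number of distinct nodes in its subforest, itself included.
    """
    reach: List[Set[int]] = []
    for v, edges in enumerate(in_edges):
        nodes = {v}
        for tails in edges:
            for t in tails:
                nodes |= reach[t]  # reach[t] is already computed because t < v
        reach.append(nodes)
    return [len(s) for s in reach]


def assign_blocks(roots: Sequence[int], sizes: Sequence[int], n_workers: int) -> List[List[int]]:
    """Assign subforest roots to workers, largest first, each to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    blocks: List[List[int]] = [[] for _ in range(n_workers)]
    for r in sorted(roots, key=lambda r: sizes[r], reverse=True):
        load, w = heapq.heappop(heap)
        blocks[w].append(r)
        heapq.heappush(heap, (load + sizes[r], w))
    return blocks
```

A node shared by two blocks is counted in both. When blocks overlap, the sizes therefore overestimate the total work. Choosing block roots whose subforests are mostly disjoint keeps the estimate close.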
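For threaded n-gram caching, the notes favor LRU hashing with a lock per bucket (or per-thread caches), probably reset per sentence. Below is a minimal sketch of the per-bucket variant. The class name and its parameters are hypothetical. The caller-supplied `compute` function stands in for the real language-model lookup.

```python
import threading
from collections import OrderedDict
from typing import Callable, Hashable, List


class StripedLRUCache:
    """LRU cache split into independently locked stripes (buckets), so threads
    whose keys land in different stripes do not contend for one global lock."""

    def __init__(self, n_stripes: int = 16, capacity_per_stripe: int = 1024):
        self._stripes: List[OrderedDict] = [OrderedDict() for _ in range(n_stripes)]
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._capacity = capacity_per_stripe

    def _index(self, key: Hashable) -> int:
        return hash(key) % len(self._stripes)

    def get_or_compute(self, key: Hashable, compute: Callable[[Hashable], float]) -> float:
        i = self._index(key)
        with self._locks[i]:
            stripe = self._stripes[i]
            if key in stripe:
                stripe.move_to_end(key)  # mark as most recently used
                return stripe[key]
        value = compute(key)  # computed outside the lock; a duplicate computation is harmless
        with self._locks[i]:
            stripe = self._stripes[i]
            stripe[key] = value
            stripe.move_to_end(key)
            if len(stripe) > self._capacity:
                stripe.popitem(last=False)  # evict the least recently used entry
        return value

    def clear(self) -> None:
        """Reset every stripe, e.g. at the start of each sentence."""
        for lock, stripe in zip(self._locks, self._stripes):
            with lock:
                stripe.clear()
```

The per-thread alternative gives each worker its own unlocked cache, e.g. through `threading.local`. That avoids locking entirely but duplicates entries across threads.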
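The note on 0-state models suggests scoring every model that needs no state during the first-pass parse, then pruning passive edges per span. Below is a sketch of per-span pruning that combines a histogram beam with a score threshold. `PassiveEdge`, `beam_size` and `threshold` are illustrative assumptions. Scores are assumed to be in the log domain, so higher is better.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class PassiveEdge(NamedTuple):
    span: Tuple[int, int]  # (i, j) source span
    score: float           # combined 0-state (stateless) model score, log domain
    edge_id: int


def prune_per_span(edges: List[PassiveEdge], beam_size: int,
                   threshold: float) -> Dict[Tuple[int, int], List[PassiveEdge]]:
    """Keep at most beam_size edges per span, and only those scoring within
    `threshold` of that span's best edge. Survivors are sorted best first."""
    by_span: Dict[Tuple[int, int], List[PassiveEdge]] = defaultdict(list)
    for e in edges:
        by_span[e.span].append(e)
    kept = {}
    for span, span_edges in by_span.items():
        best = max(e.score for e in span_edges)
        survivors = [e for e in span_edges if e.score >= best - threshold]
        kept[span] = heapq.nlargest(beam_size, survivors, key=lambda e: e.score)
    return kept
```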
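The last note, allocating the cube pruning budget per previous pass, is terse. One possible reading, offered only as an assumption: split a fixed total of cube-pruning pops across spans in proportion to how many edges each span kept in the previous pass. The sketch below uses largest-remainder rounding so that the per-span shares sum exactly to the total. `allocate_budget` and its arguments are hypothetical.

```python
from typing import Dict, Hashable


def allocate_budget(survivors: Dict[Hashable, int], total_pops: int) -> Dict[Hashable, int]:
    """Split total_pops across spans in proportion to survivors[span], the number
    of edges the span kept in the previous pass. Largest-remainder rounding makes
    the shares sum to total_pops exactly (integer arithmetic, no float error)."""
    mass = sum(survivors.values())
    if mass == 0:
        return {span: 0 for span in survivors}
    alloc = {span: total_pops * n // mass for span, n in survivors.items()}
    leftover = total_pops - sum(alloc.values())
    # give the leftover pops to the spans whose exact share had the largest fraction
    by_remainder = sorted(survivors, key=lambda s: (total_pops * survivors[s]) % mass, reverse=True)
    for span in by_remainder[:leftover]:
        alloc[span] += 1
    return alloc
```

Under this rule, a span that kept no edges in the previous pass gets no pops. A real allocator might want a per-span floor instead.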