notes

git-svn-id: https://ws10smt.googlecode.com/svn/trunk@469 ec762483-ff6d-05da-a07a-a48fb63a330f
author: graehl@gmail.com <graehl@gmail.com@ec762483-ff6d-05da-a07a-a48fb63a330f> 2010-08-02 16:43:59 +0000
committer: graehl@gmail.com <graehl@gmail.com@ec762483-ff6d-05da-a07a-a48fb63a330f> 2010-08-02 16:43:59 +0000
commit: 2fd80bbceadd625b74f8cbd989c945ce24a60fcc (patch)
tree: 9fa78b3cf6595bd6d95d9576a58c6a0f2e440a6a /graehl/NOTES.beam
parent: 506cdc7562956b8bd2460f7dd55a307775eb68cb (diff)
1 files changed, 20 insertions, 0 deletions
diff --git a/graehl/NOTES.beam b/graehl/NOTES.beam
new file mode 100755
index 00000000..59314439
--- /dev/null
+++ b/graehl/NOTES.beam
@@ -0,0 +1,20 @@
+(graehl, comments on code)
+
+passive chart: completed translation rules (X or S NT in Hiero), which carry
+rule features.  Each hyperedge is inserted with a copy of the rule's (non-sparse)
+feature vector.  Inefficient; for intermediate parses with global pruning this
+should be postponed: just keep a pointer to the rule, and have models provide an
+interface to build a (sparse) feature vector on demand for the edges we keep.
+
+multithreading: none.  The list of hyperarcs for refinement would need to be
+segregated into subforest blocks, each with its own output list for later merging,
+e.g. count bottom-up the number of tail-reachable nodes under each hypernode, then
+assign blocks to workers.
+
+ngram caching: e.g. a trie with no locks.  For threading, an LRU hash with per-bucket locks is probably better, or per-thread caches.  The cache is probably reset per sentence?
+
+randlm worth using?  guess not.
+
+actually get all 0-state models in 1st pass parse and prune passive edges per span.
+
+allocate cube pruning budget per prev pass
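
A minimal sketch of the "LRU hashing w/ locks per bucket" idea from the ngram caching note, written in Python for brevity rather than in the project's own language. The class name `StripedLRUCache`, its parameters, and the `score_fn` callback are hypothetical and are not part of the committed code; the sketch only illustrates how per-bucket locks keep threads that touch different buckets from contending.

```python
import threading
from collections import OrderedDict


class StripedLRUCache:
    """Hypothetical n-gram score cache: fixed buckets, each an LRU map with its own lock."""

    def __init__(self, num_buckets=16, bucket_capacity=1024):
        self._buckets = [OrderedDict() for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._capacity = bucket_capacity

    def _index(self, ngram):
        # n-grams are assumed to be hashable, e.g. tuples of word ids or strings.
        return hash(ngram) % len(self._buckets)

    def get_or_compute(self, ngram, score_fn):
        i = self._index(ngram)
        with self._locks[i]:
            bucket = self._buckets[i]
            if ngram in bucket:
                bucket.move_to_end(ngram)  # mark as most recently used
                return bucket[ngram]
        # Score outside the lock so a slow LM lookup does not block the bucket.
        # Two threads may occasionally score the same n-gram; that is harmless.
        value = score_fn(ngram)
        with self._locks[i]:
            bucket = self._buckets[i]
            bucket[ngram] = value
            bucket.move_to_end(ngram)
            if len(bucket) > self._capacity:
                bucket.popitem(last=False)  # evict the least recently used entry
        return value

    def clear(self):
        # Could be called once per sentence, if the cache is reset per sentence.
        for lock, bucket in zip(self._locks, self._buckets):
            with lock:
                bucket.clear()
```

Per-thread caches, the other option in the note, would drop the locks entirely at the cost of threads not sharing each other's lookups.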