summaryrefslogtreecommitdiff
path: root/graehl
diff options
context:
space:
mode:
Diffstat (limited to 'graehl')
-rwxr-xr-xgraehl/NOTES18
-rwxr-xr-xgraehl/NOTES.beam29
-rwxr-xr-xgraehl/NOTES.earley111
-rwxr-xr-xgraehl/NOTES.lm.phrase180
-rwxr-xr-xgraehl/NOTES.partial.binarize21
-rwxr-xr-xgraehl/NOTES.wfsa16
6 files changed, 0 insertions, 375 deletions
diff --git a/graehl/NOTES b/graehl/NOTES
deleted file mode 100755
index 77e99fee..00000000
--- a/graehl/NOTES
+++ /dev/null
@@ -1,18 +0,0 @@
-BUG: tune is bad - urdu conf=baseline tuning (16 dev bleu score???)
-
-conf=baseline force=1 ./tune.sh
-
- decode is good.
-
- UPDATE: maybe tuning is fine; chris never gave me a dev-corpus-filtered grammar and so a bleu of 16 may be what we always got; i just never checked. this means i need to redo tuned-first-pass experiments
-
-valgrind is ok
-
- dist-vest?
-
- (changes made to scoring? plusequals? shared_ptr? small_vector?)
-
- scorer_test is good
-
-
- line_optimer fast_score scorer
diff --git a/graehl/NOTES.beam b/graehl/NOTES.beam
deleted file mode 100755
index a48d1ab7..00000000
--- a/graehl/NOTES.beam
+++ /dev/null
@@ -1,29 +0,0 @@
-(graehl, comments on code)
-
-passive chart: completion of actual translation rules (X or S NT in Hiero), have
-rule features. Hyperedge inserted with copy of rule feature vector
-(non-sparse). Inefficient; should be postponed on intermediate parses with
-global pruning; just keep pointer to rules and models must provide an interface
-to build a (sparse) feat. vector on demand later for the stuff we keep.
-
-multithreading: none. list of hyperarcs for refinement would need to be
-segregated into subforest blocks and have own output lists for later merging.
-e.g. bottom up count number of tail-reachable nodes under each hypernode, then
-assign to workers.
-
-ngram caching: trie, no locks, for example. for threading, LRU hashing w/ locks per bucket is probably better, or per-thread caches. probably cache is reset per sentence?
-
-randlm worth using? guess not.
-
-actually get all 0-state models in 1st pass parse and prune passive edges per span.
-
-allocate cube pruning budget per prev pass
-
-(has been tried in ISI decoder) models with nonstandard state comparison,
-typically (partially) greedy forest scoring, some part of the state is excluded
-from equality/hashing. Within virtual ff interface, would just add equals, hash
-to vtable (rather than the faster raw state compare). If this is done often,
-then add a nonvirtual flag to interface instead, saying whether to use the
-virt. methods or not. or: simple flag by user of ApplyModels (per feature?)
-saying whether to be 100% greedy or 0% - no halfway e.g. item name uses bigram
-context, but score using 5gram state.
diff --git a/graehl/NOTES.earley b/graehl/NOTES.earley
deleted file mode 100755
index 0953708c..00000000
--- a/graehl/NOTES.earley
+++ /dev/null
@@ -1,111 +0,0 @@
-1. fsts (modify target string produced) are quite feasible. augment fsa ff to not just emit features, but also new target words. composition or intersection is no problem (can always bunch into a single FSA/FST lazily by wrapping)
-
-2. sparse fsas (many transitions have -inf score) aren't efficiently supported presently (they are in early_composer where the fsa is a phrase table); the fsa ff interface doesn't even provide a way to query the non-0 transitions (you have to emit a -inf feature). if sparse fsas were expected often and we wanted exact search, then a similar index of the tcfg as in earley_composer would make sense. however, with prob. beam search, we prune out bad-scoring stuff anyway
-
-3. binarization of rhs isn't usually considered necessary in earley, but i liked the idea of optimal binarization making the most sharing possible. however, this means what would have just been a scan is now a scan+complete.
-
-4. prefix (forward) + inside cost. this is phrased in a way so that prefix includes inside. but there's no reason not to think of it in exclusive terms (outside,inside) where prefix=outside*inside when using viterbi. on the other hand, we usually use the outside*inside as the beam score. and furthermore, it looks like when summing over all derivations, there would be some difficulty calculating, as the total inside wouldn't be known at first.
-
-(a,i) r => (+=a*r,r) would become (o,i) r => (+=[(o*i*r)/r],r) = (+=o*i,r)
-(_,b'') (a,b) => (+=a*b'',+=b*b'') would become (_,b'') (o,b) => (?????)
-
-====
-
-
-the target CFG (tcfg) is finite - absolutely no cycles. conceptually we're intersecting it with a wfsa (weights are feature vectors), which is a lot like parsing a lattice, in that states are (source state, dest state) pairs and you're covering some string(s) that go from source->dest in the wfsa.
-
-Chris' paper http://www.ling.umd.edu/~redpony/forest-reordering.pdf - apparently (figure 5) already contains the exact concept we're going for, albeit with only inside scores. http://www.speech.sri.com/cgi-bin/run-distill?ftp:papers/stolcke-cl95.ps.gz describes a nice way of computing sums over derivations given a string by keeping a tuple of ("forward","inner") scores while Earley parsing. I'm not sure yet if this is applicable (because we'll already have the exact outside scores from the -LM forest already, and plan on using cost pushing toward the top so we don't have to explicitly consider them).
-
-normally in earley, one word is consumed at a time, left to right. completions happen from shortest to longest, then (repeated) predictions, and finally scans. i'm sure this has the usual obvious extension to parsing lattices (proceed in some topological order).
-
-but because the wfsa (ngram lm) has cycles and forgets the length of the string (at some point), it's slightly more complicated than lattice parsing the tcfg - there's no topological order over the wfsa states and so you can't finish all the items [x,j] for j from left->right. best first with monotonic total scores (admissable heuristics) is an easy way to avoid generating the whole space; otherwise I can't think of a fixed order that would allow for alternative beaming. as we discussed, arbitrary predicates filtering candidate items can be added if exact best-first is too slow
-
-if the wfsa were just a single string t[0...n-1], then any time you have an item [i,j]X->a.b that means that there's some derivation in the tCFG of S =>* t[0...i-1]Xc => t[0....i-1]abc =>* t[0...j-1]bc , for a FSA, the analog is S =>* W(0,i)Xc => W(0,i)abc =>* W(0,i)W(i,j)bc where W(a,b) means any string in the wfsa language with a as the start state and b as the final state.
-
-
-http://www.isi.edu/natural-language/teaching/cs544/cs544-huang-3-Earley.pdf
-
-http://www.isi.edu/~lhuang/dp-final.pdf (variation on stolcke 1995 prefix cost)
-
-http://acl.ldc.upenn.edu/P/P07/P07-1019.pdf - phrase based lazy priority queue "cube growing" descendants (p149)
-
-
-
-
-
-http://www.speech.sri.com/cgi-bin/run-distill?ftp:papers/stolcke-cl95.ps.gz
-
-http://www.icsi.berkeley.edu/~stolcke/papers/cl95/node10.html#SECTION00042000000000000000
-
-a) An (unconstrained) Earley path, or simply path, is a sequence of Earley
-states linked by prediction, scanning, or completion. For the purpose of
-this definition, we allow scanning to operate in “generation mode,” i.e., all
-states with terminals to the right of the dot can be scanned, not just those
-matching the input. (For completed states, the predecessor state is defined
-to be the complete state from the same state set contributing to the
-completion.)
-b) A path is said to be constrained by, or generate a string x if the terminals
-immediately to the left of the dot in all scanned states, in sequence, form
-the string x.
-c) A path is complete if the last state on it matches the first, except that the
-dot has moved to the end of the RHS.
-d) We say that a path starts with nonterminal X if the first state on it is a
-predicted statewith X on the LHS.
-e) The length of a path is defined as the number of scanned states on it.
-
-Note that the definition of path length is somewhat counter-intuitive, but is motivated
-by the fact that only scanned states correspond directly to input symbols. Thus,
-the length of a path is always the same as the length of the input string it generates.
-
-A constrained path starting with the initial state contains a sequence of states from
-state set 0 derived by repeated prediction, followed by a single state from set 1 produced
-by scanning the first symbol, followed by a sequence of states produced by completion,
-followed by a sequence of predicted states, followed by a state scanning the second
-symbol, and so on. The significance of Earley paths is that they are in a one-to-one
-correspondence with left-most derivations.
-
-
-=============
-
-The forward probability alpha_i(X[k]->x.y) is the sum of the probabilities of all
-constrained paths of length that end in state X[k]->x.y
-
-b) The inner probability beta_i(X[k]->x.y) is the sum of the probabilities of all
-paths of length i-k that start in state X[k,k]->.xy and end in X[k,i]->x.y, and generate the input symbols x[k,...,i-1]
-
-(forward,inner) [i.e. (outside,inside)?]
- unchanged by scan (rule cost is paid up front when predicting)
-
-if X[k,i] -> x.Yz (a,b) and rule Y -> r (p)
-then Y[i,i] -> .r (a',p) with a' += a*p
-
-if Y[j,i]->y. (a'',b'') and X[k,j]->r.Ys (a,b)
-then X[k,i]->rY.s (a',b') with a' += a*b'', b' += b*b''
-
-(this is summing over all derivations)
-
-
-==========
-
-is forward cost viterbi fine? i.e. can i have items whose names ignore the lhs NT (look up predictions that i finish lazily / graph structured?)
-======
-
-1) A -> x . * (trie)
-
-this is somewhat nice. cost pushed for best first, of course. similar benefit as left-branching binarization without the explicit predict/complete steps?
-
-vs. just
-
-2) * -> x . y
-
-here you have to potentially list out all A -> . x y as items * -> . x y immediately, and shared rhs seqs won't be shared except at the usual single-NT predict/complete. of course, the prediction of items -> . x y can occur lazy best-first.
-
-vs.
-
-3) * -> x . *
-
-with 3, we predict all sorts of useless items - that won't give us our goal A and may not partcipate in any parse. this is not a good option at all.
-
-======
-
--LM forest may have many in-edges per V. (many rules per NT lhs). so instead of generating all successors for scan/predict, i wanted to have them in sorted (admissable) -LM cost order and postpone once the prefix+rule part is more expensive than something else in the agenda. question: how many such postponed successor things
diff --git a/graehl/NOTES.lm.phrase b/graehl/NOTES.lm.phrase
deleted file mode 100755
index e87cc6fb..00000000
--- a/graehl/NOTES.lm.phrase
+++ /dev/null
@@ -1,180 +0,0 @@
-possibly the most direct solution is to print every individual probability from LM (to global fstream?). since the difference happens even w/o shortening, disable shortening to remove the possible effect of floor(p+bo) vs floor(p)+bo disagreeing
-
-+LM forest (nodes/edges): 2163/11293
- +LM forest (paths): 7.14685e+14
- +LM forest Viterbi logp: -490.21
- +LM forest Viterbi: karachi ( AstRAf rpwrtRr ) in karachi on monday in different HAdvAt gyY and killed 4 people including a woman found dead body of a person from the sea .
- +LM forest derivation: ({<0,28>[1] ||| (final r->l(( karachi|<s> ) start=[ <s>]->{karachi (} r->l(</s>|. sea) end=[sea .]->{</s>} LanguageModelFsa=-5.74766; }({<0,28>[1] [2] ||| [karachi ( : [a woman]] r->l(( karachi|) [found dead : [sea .]] r->l(dead found|woman a) = [karachi ( : [sea .]] LanguageModelFsa=-5.93027 h=-5.83552); }({<0,20>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [gyY and : [a woman]] r->l(and gyY|) = [karachi ( : [a woman]] LanguageModelFsa=-101.72 h=-5.83552); }({<0,12>[1] [2] ||| [karachi ( : [in karachi]] r->l(( karachi|) [on monday : []] r->l(monday on|karachi in) = [karachi ( : []] LanguageModelFsa=-1.99946 h=-5.83552); }({<0,7>[1] [2] ||| [karachi ( : [rpwrtRr )]] r->l(( karachi|) [in karachi : [in karachi]] r->l(karachi in|) rpwrtRr) = [karachi ( : [in karachi]] LanguageModelFsa=-3.40247 h=-5.83552); }({<0,5>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [rpwrtRr ) : [rpwrtRr )]] r->l() rpwrtRr|) = [karachi ( : [rpwrtRr )]] LanguageModelFsa=-102.623 h=-5.83552); }({<0,3>[1] ||| [karachi ( : []] r->l(( karachi|) = [karachi ( : []] LanguageModelFsa=0 h=-5.83552); }({<0,3>karachi [1] ||| [( AstRAf : []] r->l(( karachi|) r->l(AstRAf|( karachi) = [karachi ( : []] LanguageModelFsa=-100 h=-5.83552); r->l(karachi|)}({<1,3>( [1] ||| [AstRAf] r->l(AstRAf (|) = [( AstRAf : []] LanguageModelFsa=0 h=-102.641); r->l((|)}({<2,3>AstRAf ||| r->l(AstRAf|) = [AstRAf] LanguageModelFsa=0 h=-100); r->l(AstRAf|)}) ) ) ) ({<3,5>[1] ) ||| [rpwrtRr] r->l() rpwrtRr|) = [rpwrtRr ) : [rpwrtRr )]] LanguageModelFsa=0 h=-102.623); r->l()|)}({<3,4>rpwrtRr ||| r->l(rpwrtRr|) = [rpwrtRr] LanguageModelFsa=0 h=-100); r->l(rpwrtRr|)}) ) ) ({<5,7>in karachi ||| r->l(karachi in|) = [in karachi : [in karachi]] LanguageModelFsa=0 h=-3.80404); r->l(karachi in|)}) ) ({<7,12>on monday in [1] ||| r->l(monday on|) rule-phrase[in] r->l(in|monday on) [different HAdvAt : []] r->l(HAdvAt different|in monday) = [on monday : []] LanguageModelFsa=-103.918 h=-3.91305); r->l(in monday on|)}({<9,11>different [1] ||| [HAdvAt] r->l(HAdvAt different|) = [different HAdvAt : []] LanguageModelFsa=0 h=-103.573); r->l(different|)}({<10,11>HAdvAt ||| r->l(HAdvAt|) = [HAdvAt] LanguageModelFsa=0 h=-100); r->l(HAdvAt|)}) ) ) ) ({<12,20>[2] killed [1] ||| [gyY and : [gyY and]] r->l(and gyY|) rule-phrase[killed] r->l(killed|and gyY) [4 people : [a woman]] r->l(people 4|killed and) = [gyY and : [a woman]] LanguageModelFsa=-5.57026 h=-101.72); r->l(killed|)}({<12,16>[2] people including a [1] ||| [4] r->l(people 4|) rule-phrase[including a] r->l(a including|people 4) [woman] r->l(woman|a including) = [4 people : [a woman]] LanguageModelFsa=-3.99305 h=-6.22734); r->l(a including people|)}({<12,13>woman ||| r->l(woman|) = [woman] LanguageModelFsa=0 h=-3.82934); r->l(woman|)}) ({<14,15>4 ||| r->l(4|) = [4] LanguageModelFsa=0 h=-3.62974); r->l(4|)}) ) ({<18,20>[1] and ||| [gyY] r->l(and gyY|) = [gyY and : [gyY and]] LanguageModelFsa=0 h=-101.72); r->l(and|)}({<18,19>gyY ||| r->l(gyY|) = [gyY] LanguageModelFsa=0 h=-100); r->l(gyY|)}) ) ) ) ({<20,28>[1] the sea . ||| [found dead : [ from]] r->l(dead found|) rule-phrase[the sea .] r->l(. sea the|from ) = [found dead : [sea .]] LanguageModelFsa=-4.84745 h=-7.62839); r->l(. sea the|)}({<21,27>found [1] from ||| [dead body : []] r->l(dead found|) r->l(body|dead found) rule-phrase[from] r->l(from|) = [found dead : [ from]] LanguageModelFsa=-3.42491 h=-7.62839); r->l(found|) r->l(from|)}({<22,26>dead body of [1] ||| r->l(body dead|) rule-phrase[of] r->l(of|body dead) [a person : []] r->l(person a|of body) = [dead body : []] LanguageModelFsa=-2.9934 h=-4.63222); r->l(of body dead|)}({<22,24>a [1] ||| [person] r->l(person a|) = [a person : []] LanguageModelFsa=0 h=-4.90016); r->l(a|)}({<23,24>person ||| r->l(person|) = [person] LanguageModelFsa=0 h=-3.50165); r->l(person|)}) ) ) ) ) ) )
- +LM forest features: Arity_0=-3.47436;Arity_1=-4.77724;Arity_2=-3.04006;Glue=5;LanguageModel=-446.49;LmFsa=-446.17;PassThrough=5;PhraseModel_0=12.2199;PhraseModel_1=11.6391;PhraseModel_2=10.9878;WordPenalty=-13.0288;Unigram=-462.696;UnigramFsa=-462.696
-Output kbest to -
-0 ||| karachi ( AstRAf rpwrtRr ) in karachi on monday in different HAdvAt gyY and killed 4 people including a woman found dead body of a person from the sea . ||| Arity_0=-3.47436;Arity_1=-4.77724;Arity_2=-3.04006;Glue=5;LanguageModel=-446.49;LmFsa=-446.17;PassThrough=5;PhraseModel_0=12.2199;PhraseModel_1=11.6391;PhraseModel_2=10.9878;WordPenalty=-13.0288;Unigram=-462.696;UnigramFsa=-462.696 ||| -490.21
-
-sent_id=0
-({<0,28>[1] ||| (final r->l(( karachi|<s> ) start=[ <s>]->{karachi (} r->l(</s>|. sea) end=[sea .]->{</s>} LanguageModelFsa=-5.74766; }
- ({<0,28>[1] [2] ||| [karachi ( : [a woman]] r->l(( karachi|) [found dead : [sea .]] r->l(dead found|woman a) = [karachi ( : [sea .]] LanguageModelFsa=-5.93027 h=-5.83552); }
- ({<0,20>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [gyY and : [a woman]] r->l(and gyY|) = [karachi ( : [a woman]] LanguageModelFsa=-101.72 h=-5.83552); }
- ({<0,12>[1] [2] ||| [karachi ( : [in karachi]] r->l(( karachi|) [on monday : []] r->l(monday on|karachi in) = [karachi ( : []] LanguageModelFsa=-1.99946 h=-5.83552); }
- ({<0,7>[1] [2] ||| [karachi ( : [rpwrtRr )]] r->l(( karachi|) [in karachi : [in karachi]] r->l(karachi in|) rpwrtRr) = [karachi ( : [in karachi]] LanguageModelFsa=-3.40247 h=-5.83552); }
- ({<0,5>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [rpwrtRr ) : [rpwrtRr )]] r->l() rpwrtRr|) = [karachi ( : [rpwrtRr )]] LanguageModelFsa=-102.623 h=-5.83552); }
- ({<0,3>[1] ||| [karachi ( : []] r->l(( karachi|) = [karachi ( : []] LanguageModelFsa=0 h=-5.83552); }
- ({<0,3>karachi [1] ||| [( AstRAf : []] r->l(( karachi|) r->l(AstRAf|( karachi) = [karachi ( : []] LanguageModelFsa=-100 h=-5.83552); r->l(karachi|)}
- ({<1,3>( [1] ||| [AstRAf] r->l(AstRAf (|) = [( AstRAf : []] LanguageModelFsa=0 h=-102.641); r->l((|)}
- ({<2,3>AstRAf ||| r->l(AstRAf|) = [AstRAf] LanguageModelFsa=0 h=-100); r->l(AstRAf|)}
- )
- )
- )
- )
- ({<3,5>[1] ) ||| [rpwrtRr] r->l() rpwrtRr|) = [rpwrtRr ) : [rpwrtRr )]] LanguageModelFsa=0 h=-102.623); r->l()|)}
- ({<3,4>rpwrtRr ||| r->l(rpwrtRr|) = [rpwrtRr] LanguageModelFsa=0 h=-100); r->l(rpwrtRr|)}
- )
- )
- )
- ({<5,7>in karachi ||| r->l(karachi in|) = [in karachi : [in karachi]] LanguageModelFsa=0 h=-3.80404); r->l(karachi in|)}
- )
- )
- ({<7,12>on monday in [1] ||| r->l(monday on|) rule-phrase[in] r->l(in|monday on) [different HAdvAt : []] r->l(HAdvAt different|in monday) = [on monday : []] LanguageModelFsa=-103.918 h=-3.91305); r->l(in monday on|)}
- ({<9,11>different [1] ||| [HAdvAt] r->l(HAdvAt different|) = [different HAdvAt : []] LanguageModelFsa=0 h=-103.573); r->l(different|)}
- ({<10,11>HAdvAt ||| r->l(HAdvAt|) = [HAdvAt] LanguageModelFsa=0 h=-100); r->l(HAdvAt|)}
- )
- )
- )
- )
- ({<12,20>[2] killed [1] ||| [gyY and : [gyY and]] r->l(and gyY|) rule-phrase[killed] r->l(killed|and gyY) [4 people : [a woman]] r->l(people 4|killed and) = [gyY and : [a woman]] LanguageModelFsa=-5.57026 h=-101.72); r->l(killed|)}
- ({<12,16>[2] people including a [1] ||| [4] r->l(people 4|) rule-phrase[including a] r->l(a including|people 4) [woman] r->l(woman|a including) = [4 people : [a woman]] LanguageModelFsa=-3.99305 h=-6.22734); r->l(a including people|)}
- ({<12,13>woman ||| r->l(woman|) = [woman] LanguageModelFsa=0 h=-3.82934); r->l(woman|)}
- )
- ({<14,15>4 ||| r->l(4|) = [4] LanguageModelFsa=0 h=-3.62974); r->l(4|)}
- )
- )
- ({<18,20>[1] and ||| [gyY] r->l(and gyY|) = [gyY and : [gyY and]] LanguageModelFsa=0 h=-101.72); r->l(and|)}
- ({<18,19>gyY ||| r->l(gyY|) = [gyY] LanguageModelFsa=0 h=-100); r->l(gyY|)}
- )
- )
- )
- )
- ({<20,28>[1] the sea . ||| [found dead : [ from]] r->l(dead found|) rule-phrase[the sea .] r->l(. sea the|from ) = [found dead : [sea .]] LanguageModelFsa=-4.84745 h=-7.62839); r->l(. sea the|)}
- ({<21,27>found [1] from ||| [dead body : []] r->l(dead found|) r->l(body|dead found) rule-phrase[from] r->l(from|) = [found dead : [ from]] LanguageModelFsa=-3.42491 h=-7.62839); r->l(found|) r->l(from|)}
- ({<22,26>dead body of [1] ||| r->l(body dead|) rule-phrase[of] r->l(of|body dead) [a person : []] r->l(person a|of body) = [dead body : []] LanguageModelFsa=-2.9934 h=-4.63222); r->l(of body dead|)}
- ({<22,24>a [1] ||| [person] r->l(person a|) = [a person : []] LanguageModelFsa=0 h=-4.90016); r->l(a|)}
- ({<23,24>person ||| r->l(person|) = [person] LanguageModelFsa=0 h=-3.50165); r->l(person|)}
- )
- )
- )
- )
- )
- )
-)
-0 ||| karachi ( AstRAf rpwrtRr ) in karachi on monday in different HAdvAt gyY and killed 4 people including a woman found the dead body of a person from the sea . ||| Arity_0=-3.47436;Arity_1=-4.77724;Arity_2=-3.04006;Glue=5;LanguageModel=-446.828;LmFsa=-446.508;PassThrough=5;PhraseModel_0=12.697;PhraseModel_1=11.6391;PhraseModel_2=11.5728;WordPenalty=-13.4631;Unigram=-463.765;UnigramFsa=-463.765 ||| -490.295
-
-sent_id=0
-({<0,28>[1] ||| (final r->l(( karachi|<s> ) start=[ <s>]->{karachi (} r->l(</s>|. sea) end=[sea .]->{</s>} LanguageModelFsa=-5.74766; }
- ({<0,28>[1] [2] ||| [karachi ( : [a woman]] r->l(( karachi|) [found the : [sea .]] r->l(the found|woman a) = [karachi ( : [sea .]] LanguageModelFsa=-3.6217 h=-5.83552); }
- ({<0,20>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [gyY and : [a woman]] r->l(and gyY|) = [karachi ( : [a woman]] LanguageModelFsa=-101.72 h=-5.83552); }
- ({<0,12>[1] [2] ||| [karachi ( : [in karachi]] r->l(( karachi|) [on monday : []] r->l(monday on|karachi in) = [karachi ( : []] LanguageModelFsa=-1.99946 h=-5.83552); }
- ({<0,7>[1] [2] ||| [karachi ( : [rpwrtRr )]] r->l(( karachi|) [in karachi : [in karachi]] r->l(karachi in|) rpwrtRr) = [karachi ( : [in karachi]] LanguageModelFsa=-3.40247 h=-5.83552); }
- ({<0,5>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [rpwrtRr ) : [rpwrtRr )]] r->l() rpwrtRr|) = [karachi ( : [rpwrtRr )]] LanguageModelFsa=-102.623 h=-5.83552); }
- ({<0,3>[1] ||| [karachi ( : []] r->l(( karachi|) = [karachi ( : []] LanguageModelFsa=0 h=-5.83552); }
- ({<0,3>karachi [1] ||| [( AstRAf : []] r->l(( karachi|) r->l(AstRAf|( karachi) = [karachi ( : []] LanguageModelFsa=-100 h=-5.83552); r->l(karachi|)}
- ({<1,3>( [1] ||| [AstRAf] r->l(AstRAf (|) = [( AstRAf : []] LanguageModelFsa=0 h=-102.641); r->l((|)}
- ({<2,3>AstRAf ||| r->l(AstRAf|) = [AstRAf] LanguageModelFsa=0 h=-100); r->l(AstRAf|)}
- )
- )
- )
- )
- ({<3,5>[1] ) ||| [rpwrtRr] r->l() rpwrtRr|) = [rpwrtRr ) : [rpwrtRr )]] LanguageModelFsa=0 h=-102.623); r->l()|)}
- ({<3,4>rpwrtRr ||| r->l(rpwrtRr|) = [rpwrtRr] LanguageModelFsa=0 h=-100); r->l(rpwrtRr|)}
- )
- )
- )
- ({<5,7>in karachi ||| r->l(karachi in|) = [in karachi : [in karachi]] LanguageModelFsa=0 h=-3.80404); r->l(karachi in|)}
- )
- )
- ({<7,12>on monday in [1] ||| r->l(monday on|) rule-phrase[in] r->l(in|monday on) [different HAdvAt : []] r->l(HAdvAt different|in monday) = [on monday : []] LanguageModelFsa=-103.918 h=-3.91305); r->l(in monday on|)}
- ({<9,11>different [1] ||| [HAdvAt] r->l(HAdvAt different|) = [different HAdvAt : []] LanguageModelFsa=0 h=-103.573); r->l(different|)}
- ({<10,11>HAdvAt ||| r->l(HAdvAt|) = [HAdvAt] LanguageModelFsa=0 h=-100); r->l(HAdvAt|)}
- )
- )
- )
- )
- ({<12,20>[2] killed [1] ||| [gyY and : [gyY and]] r->l(and gyY|) rule-phrase[killed] r->l(killed|and gyY) [4 people : [a woman]] r->l(people 4|killed and) = [gyY and : [a woman]] LanguageModelFsa=-5.57026 h=-101.72); r->l(killed|)}
- ({<12,16>[2] people including a [1] ||| [4] r->l(people 4|) rule-phrase[including a] r->l(a including|people 4) [woman] r->l(woman|a including) = [4 people : [a woman]] LanguageModelFsa=-3.99305 h=-6.22734); r->l(a including people|)}
- ({<12,13>woman ||| r->l(woman|) = [woman] LanguageModelFsa=0 h=-3.82934); r->l(woman|)}
- )
- ({<14,15>4 ||| r->l(4|) = [4] LanguageModelFsa=0 h=-3.62974); r->l(4|)}
- )
- )
- ({<18,20>[1] and ||| [gyY] r->l(and gyY|) = [gyY and : [gyY and]] LanguageModelFsa=0 h=-101.72); r->l(and|)}
- ({<18,19>gyY ||| r->l(gyY|) = [gyY] LanguageModelFsa=0 h=-100); r->l(gyY|)}
- )
- )
- )
- )
- ({<20,28>[1] the sea . ||| [found the : [ from]] r->l(the found|) rule-phrase[the sea .] r->l(. sea the|from ) = [found the : [sea .]] LanguageModelFsa=-4.84745 h=-5.31983); r->l(. sea the|)}
- ({<21,27>found [1] from ||| [the dead : []] r->l(the found|) r->l(dead|the found) rule-phrase[from] r->l(from|) = [found the : [ from]] LanguageModelFsa=-5.34421 h=-5.31983); r->l(found|) r->l(from|)}
- ({<22,26>the dead body of [1] ||| r->l(dead the|) rule-phrase[body of] r->l(of body|dead the) [a person : []] r->l(person a|of body) = [the dead : []] LanguageModelFsa=-3.7205 h=-4.97373); r->l(of body dead the|)}
- ({<22,24>a [1] ||| [person] r->l(person a|) = [a person : []] LanguageModelFsa=0 h=-4.90016); r->l(a|)}
- ({<23,24>person ||| r->l(person|) = [person] LanguageModelFsa=0 h=-3.50165); r->l(person|)}
- )
- )
- )
- )
- )
- )
-)
-0 ||| karachi ( AstRAf rpwrtRr ) in karachi on monday in different HAdvAt gyY killed 4 people including a woman while dead body of a person from the sea . ||| Arity_0=-3.47436;Arity_1=-4.77724;Arity_2=-3.04006;Glue=5;LanguageModel=-445.419;LmFsa=-445.099;PassThrough=5;PhraseModel_0=12.5687;PhraseModel_1=12.5781;PhraseModel_2=9.61571;WordPenalty=-12.5945;Unigram=-461.303;UnigramFsa=-461.303 ||| -490.646
-
-sent_id=0
-({<0,28>[1] ||| (final r->l(( karachi|<s> ) start=[ <s>]->{karachi (} r->l(</s>|. sea) end=[sea .]->{</s>} LanguageModelFsa=-5.74766; }
- ({<0,28>[1] [2] ||| [karachi ( : [a woman]] r->l(( karachi|) [while dead : [sea .]] r->l(dead while|woman a) = [karachi ( : [sea .]] LanguageModelFsa=-5.71074 h=-5.83552); }
- ({<0,19>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [gyY killed : [a woman]] r->l(killed gyY|) = [karachi ( : [a woman]] LanguageModelFsa=-103.345 h=-5.83552); }
- ({<0,12>[1] [2] ||| [karachi ( : [in karachi]] r->l(( karachi|) [on monday : []] r->l(monday on|karachi in) = [karachi ( : []] LanguageModelFsa=-1.99946 h=-5.83552); }
- ({<0,7>[1] [2] ||| [karachi ( : [rpwrtRr )]] r->l(( karachi|) [in karachi : [in karachi]] r->l(karachi in|) rpwrtRr) = [karachi ( : [in karachi]] LanguageModelFsa=-3.40247 h=-5.83552); }
- ({<0,5>[1] [2] ||| [karachi ( : []] r->l(( karachi|) [rpwrtRr ) : [rpwrtRr )]] r->l() rpwrtRr|) = [karachi ( : [rpwrtRr )]] LanguageModelFsa=-102.623 h=-5.83552); }
- ({<0,3>[1] ||| [karachi ( : []] r->l(( karachi|) = [karachi ( : []] LanguageModelFsa=0 h=-5.83552); }
- ({<0,3>karachi [1] ||| [( AstRAf : []] r->l(( karachi|) r->l(AstRAf|( karachi) = [karachi ( : []] LanguageModelFsa=-100 h=-5.83552); r->l(karachi|)}
- ({<1,3>( [1] ||| [AstRAf] r->l(AstRAf (|) = [( AstRAf : []] LanguageModelFsa=0 h=-102.641); r->l((|)}
- ({<2,3>AstRAf ||| r->l(AstRAf|) = [AstRAf] LanguageModelFsa=0 h=-100); r->l(AstRAf|)}
- )
- )
- )
- )
- ({<3,5>[1] ) ||| [rpwrtRr] r->l() rpwrtRr|) = [rpwrtRr ) : [rpwrtRr )]] LanguageModelFsa=0 h=-102.623); r->l()|)}
- ({<3,4>rpwrtRr ||| r->l(rpwrtRr|) = [rpwrtRr] LanguageModelFsa=0 h=-100); r->l(rpwrtRr|)}
- )
- )
- )
- ({<5,7>in karachi ||| r->l(karachi in|) = [in karachi : [in karachi]] LanguageModelFsa=0 h=-3.80404); r->l(karachi in|)}
- )
- )
- ({<7,12>on monday in [1] ||| r->l(monday on|) rule-phrase[in] r->l(in|monday on) [different HAdvAt : []] r->l(HAdvAt different|in monday) = [on monday : []] LanguageModelFsa=-103.918 h=-3.91305); r->l(in monday on|)}
- ({<9,11>different [1] ||| [HAdvAt] r->l(HAdvAt different|) = [different HAdvAt : []] LanguageModelFsa=0 h=-103.573); r->l(different|)}
- ({<10,11>HAdvAt ||| r->l(HAdvAt|) = [HAdvAt] LanguageModelFsa=0 h=-100); r->l(HAdvAt|)}
- )
- )
- )
- )
- ({<12,19>[2] killed [1] ||| [gyY] r->l(killed gyY|) [4 people : [a woman]] r->l(people 4|killed gyY) = [gyY killed : [a woman]] LanguageModelFsa=-2.98475 h=-103.345); r->l(killed|)}
- ({<12,16>[2] people including a [1] ||| [4] r->l(people 4|) rule-phrase[including a] r->l(a including|people 4) [woman] r->l(woman|a including) = [4 people : [a woman]] LanguageModelFsa=-3.99305 h=-6.22734); r->l(a including people|)}
- ({<12,13>woman ||| r->l(woman|) = [woman] LanguageModelFsa=0 h=-3.82934); r->l(woman|)}
- )
- ({<14,15>4 ||| r->l(4|) = [4] LanguageModelFsa=0 h=-3.62974); r->l(4|)}
- )
- )
- ({<18,19>gyY ||| r->l(gyY|) = [gyY] LanguageModelFsa=0 h=-100); r->l(gyY|)}
- )
- )
- )
- ({<19,28>while [1] ||| [dead body : [sea .]] r->l(dead while|) r->l(body|dead while) = [while dead : [sea .]] LanguageModelFsa=-1.20144 h=-6.25144); r->l(while|)}
- ({<20,28>[1] . ||| [dead body : [the sea]] r->l(body dead|) rule-phrase[.] r->l(.|sea the) = [dead body : [sea .]] LanguageModelFsa=-0.45297 h=-4.63222); r->l(.|)}
- ({<20,26>[1] the sea ||| [dead body : [ from]] r->l(body dead|) rule-phrase[the sea] r->l(sea the|from ) = [dead body : [the sea]] LanguageModelFsa=-4.39448 h=-4.63222); r->l(sea the|)}
- ({<21,26>dead body of [1] ||| r->l(body dead|) rule-phrase[of] r->l(of|body dead) [a person : [ from]] r->l(person a|of body) = [dead body : [ from]] LanguageModelFsa=-2.9934 h=-4.63222); r->l(of body dead|)}
- ({<21,24>a [1] from ||| [person] r->l(person a|) rule-phrase[from] r->l(from|) = [a person : [ from]] LanguageModelFsa=-2.33299 h=-4.90016); r->l(a|) r->l(from|)}
- ({<23,24>person ||| r->l(person|) = [person] LanguageModelFsa=0 h=-3.50165); r->l(person|)}
- )
- )
- )
- )
- )
- )
- )
-)
diff --git a/graehl/NOTES.partial.binarize b/graehl/NOTES.partial.binarize
deleted file mode 100755
index a9985891..00000000
--- a/graehl/NOTES.partial.binarize
+++ /dev/null
@@ -1,21 +0,0 @@
-Earley doesn't require binarized rules.
-
-But a (partially) binarized grammar may lead to smaller (exhaustive or heuristic) charts. The tradeoff is mostly more reduce steps (the # of NTs should be similar or less than the usual dotted-item binarization0.
-
-Optionally collapse a rule rhs to unary as well (normal binarization would stop when an rhs is binary), if the rule to collapse it exists or is frequent enough.
-
-Greedy binarization schemes:
-
-1) (repeatedly) for the most frequent rhs bigram "X a" create a binary rule "V -> X a" and replace "X a" in all rules' rhs with V. stop if the most frequent bigram has count lower than some threshold (e.g. 3), because each instance of it saves one symbol, but the new rule has 3 symbols.
-
-2) (repeatedly) for each rule, pick the most frequent bigram in its rhs and binarize it (2a for that rule only, 2b everywhere that bigram occurs). again, some frequency threshold. optionally allow collapsing an rhs to unary. this fails to use some substitutions that are available "for free" based on actions taken at earlier rules w/ no frequent bigrams in common with this one.
-
-3) (DeNero) (for complete binarization only?) for each rule until binarized, pick a split point k of L->r[0..n) to make rules L->V1 V2, V1->r[0..k) V2->r[k..n), to minimize the number of new rules created. If no prefix or suffix of r already exists as a virtual rule, then choose k=floor(n/2). To amend this to consider frequency of rhs, use the frequency of rhs-prefix/suffixes to decide where to split?
-
-4?) Song, Chin-Yew Lin - seems to require collecting stats from a larged parsed corpus - interesting idea: make rules that don't match fail early (that's 1 way you get a speedup), and pick V1 -> ... based on some kind of expected utility.
-
-5) l2r, r2l. yawn.
-
-1) seems the most sensible. don't just keep a count for each bigram, keep a set of left and right adjacent partially overlapping bigrams (i.e. the words left and right). for "a b" if "c" and "d" occur to the right, then "b c" and "b d" would be the right adjacent bigrams. when replacing a bigram, follow the left and right adjacencies to decrement the count of those bigrams, and add a (bidirectional) link to the new bigram.
-
-Further, partial-1) can be followed by complete-3) or 5) - although i see no reason not to just continue 1) until the grammar is binary if you want a full binarization.
diff --git a/graehl/NOTES.wfsa b/graehl/NOTES.wfsa
deleted file mode 100755
index b74dc810..00000000
--- a/graehl/NOTES.wfsa
+++ /dev/null
@@ -1,16 +0,0 @@
-left-to-right finite-state models (with heuristic) that depend only on the target string.
-
-http://github.com/jganitkevitch/cdec.git has some progress toward this:
-
-earley_generator.*: make a trie of earley dotted items (from first pass finite parse projected to target side?) and rules for each earley deduction step (is the predict step actually making a hyperedge? or is it marked "active" and so doesn't appear in the result?)
-
-ff_ltor.*: interface for l2r models; needless scoring of "complete" action (only heuristic changes there and heuristics can just be precomputed for all dot-items
-ff_lm.*: ugly clone of regular LM model with l2r interface
-
-apply_models.*: ApplyLeftToRightModelSet
-
-l2r features:
-
-multiple feature ids from single model?
-
-declare markov bound for bottom-up scoring (inside items) wrapper, and "backoff start" state (i.e. empty context, not <s> context)