-rw-r--r--  README.md  83
1 files changed, 32 insertions, 51 deletions
@@ -1,55 +1,36 @@
+Not quite finished machine translation decoder.
+(For Linux only)
+
 TODO
- * sparse vector (unordered_map) -> where to store?
- * parser
- * Rule -> ChartItem -> Node ?
- * k-best
- * other semirings
- * include language model
- * compress/hash words/feature strings?
- * cast? Rule -> Edge, ChartItem -> Node
- * feature factory, observer
+====
+ * proper parsing (Rico Sennrich's [1][2]?)
+ * k-best derivations [3]
+ * serialization for sparse vectors
+ * Rule-ChartItem-Node transition?
+ * cube pruning [4] and integrate kenlm [5]
+ * feature factory and observer patterns
+ * map all strings to ints?
+ * glue grammar [6] alright?
+ * read/writed gzipped files [11]
+ * integrate some BLAS lib for vector ops [12][13]
 
 Dependencies:
- * MessagePack for object serialization [1]
- * kenlm language model [2]
-
-This is Linux only.
-
-
-[1] http://msgpack.org
-[2] http://kheafield.com/code/kenlm/
-
-
-stuff to have a look at:
-http://math.nist.gov/spblas/
-http://lapackpp.sourceforge.net/
-http://www.cvmlib.com/
-http://sourceforge.net/projects/lpp/
-http://math-atlas.sourceforge.net/
-http://www.netlib.org/lapack/
-http://bytes.com/topic/c/answers/702569-blas-vs-cblas-c
-http://www.netlib.org/lapack/#_standard_c_language_apis_for_lapack
-http://www.osl.iu.edu/research/mtl/download.php3
-http://scicomp.stackexchange.com/questions/351/recommendations-for-a-usable-fast-c-matrix-library
-https://software.intel.com/en-us/tbb_4.2_doc
-http://goog-perftools.sourceforge.net/doc/tcmalloc.html
-http://www.sgi.com/tech/stl/Rope.html
-http://www.cs.unc.edu/Research/compgeom/gzstream/
-https://github.com/facebook/folly/blob/6e46d468cf2876dd59c7a4dddcb4e37abf070b7a/folly/docs/Overview.md
----
-not much to see here, yet
-(SCFG machine translation decoder in ruby, currently implements CKY+ parsing and hypergraph viterbi)
-
-helpful stuff
- * https://github.com/jweese/thrax/wiki/Glue-grammar
- * http://aclweb.org/aclwiki/index.php?title=Hypergraph_Format
- * http://kheafield.com/code/kenlm/developers/
-
-todo
-====
- * integrate with HG (chart to json)
- * kbest
- * feature interface
- * (global) word ids instead of strings
- * animate parsing
+ * MessagePack for object serialization [8]
+ * Google's gperftools [9]
+ * json-cpp [10]
+
+
+[1] http://aclweb.org/anthology/W/W14/W14-4011.pdf
+[2] https://github.com/redpony/cdec/commit/448b451aa481b1509566ddb11abc3476466def6a
+[3] http://www.cis.upenn.edu/~lhuang3/huang-iwpt-correct.pdf
+[4] http://cui.unige.ch/~gesmundo/papers/gesmundo-iwslt10-fcp.pdf
+[5] http://kheafield.com/code/kenlm/developers/2
+[6] https://github.com/jweese/thrax/wiki/Glue-grammar
+[7] http://aclweb.org/aclwiki/index.php?title=Hypergraph_Format
+[8] http://msgpack.org
+[9] https://code.google.com/p/gperftools/
+[10] https://github.com/ascheglov/json-cpp
+[11] http://www.cs.unc.edu/Research/compgeom/gzstream/
+[12] http://scicomp.stackexchange.com/questions/351/recommendations-for-a-usable-fast-c-matrix-library
+[13] http://www.cvmlib.com/