cdec-dtrain-legacy
Mirror of https://github.com/pks/cdec-dtrain-legacy.git
path: root/corpus/support
Age         Commit message                                                         Author
2014-06-03  fix for nonjoining chars                                               Chris Dyer
2014-03-18  chris edits                                                            Chris Dyer
2014-03-12  XML file tokenization for all your WMT needs.                          mjdenkowski
2014-03-10  few tokenization bugs                                                  Chris Dyer
2014-02-27  ptb to normal                                                          Chris Dyer
2014-02-20  Merge branch 'master' of https://github.com/redpony/cdec               armatthews
2014-02-20  slight beautification and more sane ordering                           armatthews
2014-02-15  fix for missing angle quote form                                       Chris Dyer
2014-01-23  Reordered HTML entity blocks                                           armatthews
2014-01-23  Merged quote-norm with Greg's WMT normalization script                 armatthews
2014-01-20  hindi months                                                           Chris Dyer
2014-01-20  deal with acronyms in hindi                                            Chris Dyer
2014-01-20  hindi edits                                                            Chris Dyer
2014-01-16  moar hindi                                                             Chris Dyer
2014-01-15  deal with hindi                                                        Chris Dyer
2013-09-05  Slower but correct (wrt buffered) unbuffered version.                  Michael Denkowski
2013-09-05  Unbuffered mode, flush after each line where possible, skip otherwise  Michael Denkowski
2013-04-19  Merge branch 'master' of https://github.com/redpony/cdec               Chris Dyer
2013-04-19  hindi                                                                  Chris Dyer
2013-03-26  swahili abbreviations                                                  Chris Dyer
2013-03-08  few preproc fixes                                                      Chris Dyer
2013-02-23  one missing quote type                                                 Chris Dyer
2013-01-22  russian abbrevs                                                        Chris Dyer
2013-01-21  tokenizer support for utf8 patterns                                    Chris Dyer
2013-01-21  a little bit of cleanup                                                Chris Dyer
2012-12-05  slight tokenization bug fix                                            Chris Dyer
2012-12-05  remove logging, you should be using pv                                 Chris Dyer
2012-11-14  major mert clean up, stuff for simple system demo                      Chris Dyer