path: root/corpus
Age | Commit message | Author
2014-01-28 | smarter script for adding <s> and </s> markers | Chris Dyer
2014-01-23 | Reordered HTML entity blocks | armatthews
2014-01-23 | Merged quote-norm with Greg's WMT normalization script | armatthews
2014-01-20 | hindi months | Chris Dyer
2014-01-20 | deal with acronyms in hindi | Chris Dyer
2014-01-20 | hindi edits | Chris Dyer
2014-01-16 | moar hindi | Chris Dyer
2014-01-15 | deal with hindi | Chris Dyer
2013-12-12 | Restore unbuffered functionality as option | mjdenkowski
2013-11-11 | error on new macs | Chris Dyer
2013-09-11 | Use bash instead of sh | mjdenkowski
2013-09-05 | Slower but correct (wrt buffered) unbuffered version. | Michael Denkowski
2013-09-05 | Unbuffered mode, flush after each line where possible, skip otherwise | Michael Denkowski
2013-09-04 | Detokenizer | Michael Denkowski
2013-04-19 | Merge branch 'master' of https://github.com/redpony/cdec | Chris Dyer
2013-04-19 | hindi | Chris Dyer
2013-03-26 | swahili abbreviations | Chris Dyer
2013-03-17 | fix possible utf8 bug | Chris Dyer
2013-03-08 | Merge branch 'master' of https://github.com/redpony/cdec | Chris Dyer
2013-03-08 | few preproc fixes | Chris Dyer
2013-02-27 | quick fix | Chris Dyer
2013-02-23 | one missing quote type | Chris Dyer
2013-01-22 | russian abbrevs | Chris Dyer
2013-01-21 | tokenizer support for utf8 patterns | Chris Dyer
2013-01-21 | a little bit of cleanup | Chris Dyer
2013-01-20 | control max len | Chris Dyer
2013-01-19 | updated version of boost.m4 and automatically build kenneth's LM builder | Chris Dyer
2013-01-15 | corpus files | Chris Dyer
2012-12-05 | slight tokenization bug fix | Chris Dyer
2012-12-05 | remove logging, you should be using pv | Chris Dyer
2012-12-04 | more flexible corpus cutting | Chris Dyer
2012-11-16 | fix | Chris Dyer
2012-11-16 | readme | Chris Dyer
2012-11-14 | major mert clean up, stuff for simple system demo | Chris Dyer
2012-11-06 | Merge branch 'master' of github.com:redpony/cdec | Chris Dyer
2012-11-06 | add lowercase script | Chris Dyer
2012-11-05 | script to add sos/eos | Chris Dyer
2012-10-25 | add self translation | Chris Dyer
2012-07-28 | script to paste files together with the triple pipe separator | Chris Dyer
2012-07-28 | a couple of tools for cleaning corpora | Chris Dyer