index : cdec-dtrain
Mirror of https://github.com/pks/cdec-dtrain.git
path: root/corpus
Age | Commit message | Author
2014-09-15 | migrate to new Cython version | Chris Dyer
2014-06-03 | fix for nonjoining chars | Chris Dyer
2014-04-02 | moses conversion script | Chris Dyer
2014-03-18 | chris edits | Chris Dyer
2014-03-12 | XML file tokenization for all your WMT needs. | mjdenkowski
2014-03-10 | few tokenization bugs | Chris Dyer
2014-02-27 | ptb to normal | Chris Dyer
2014-02-20 | Merge branch 'master' of https://github.com/redpony/cdec | armatthews
2014-02-20 | slight beautification and more sane ordering | armatthews
2014-02-15 | fix for missing angle quote form | Chris Dyer
2014-01-28 | smarter script for adding <s> and </s> markers | Chris Dyer
2014-01-23 | Reordered HTML entity blocks | armatthews
2014-01-23 | Merged quote-norm with Greg's WMT normalization script | armatthews
2014-01-20 | hindi months | Chris Dyer
2014-01-20 | deal with acronyms in hindi | Chris Dyer
2014-01-20 | hindi edits | Chris Dyer
2014-01-16 | moar hindi | Chris Dyer
2014-01-15 | deal with hindi | Chris Dyer
2013-12-12 | Restore unbuffered functionality as option | mjdenkowski
2013-11-11 | error on new macs | Chris Dyer
2013-09-11 | Use bash instead of sh | mjdenkowski
2013-09-05 | Slower but correct (wrt buffered) unbuffered version. | Michael Denkowski
2013-09-05 | Unbuffered mode, flush after each line where possible, skip otherwise | Michael Denkowski
2013-09-04 | Detokenizer | Michael Denkowski
2013-04-19 | Merge branch 'master' of https://github.com/redpony/cdec | Chris Dyer
2013-04-19 | hindi | Chris Dyer
2013-03-26 | swahili abbreviations | Chris Dyer
2013-03-17 | fix possible utf8 bug | Chris Dyer
2013-03-08 | Merge branch 'master' of https://github.com/redpony/cdec | Chris Dyer
2013-03-08 | few preproc fixes | Chris Dyer
2013-02-27 | quick fix | Chris Dyer
2013-02-23 | one missing quote type | Chris Dyer
2013-01-22 | russian abbrevs | Chris Dyer
2013-01-21 | tokenizer support for utf8 patterns | Chris Dyer
2013-01-21 | a little bit of cleanup | Chris Dyer
2013-01-20 | control max len | Chris Dyer
2013-01-19 | updated version of boost.m4 and automatically build kenneth's LM builder | Chris Dyer
2013-01-15 | corpus files | Chris Dyer
2012-12-05 | slight tokenization bug fix | Chris Dyer
2012-12-05 | remove logging, you should be using pv | Chris Dyer
2012-12-04 | more flexible corpus cutting | Chris Dyer
2012-11-16 | fix | Chris Dyer
2012-11-16 | readme | Chris Dyer
2012-11-14 | major mert clean up, stuff for simple system demo | Chris Dyer
2012-11-06 | Merge branch 'master' of github.com:redpony/cdec | Chris Dyer
2012-11-06 | add lowercase script | Chris Dyer
2012-11-05 | script to add sos/eos | Chris Dyer
2012-10-25 | add self translation | Chris Dyer
2012-07-28 | script to paste files together with the triple pipe separator | Chris Dyer
2012-07-28 | a couple of tools for cleaning corpora | Chris Dyer