cdec-dtrain
Branches: cmake, master, sa_mmap, word-alignment
Mirror of https://github.com/pks/cdec-dtrain.git
Path: /corpus
Date | Commit message | Author
2016-01-03 | corpus stats script | Chris Dyer
2015-06-06 | small fixes | Chris Dyer
2015-05-21 | deal with curly quotes | Chris Dyer
2015-04-14 | Parallel tokenization | mjdenkowski
2015-04-13 | Moses compatibility for tokenizer | Michael Denkowski
2015-01-08 | Stop BOMbs before they decrease quality | Kenneth Heafield
2014-12-29 | deal with eur symbol | Chris Dyer
2014-12-29 | finnish case markings | Chris Dyer
2014-12-29 | foo | Chris Dyer
2014-12-29 | foo | Chris Dyer
2014-12-29 | finnish abbrevs | Chris Dyer
2014-12-20 | Generalize to sample any number of dev sets | mjdenkowski
2014-12-19 | Sample dev and test sets with pseudo-documents | mjdenkowski
2014-10-25 | bit more info | Chris Dyer
2014-10-24 | conll2cdec conversion | Chris Dyer
2014-09-28 | add error message | Chris Dyer
2014-09-15 | migrate to new Cython version | Chris Dyer
2014-06-03 | fix for nonjoining chars | Chris Dyer
2014-04-02 | moses conversion script | Chris Dyer
2014-03-18 | chris edits | Chris Dyer
2014-03-12 | XML file tokenization for all your WMT needs. | mjdenkowski
2014-03-10 | few tokenization bugs | Chris Dyer
2014-02-27 | ptb to normal | Chris Dyer
2014-02-20 | Merge branch 'master' of https://github.com/redpony/cdec | armatthews
2014-02-20 | slight beautification and more sane ordering | armatthews
2014-02-15 | fix for missing angle quote form | Chris Dyer
2014-01-28 | smarter script for adding <s> and </s> markers | Chris Dyer
2014-01-23 | Reordered HTML entity blocks | armatthews
2014-01-23 | Merged quote-norm with Greg's WMT normalization script | armatthews
2014-01-20 | hindi months | Chris Dyer
2014-01-20 | deal with acronyms in hindi | Chris Dyer
2014-01-20 | hindi edits | Chris Dyer
2014-01-16 | moar hindi | Chris Dyer
2014-01-15 | deal with hindi | Chris Dyer
2013-12-12 | Restore unbuffered functionality as option | mjdenkowski
2013-11-11 | error on new macs | Chris Dyer
2013-09-11 | Use bash instead of sh | mjdenkowski
2013-09-05 | Slower but correct (wrt buffered) unbuffered version. | Michael Denkowski
2013-09-05 | Unbuffered mode, flush after each line where possible, skip otherwise | Michael Denkowski
2013-09-04 | Detokenizer | Michael Denkowski
2013-04-19 | Merge branch 'master' of https://github.com/redpony/cdec | Chris Dyer
2013-04-19 | hindi | Chris Dyer
2013-03-26 | swahili abbreviations | Chris Dyer
2013-03-17 | fix possible utf8 bug | Chris Dyer
2013-03-08 | Merge branch 'master' of https://github.com/redpony/cdec | Chris Dyer
2013-03-08 | few preproc fixes | Chris Dyer
2013-02-27 | quick fix | Chris Dyer
2013-02-23 | one missing quote type | Chris Dyer
2013-01-22 | russian abbrevs | Chris Dyer
2013-01-21 | tokenizer support for utf8 patterns | Chris Dyer