The following commands were used to create the test data. The real inputs were corpus.fr, corpus.en, and corpus.aligned; the generated files were corpus.len_cats and fr-en.al.len. Run them in this order, because the second command reads corpus.len_cats, which the first command produces:

  ./make_len_cats.pl corpus.en > corpus.len_cats
  ../merge_lines.pl corpus.fr corpus.en corpus.aligned corpus.len_cats > fr-en.al.len
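The two steps can be wrapped in a small script that stops at the first failure. This is a minimal sketch, not part of the original tooling: the helper names same_line_count and make_test_data are hypothetical, and the line-count check rests on the assumption (not stated above) that merge_lines.pl pairs its inputs line by line, so all four files should have the same number of lines.

```shell
#!/bin/sh
# Hypothetical wrapper around the two documented commands.
# Assumption: merge_lines.pl combines its inputs line by line, so every
# input should have the same number of lines. This check is illustrative
# and is not performed by the original scripts.

# same_line_count FILE... : succeed only if every FILE has the same line count
same_line_count() {
    first=$(wc -l < "$1")
    for f in "$@"; do
        [ "$(wc -l < "$f")" -eq "$first" ] || {
            echo "line count mismatch: $f" >&2
            return 1
        }
    done
}

# make_test_data: run the documented steps in order, stopping on the first error
make_test_data() {
    ./make_len_cats.pl corpus.en > corpus.len_cats || return 1
    same_line_count corpus.fr corpus.en corpus.aligned corpus.len_cats || return 1
    ../merge_lines.pl corpus.fr corpus.en corpus.aligned corpus.len_cats > fr-en.al.len
}
```

Usage: source the script from the directory that holds the corpus files and call make_test_data. Note that wc -l counts newline characters, so a file whose last line has no trailing newline will report one line fewer.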