1. word_pair_keys
   group rules by source/target word pairs
   input is a cdec grammar (with int index), one rule per line

2. rules_cross_product
   build the cross product of rules that share a key
   input is the output of step 1

3. merge_rules
   MapReduce version of merge_rules.rb
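
Sketch of steps 1 and 2 in plain Ruby (no MapReduce), to show the data flow.
Everything here is an assumption made for illustration, not the actual scripts:
the input layout ("<int index><TAB>[X] ||| source ||| target ||| features"),
the key scheme (one key per source-terminal/target-terminal combination), and
all function names.

```ruby
# Hypothetical parser: int index, a tab, then a '|||'-separated rule.
def parse_rule(line)
  index, rule = line.chomp.split("\t", 2)
  _lhs, source, target = rule.split(' ||| ')
  { index: index.to_i, source: source.split, target: target.split }
end

# Assumed key scheme: every (source word, target word) combination,
# skipping nonterminals such as [X,1].
def word_pair_keys(rule)
  src = rule[:source].reject { |w| w.start_with?('[') }
  tgt = rule[:target].reject { |w| w.start_with?('[') }
  src.product(tgt).map { |s, t| "#{s}|#{t}" }.uniq
end

# Step 1 (sketch): key => list of rule indices carrying that key.
def group_by_key(rules)
  groups = Hash.new { |h, k| h[k] = [] }
  rules.each { |r| word_pair_keys(r).each { |k| groups[k] << r[:index] } }
  groups
end

# Step 2 (sketch): full cross product inside each group, n * n pairs for n rules.
def cross_product(groups)
  groups.transform_values { |ids| ids.product(ids) }
end
```

A group of n rules yields n * n pairs, so output size is driven by the largest groups.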

NOTE
   the cross product is not feasible even with g120:
     319078851 megabytes ~= 300 terabytes
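
Sketch of how such a size figure can be estimated and converted. The
function name, the group sizes and the bytes-per-pair value in the usage
line are made up for illustration; only the 319078851 figure comes from
the note above.

```ruby
# Hypothetical estimator: a group of n rules contributes n * n pairs.
def cross_product_megabytes(group_sizes, bytes_per_pair)
  pairs = group_sizes.sum { |n| n * n }
  pairs * bytes_per_pair / 1_000_000.0
end

# Illustrative numbers only, not measured on any grammar.
puts(cross_product_megabytes([1000, 2000], 100))

# Converting the figure from the note:
mb = 319_078_851
puts((mb / 1_000_000.0).round(1)) # decimal terabytes: 319.1
puts((mb / 1024.0**2).round(1))   # if the figure counts binary megabytes: 304.3
```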