author | mjdenkowski <michael.j.denkowski@gmail.com> | 2014-03-11 15:47:04 -0400 |
---|---|---|
committer | mjdenkowski <michael.j.denkowski@gmail.com> | 2014-03-11 15:47:04 -0400 |
commit | efbc43b40c8c3204245814b65a7be280498281bd (patch) | |
tree | f2b58b5f4208bf535522a788cfd241b59f7026ac /corpus/support/quote-norm.pl | |
parent | 1197fb64e67b95ed497df4ebca5dd69e3e2db1b5 (diff) | |
parent | 284383880f043edb2d67afbe2f64237c466245c1 (diff) |
Merge branch 'master' of github.com:redpony/cdec
Diffstat (limited to 'corpus/support/quote-norm.pl')
-rwxr-xr-x | corpus/support/quote-norm.pl | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/corpus/support/quote-norm.pl b/corpus/support/quote-norm.pl
index 33604027..0366fad5 100755
--- a/corpus/support/quote-norm.pl
+++ b/corpus/support/quote-norm.pl
@@ -39,6 +39,7 @@ while(<STDIN>) {
   s/&\#([0-9]+);/pack("U", $1)/ge;
 
   # Regularlize spaces:
+  s/\x{ad}//g;        # soft hyphen
   s/\x{a0}/ /g;       # non-breaking space
   s/\x{2009}/ /g;     # thin space
   s/\x{2028}/ /g;     # "line separator"
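The effect of the added line can be checked in isolation. The following is a minimal sketch, not code from quote-norm.pl: the helper name regularize_spaces and the sample string are hypothetical, and it assumes the input is already a decoded character string. It applies only the four substitutions visible in this hunk, showing that the soft hyphen (U+00AD) is deleted outright while the other characters are replaced by a plain space.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of quote-norm.pl): applies only the
# space-regularizing substitutions shown in the hunk above.
sub regularize_spaces {
    my ($text) = @_;
    $text =~ s/\x{ad}//g;      # soft hyphen: removed, no replacement
    $text =~ s/\x{a0}/ /g;     # non-breaking space -> plain space
    $text =~ s/\x{2009}/ /g;   # thin space -> plain space
    $text =~ s/\x{2028}/ /g;   # "line separator" -> plain space
    return $text;
}

# Sample input: a soft hyphen inside a word, then a non-breaking space.
my $sample = "co\x{ad}operate\x{a0}now";
print regularize_spaces($sample), "\n";   # prints "cooperate now"
```

Deleting the soft hyphen rather than mapping it to a space keeps a word such as "cooperate" as one token; mapping it to a space would split the word in two.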