summaryrefslogtreecommitdiff
path: root/phrasinator/README
blob: fb5b93ef5c9c2963174a49c0812ea86efded51d9 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
The "phrasinator" uses a simple Bayesian nonparametric model to segment
text into chunks. The inferred model is then saved so that it can rapidly
predict segments in new (but related) texts.

  Input will be a corpus of sentences, e.g.:

    economists have argued that real interest rates have fallen .

  The output will be a model that, when run with cdec, will produce
  a segmentation into phrasal units, e.g.:

    economists have argued that real_interest_rates have fallen .


To train a model, run ./train-phrasinator.pl and follow instructions.