TEXT PROTOCOL FOR EXTERNAL EVALUATION CODE

External evaluators may be supplied as processes that speak a simple
text-based protocol: the evaluator reads commands on STDIN and writes its
responses to STDOUT. Commands and responses are newline (\n) delimited lines.
Important: the evaluator process must flush its output after processing each
line of input.

The evaluator must respond to two kinds of messages, SCORE and EVAL, each
named after its first field. Fields within a message are separated by |||.
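
As a rough illustration of the read/respond/flush cycle described above, here
is a minimal Python sketch. The names serve, score_fn and eval_fn are
hypothetical and not part of the protocol; raising an error on an unknown
command is also an assumption, since this document does not say how such
input should be handled.

```python
import sys


def serve(score_fn, eval_fn, inp=sys.stdin, out=sys.stdout):
    """Write one response line per command line, flushing after each."""
    for line in inp:
        # The first field names the message; the rest is handed to a handler.
        command, _, rest = line.rstrip("\n").partition("|||")
        command = command.strip()
        if command == "SCORE":
            response = score_fn(rest)
        elif command == "EVAL":
            response = eval_fn(rest)
        else:
            # Assumption: unknown commands are treated as a fatal error.
            raise ValueError("unknown command: %r" % command)
        out.write(response + "\n")
        out.flush()  # the protocol requires a flush after every input line
```

An evaluator script would end with a call such as serve(my_score, my_eval),
where both handlers take the text after the first ||| and return the response
line as a string.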


1. SCORE messages

A SCORE message contains one or more reference translations of a segment,
followed by a hypothesis translation of the same segment as the last field.
It asks the evaluator to return a vector of sufficient statistics.

  Examples:
   SCORE ||| this is reference 1 ||| this is reference 2 ||| this is reference 3 ||| this is the hypothesis
   SCORE ||| this is a single reference . ||| here is the hypothesis !
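
One possible way to split a SCORE line into its parts is sketched below in
Python. The function name parse_score is hypothetical, and tokenizing on
whitespace is an assumption; the protocol itself only fixes the ||| separated
fields.

```python
def parse_score(line):
    """Split a SCORE line into (references, hypothesis) token lists."""
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    # Expect at least: SCORE, one reference, and the hypothesis.
    if len(fields) < 3 or fields[0] != "SCORE":
        raise ValueError("not a SCORE message: %r" % line)
    # Assumption: segments are tokenized on whitespace.
    references = [f.split() for f in fields[1:-1]]
    hypothesis = fields[-1].split()  # the hypothesis is the last field
    return references, hypothesis
```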

1.1. SCORE response

The response to a SCORE message is a vector of floats representing the
sufficient statistics. *The framework code assumes that sufficient statistics
decompose linearly across hypotheses*, that is, that the vectors for individual
hypotheses may be added element-wise. Furthermore, a single evaluator must
always return the same number of values, since each position in the vector is
assumed to have a fixed meaning. (For example, a BLEU evaluator might define
one position to be the count of 1-gram hits.)

  Example responses:
    8 6 3 2 10 10 10 10 12.7 10
    -2 1.32421 54 3 -1.2e-13
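
To make the additivity assumption concrete, the following Python sketch sums
several SCORE responses position by position, roughly as a caller might when
aggregating over a test set. The helper names parse_stats and sum_stats are
hypothetical and do not come from the framework code.

```python
def parse_stats(line):
    """Parse one SCORE response line into a list of floats."""
    return [float(x) for x in line.split()]


def sum_stats(lines):
    """Add the statistics vectors from several SCORE responses."""
    total = None
    for line in lines:
        stats = parse_stats(line)
        if total is None:
            total = stats
        elif len(stats) != len(total):
            # Every response from one evaluator must have the same length.
            raise ValueError("statistics vectors differ in length")
        else:
            total = [a + b for a, b in zip(total, stats)]
    return total
```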


2. EVAL messages

An EVAL message requests that the evaluator convert a vector of sufficient
statistics into a scalar metric (typically between 0 and 1, but this is not
enforced).  The order of the sufficient statistics will be the same as in the
evaluator's SCORE responses.

  Examples:
    EVAL ||| 8 6 3 2 10 10 10 10 12.7 10
    EVAL ||| 0 0 -2 1.32 0

2.1. EVAL response

The response to an EVAL message is a single float value. Output must be
flushed after writing it.

  Example responses:
    0.67
    0.445323324
    0
    1.245e-12
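
As an illustration only, here is how an EVAL handler might look in Python for
a toy metric. The two-position layout (matched tokens, then hypothesis length)
and the name eval_response are hypothetical; this is a simple precision ratio,
not BLEU, and returning 0 for an empty hypothesis is an assumption.

```python
def eval_response(line):
    """Turn an EVAL line for a toy two-statistic metric into a response."""
    fields = line.rstrip("\n").split("|||")
    if len(fields) != 2 or fields[0].strip() != "EVAL":
        raise ValueError("not an EVAL message: %r" % line)
    # Hypothetical layout: position 0 = matched tokens, 1 = hypothesis length.
    matches, hyp_len = (float(x) for x in fields[1].split())
    # Assumption: an empty hypothesis scores 0 rather than raising an error.
    score = matches / hyp_len if hyp_len > 0 else 0.0
    return repr(score)
```

The caller is still responsible for writing the returned string followed by a
newline and flushing STDOUT.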