diff options
Diffstat (limited to 'report')
| -rw-r--r-- | report/pr-clustering/posterior.tex | 41 | 
1 files changed, 41 insertions, 0 deletions
| diff --git a/report/pr-clustering/posterior.tex b/report/pr-clustering/posterior.tex index 7597c8e1..ebb05211 100644 --- a/report/pr-clustering/posterior.tex +++ b/report/pr-clustering/posterior.tex @@ -221,4 +221,45 @@ with an intrinsic evaluation of clustering quality  by comparing against a supervised CFG parser trained on the  tree bank. +We are mainly working on Urdu-English language pair.  +Urdu has very  +different word ordering from English.  +This leaves us room for improvement over +phrase-based systems. +Here in Table \ref{tab:results}  +we show BLEU scores as well as +conditional entropy for each of the models above +on Urdu data. Conditional entropy is computed +as the entropy of ``gold'' labelling given +the predicted clustering. ``Gold'' labelling +distribution +is obtained from Collins parser +trained on Penn Treebank. Since not +all phrases are constituents, we ignored +phrases that don't correspond any constituents. +\begin{table}[h] +  \centering +  \begin{tabular}{ |*{3}{c|} } +    \hline +    model & BLEU & H(Gold$|$Predicted)\\ +        \hline +    hiero & 21.1 & 5.77\\ +     hiero+POS &  & \\ +     SAMT & & \\ +     EM & & \\ +     pr100 & & \\ +     agree-language & & \\ +    agree direction & &\\ +     non-parametric & & \\ +         \hline +  \end{tabular} +    \caption +  {Evaluation of PR models. +	Left column shows BLEU scores +	through the translation pipeline. +	Right columns shows conditional entropy +	of the  +    } +  \label{tab:results} +\end{table} | 
