Diffstat (limited to 'report/np_clustering.tex')
-rw-r--r-- | report/np_clustering.tex | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 3ff605bd..c5992efd 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -195,6 +195,8 @@ Target POS-only ({\emph supervised}) & 1 & uniform & 22.2 & 1.85 \\
 \subsection{Correlating the intrinsic metric}
+The experiments discussed in the previous section offer an opportunity to correlate the proposed conditional entropy metric, which measures how well the induced grammar predicts the labels found in the supervised labeling of the corpus. Figure~\ref{fig:intr_correl} shows that there is a reasonable (inverse) correlation between the entropy of the predictive distribution and the \textsc{bleu} score on the test set.
+
 \begin{figure}
 \begin{center}
 \includegraphics[scale=0.5]{pyp_clustering/correl.pdf}
@@ -207,6 +209,8 @@ Target POS-only ({\emph supervised}) & 1 & uniform & 22.2 & 1.85 \\
 \section{Discussion}
+We now turn to an analysis of the results of the grammars learned using the nonparametric clustering models.
+
 \subsection{Qualitative analysis of an example grammar}
 Tables~\ref{tab:npexample1} and \ref{tab:npexample2} show a fragment of a 25-category Urdu-English grammar learned using the nonparametric phrase clustering. Rules were selected that maximized the relative frequency $p(\textrm{RHS}|\textrm{LHS})$, filtering out the top 25 (to minimize the appearance of frequent words), and showing only rules consisting of terminal symbols in their right hand side (for clarity). The frequency of each rule type in a grammar filtered for the development set is also given.