author redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f> 2010-08-26 19:20:41 +0000
committer redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f> 2010-08-26 19:20:41 +0000
commit a90dfcca64e1e9efe651e0318a808d38b16c0579
tree 98859e17017d304a5322d57d9f268918967f52a6 /report/np_clustering.tex
parent f5dbba7e5e17ccb0b1f00e44e8262d8eed43647a
fixes
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@623 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'report/np_clustering.tex')
-rw-r--r-- report/np_clustering.tex 4
1 files changed, 4 insertions, 0 deletions
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 3ff605bd..c5992efd 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -195,6 +195,8 @@ Target POS-only ({\emph supervised}) & 1 & uniform & 22.2 & 1.85 \\
\subsection{Correlating the intrinsic metric}
+The experiments discussed in the previous section offer an opportunity to correlate the proposed conditional entropy metric, which measures how well the induced grammar predicts the labels found in the supervised labeling of the corpus, with translation quality. Figure~\ref{fig:intr_correl} shows that there is a reasonable (inverse) correlation between the entropy of the predictive distribution and the \textsc{bleu} score on the test set.
+
\begin{figure}
\begin{center}
\includegraphics[scale=0.5]{pyp_clustering/correl.pdf}
@@ -207,6 +209,8 @@ Target POS-only ({\emph supervised}) & 1 & uniform & 22.2 & 1.85 \\
\section{Discussion}
+We now turn to an analysis of the results of the grammars learned using the nonparametric clustering models.
+
\subsection{Qualitative analysis of an example grammar}
Tables~\ref{tab:npexample1} and \ref{tab:npexample2} show a fragment of a 25-category Urdu-English grammar learned using the nonparametric phrase clustering. Rules were selected that maximized the relative frequency $p(\textrm{RHS}|\textrm{LHS})$, filtering out the top 25 (to minimize the appearance of frequent words), and showing only rules consisting of terminal symbols in their right hand side (for clarity). The frequency of each rule type in a grammar filtered for the development set is also given.