author    redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-20 03:53:10 +0000
committer redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-20 03:53:10 +0000
commit    4fb10dde2b81233b15d3476444d1fc7abed83c29 (patch)
tree      2270955b9ea215a59483f4e1aaebdc5e96f72220 /report/np_clustering.tex
parent    dc0cb4d438d17ab19829f2fa1b6f6868cb876949 (diff)
add llh figure
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@602 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'report/np_clustering.tex')
-rw-r--r--  report/np_clustering.tex  |  17
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 41afc4b9..55910b53 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -58,12 +58,19 @@ c_{-1}c_1 |& z & \sim \phi^{\textrm{\emph{inner}}}_z \\
\section{Inference}
-Inference in the nonparametric clustering models was performed using Gibbs sampling \citep{geman:1984}, with the continuous parameters ($\theta_{\p}$, $\phi_z$, etc.) integrated out \citep{blunsom:2009}. For the experiments reported below, we sampled for 1,000 iterations. The initial state of the sampler was created by assigning every context in a phrase entirely to a random category.
-
-Values for the PYP hyperparameters were resampled after every 10 samples of the Gibbs sampler using the range doubling slice sampling technique \citep{neal:2000,johnson:2009}.
+Inference in the nonparametric clustering models was performed using Gibbs sampling \citep{geman:1984}, with the continuous parameters ($\theta_{\p}$, $\phi_z$, etc.) integrated out \citep{blunsom:2009}. For the experiments reported below, we sampled for 1,000 iterations. The initial state of the sampler was created by assigning every context in a phrase entirely to a random category. Values for the PYP hyperparameters were resampled after every 10 samples of the Gibbs sampler using the range doubling slice sampling technique \citep{neal:2000,johnson:2009}. Figure~\ref{fig:llh} shows the log-likelihood of the model measured after every 10 samples on an example run of the Urdu-English data with two different numbers of categories.
The final sample drawn from the model was used to estimate $p(z|\textbf{c},\p)$, and each phrase occurrence was labelled with the $z$ that maximized this probability.
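
As a concrete (and deliberately simplified) illustration of the loop described above, the sketch below runs collapsed Gibbs sampling over category assignments with the continuous parameters integrated out. It substitutes plain Dirichlet-multinomial conjugacy for the report's Pitman-Yor priors, and every name in it is hypothetical rather than taken from the workshop code.

import random
from collections import defaultdict

K = 10        # number of categories
ALPHA = 1.0   # pseudo-count for p(z | phrase); stands in for theta_p's prior
BETA = 0.1    # pseudo-count for p(context | z); stands in for phi_z's prior

def gibbs(data, n_contexts, iters=1000):
    """data: list of (phrase, context) occurrences; returns the final z."""
    z = [random.randrange(K) for _ in data]        # random initial state
    pz = defaultdict(lambda: [0] * K)              # phrase -> category counts
    cz = [defaultdict(int) for _ in range(K)]      # category -> context counts
    tot = [0] * K
    for i, (p, c) in enumerate(data):
        pz[p][z[i]] += 1; cz[z[i]][c] += 1; tot[z[i]] += 1
    for it in range(1, iters + 1):
        for i, (p, c) in enumerate(data):
            old = z[i]                             # remove occurrence i
            pz[p][old] -= 1; cz[old][c] -= 1; tot[old] -= 1
            # p(z=k | z_-i), with theta and phi integrated out
            w = [(pz[p][k] + ALPHA)
                 * (cz[k][c] + BETA) / (tot[k] + BETA * n_contexts)
                 for k in range(K)]
            r = random.uniform(0.0, sum(w))
            new = 0
            while r > w[new] and new < K - 1:
                r -= w[new]
                new += 1
            z[i] = new                             # reinsert under the new z
            pz[p][new] += 1; cz[new][c] += 1; tot[new] += 1
        if it % 10 == 0:
            pass  # hyperparameters would be re-sampled here (sketched below)
    return z      # final sample, from which the labelling step reads off z

occurrences = [("ne_pas", "do+not"), ("ne_pas", "not"), ("chat", "the+cat")]
labels = gibbs(occurrences, n_contexts=3, iters=100)

The per-occurrence conditional is the product of a phrase-side term (how often this phrase already uses category k) and a context-side term (how well k explains the context), mirroring the roles of $\theta_{\p}$ and $\phi_z$ in the model.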
+\begin{figure}
+\begin{center}
+\includegraphics[scale=0.75]{pyp_clustering/llh.pdf}
+\vspace{-0.3cm}
+\end{center}
+\caption{Log-likelihood versus number of samples with 10 categories (red circles) and 25 categories (blue diamonds) on the Urdu data, 1 target word on either side, hierarchical $\theta_0$, uniform $\phi_0$.}
+\label{fig:llh}
+\end{figure}
+
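
The range doubling slice sampler mentioned above is compact enough to sketch as well. This follows the doubling and shrinkage procedures of \citet{neal:2000}; the target density at the bottom is a toy stand-in, since the report does not spell out the hyperparameter posterior.

import math
import random

def slice_sample(x0, logf, w=1.0, p=10):
    """One slice-sampling move from x0, using the doubling procedure."""
    y = logf(x0) + math.log(random.random())       # log height of the slice
    L = x0 - w * random.random()                   # randomly placed interval
    R = L + w
    k = p                                          # double until both ends
    while k > 0 and (y < logf(L) or y < logf(R)):  # fall outside the slice
        if random.random() < 0.5:
            L -= R - L
        else:
            R += R - L
        k -= 1
    lo, hi = L, R                                  # shrink to an accepted point
    while True:
        x1 = lo + random.random() * (hi - lo)
        if y < logf(x1) and _doubling_ok(x0, x1, y, w, L, R, logf):
            return x1
        if x1 < x0:
            lo = x1
        else:
            hi = x1

def _doubling_ok(x0, x1, y, w, L, R, logf):
    """Acceptance test required when the interval was found by doubling."""
    d = False
    while R - L > 1.1 * w:
        m = (L + R) / 2.0
        if (x0 < m) != (x1 < m):                   # x0, x1 on opposite sides
            d = True
        if x1 < m:
            R = m
        else:
            L = m
        if d and y >= logf(L) and y >= logf(R):
            return False   # x1 could not have led back to the slice at x0
    return True

# Toy stand-in for a hyperparameter posterior: unnormalised Gamma(2, 2),
# with -inf outside the support (a PYP concentration must be positive).
logf = lambda a: math.log(a) - 2.0 * a if a > 0 else float("-inf")
alpha = 1.0
for _ in range(100):
    alpha = slice_sample(alpha, logf)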
\section{Experiments}
This section reports a number of experiments carried out to test the quality of the grammars learned using our nonparametric clustering models. We evaluate them primarily in terms of their performance on translation tasks. Translation quality is reported using case-insensitive \textsc{bleu} \citep{bleu}, with the number of references depending on the experimental condition (see the discussion of the corpora below for details).
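
For readers who want to reproduce this kind of score with current tools, the snippet below computes case-insensitive, multi-reference corpus \textsc{bleu} with the sacrebleu package. This is purely illustrative: the report's own numbers come from the workshop's evaluation setup, not from this code.

import sacrebleu

hyps = ["the cat sat on the mat"]          # system output, one line per segment
refs = [["The cat sat on the mat ."],      # reference stream 1
        ["A cat was sitting on the mat"]]  # reference stream 2 (same segment)
bleu = sacrebleu.corpus_bleu(hyps, refs, lowercase=True)  # case-insensitive
print(f"BLEU = {bleu.score:.1f}")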
@@ -109,7 +116,7 @@ Random ($K=25$) & 55.4 & 19.7 \\
Random ($K=50$) & 55.3 & 19.6 \\
\hline
Supervised \citep{samt} & 57.8 & 24.5 \\
-POS-only & TODO & 22.3 \\
+POS-only & 56.2 & 22.3 \\
\end{tabular}
\end{center}
\label{tab:npbaselines}
@@ -127,7 +134,7 @@ Because the margin of improvement from the 1-category baseline to the supervised
\hline
Single category (baseline) & 57.0 & 21.1 \\
\hline
-$K=10$ & 56.4 & \\
+$K=10$ & 56.4 & 21.2 \\
$K=25$ & 57.5 & 22.0 \\
$K=50$ & 56.2 & \\
\end{tabular}