author    redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-20 03:53:10 +0000
committer redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-20 03:53:10 +0000
commit    4fb10dde2b81233b15d3476444d1fc7abed83c29 (patch)
tree      2270955b9ea215a59483f4e1aaebdc5e96f72220 /report/np_clustering.tex
parent    dc0cb4d438d17ab19829f2fa1b6f6868cb876949 (diff)
add llh figure
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@602 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'report/np_clustering.tex')
-rw-r--r--  report/np_clustering.tex  |  17
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 41afc4b9..55910b53 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -58,12 +58,19 @@ c_{-1}c_1 |& z & \sim \phi^{\textrm{\emph{inner}}}_z \\
\section{Inference}
-Inference in the nonparametric clustering models was performed using Gibbs sampling \citep{geman:1984}, with the continuous parameters ($\theta_{\p}$, $\phi_z$, etc.) integrated out \citep{blunsom:2009}. For the experiments reported below, we sampled for 1,000 iterations. The initial state of the sampler was created by assigning every context in a phrase entirely to a random category.
-
-Values for the PYP hyperparameters were resampled after every 10 samples of the Gibbs sampler using the range doubling slice sampling technique \citep{neal:2000,johnson:2009}.
+Inference in the nonparametric clustering models was performed using Gibbs sampling \citep{geman:1984}, with the continuous parameters ($\theta_{\p}$, $\phi_z$, etc.) integrated out \citep{blunsom:2009}. For the experiments reported below, we sampled for 1,000 iterations. The initial state of the sampler was created by assigning every context in a phrase entirely to a random category. Values for the PYP hyperparameters were resampled after every 10 samples of the Gibbs sampler using the range doubling slice sampling technique \citep{neal:2000,johnson:2009}. Figure~\ref{fig:llh} shows the log-likelihood of the model measured after every 10 samples on an example run of the Urdu-English data with two different numbers of categories.
The final sample drawn from the model was used to estimate $p(z|\textbf{c},\p)$, and each phrase occurrence was labelled with the $z$ that maximized this probability.
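
As a concrete (and deliberately simplified) illustration of the loop described above, the sketch below runs collapsed Gibbs sampling over category assignments with the continuous parameters integrated out. It substitutes plain Dirichlet-multinomial conjugacy for the report's Pitman-Yor priors, and every name in it is hypothetical rather than taken from the workshop code.

import random
from collections import defaultdict

K = 10        # number of categories
ALPHA = 1.0   # pseudo-count for p(z | phrase); stands in for theta_p's prior
BETA = 0.1    # pseudo-count for p(context | z); stands in for phi_z's prior

def gibbs(data, n_contexts, iters=1000):
    """data: list of (phrase, context) occurrences; returns the final z."""
    z = [random.randrange(K) for _ in data]        # random initial state
    pz = defaultdict(lambda: [0] * K)              # phrase -> category counts
    cz = [defaultdict(int) for _ in range(K)]      # category -> context counts
    tot = [0] * K
    for i, (p, c) in enumerate(data):
        pz[p][z[i]] += 1; cz[z[i]][c] += 1; tot[z[i]] += 1
    for it in range(1, iters + 1):
        for i, (p, c) in enumerate(data):
            old = z[i]                             # remove occurrence i
            pz[p][old] -= 1; cz[old][c] -= 1; tot[old] -= 1
            # p(z=k | z_-i), with theta and phi integrated out
            w = [(pz[p][k] + ALPHA)
                 * (cz[k][c] + BETA) / (tot[k] + BETA * n_contexts)
                 for k in range(K)]
            r = random.uniform(0.0, sum(w))
            new = 0
            while r > w[new] and new < K - 1:
                r -= w[new]
                new += 1
            z[i] = new                             # reinsert under the new z
            pz[p][new] += 1; cz[new][c] += 1; tot[new] += 1
        if it % 10 == 0:
            pass  # hyperparameters would be re-sampled here (sketched below)
    return z      # final sample, from which the labelling step reads off z

occurrences = [("ne_pas", "do+not"), ("ne_pas", "not"), ("chat", "the+cat")]
labels = gibbs(occurrences, n_contexts=3, iters=100)

The per-occurrence conditional is the product of a phrase-side term (how often this phrase already uses category k) and a context-side term (how well k explains the context), mirroring the roles of $\theta_{\p}$ and $\phi_z$ in the model.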
+\begin{figure}
+\begin{center}
+\includegraphics[scale=0.75]{pyp_clustering/llh.pdf}
+\vspace{-0.3cm}
+\end{center}
+\caption{Log-likelihood versus number of samples with 10 categories (red circles) and 25 categories (blue diamonds) on the Urdu data, 1 target word on either side, hierarchical $\theta_0$, uniform $\phi_0$.}
+\label{fig:llh}
+\end{figure}
+
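
The range doubling slice sampler mentioned above is compact enough to sketch as well. This follows the doubling and shrinkage procedures of \citet{neal:2000}; the target density at the bottom is a toy stand-in, since the report does not spell out the hyperparameter posterior.

import math
import random

def slice_sample(x0, logf, w=1.0, p=10):
    """One slice-sampling move from x0, using the doubling procedure."""
    y = logf(x0) + math.log(random.random())       # log height of the slice
    L = x0 - w * random.random()                   # randomly placed interval
    R = L + w
    k = p                                          # double until both ends
    while k > 0 and (y < logf(L) or y < logf(R)):  # fall outside the slice
        if random.random() < 0.5:
            L -= R - L
        else:
            R += R - L
        k -= 1
    lo, hi = L, R                                  # shrink to an accepted point
    while True:
        x1 = lo + random.random() * (hi - lo)
        if y < logf(x1) and _doubling_ok(x0, x1, y, w, L, R, logf):
            return x1
        if x1 < x0:
            lo = x1
        else:
            hi = x1

def _doubling_ok(x0, x1, y, w, L, R, logf):
    """Acceptance test required when the interval was found by doubling."""
    d = False
    while R - L > 1.1 * w:
        m = (L + R) / 2.0
        if (x0 < m) != (x1 < m):                   # x0, x1 on opposite sides
            d = True
        if x1 < m:
            R = m
        else:
            L = m
        if d and y >= logf(L) and y >= logf(R):
            return False   # x1 could not have led back to the slice at x0
    return True

# Toy stand-in for a hyperparameter posterior: unnormalised Gamma(2, 2),
# with -inf outside the support (a PYP concentration must be positive).
logf = lambda a: math.log(a) - 2.0 * a if a > 0 else float("-inf")
alpha = 1.0
for _ in range(100):
    alpha = slice_sample(alpha, logf)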
\section{Experiments}
This section reports a number of experiments carried out to test the quality of the grammars learned using our nonparametric clustering models. We evaluate them primarily in terms of their performance on translation tasks. Translation quality is reported using case-insensitive \textsc{bleu} \citep{bleu}, with the number of references depending on the experimental condition (see the discussion of the corpora below for details).
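
For readers who want to reproduce this kind of score with current tools, the snippet below computes case-insensitive, multi-reference corpus \textsc{bleu} with the sacrebleu package. This is purely illustrative: the report's own numbers come from the workshop's evaluation setup, not from this code.

import sacrebleu

hyps = ["the cat sat on the mat"]          # system output, one line per segment
refs = [["The cat sat on the mat ."],      # reference stream 1
        ["A cat was sitting on the mat"]]  # reference stream 2 (same segment)
bleu = sacrebleu.corpus_bleu(hyps, refs, lowercase=True)  # case-insensitive
print(f"BLEU = {bleu.score:.1f}")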
@@ -109,7 +116,7 @@ Random ($K=25$) & 55.4 & 19.7 \\
Random ($K=50$) & 55.3 & 19.6 \\
\hline
Supervised \citep{samt} & 57.8 & 24.5 \\
-POS-only & TODO & 22.3 \\
+POS-only & 56.2 & 22.3 \\
\end{tabular}
\end{center}
\label{tab:npbaselines}
@@ -127,7 +134,7 @@ Because the margin of improvement from the 1-category baseline to the supervised
\hline
Single category (baseline) & 57.0 & 21.1 \\
\hline
-$K=10$ & 56.4 & \\
+$K=10$ & 56.4 & 21.2 \\
$K=25$ & 57.5 & 22.0 \\
$K=50$ & 56.2 & \\
\end{tabular}