author    redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-18 22:46:29 +0000
committer redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-18 22:46:29 +0000
commit    0a358b3a0d442e701ab50e5da7837da21038919e (patch)
tree      92e73d33faa035d07705acfb0d4ad76ca2f0089d /report/np_clustering.tex
parent    6186217f54676ffee3b26e25baf0aa8d524d241d (diff)
clean up structure
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@597 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'report/np_clustering.tex')
-rw-r--r--  report/np_clustering.tex | 29
1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index bb11f5e5..84cd857a 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -56,7 +56,7 @@ c_{-1}c_1 |& z & \sim \phi^{\textrm{\emph{inner}}}_z \\
\paragraph{Hyperparameter priors.} The hyperparameters of the PYPs in our models are treated as random variables whose values are inferred from the data; the priors characterize the values we expect them to take. Since we have little prior knowledge of what their appropriate values should be, we use vague priors: discount parameters, $a_{(\cdot)}$, are drawn from a uniform Beta distribution ($a_{(\cdot)} \sim \textrm{Beta}(1,1)$) and concentration parameters, $b_{(\cdot)}$, are drawn from a Gamma distribution ($b_{(\cdot)} \sim \textrm{Gamma}(1,1)$).
-\subsection{Inference}
+\section{Inference}
Inference in the nonparametric clustering models was performed using Gibbs sampling \citep{geman:1984}, with the continuous parameters ($\theta_{\p}$, $\phi_z$, etc.) integrated out \citep{blunsom:2009}. For the experiments reported below, we ran the sampler for 1,000 iterations. The initial state of the sampler was created by assigning every context of a phrase entirely to a random category.
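The prior draws and the sampler's random initial state described in this hunk can be sketched as below. This is an illustrative reading, not the authors' implementation: `phrase_contexts`, `num_categories`, and the per-context assignment rule are assumptions; the collapsed Gibbs sweeps themselves are omitted.

```python
import random

def draw_hyperparameters():
    """Vague priors from the text: discount a ~ Beta(1,1), i.e. uniform
    on [0,1]; concentration b ~ Gamma(1,1)."""
    a = random.betavariate(1.0, 1.0)
    b = random.gammavariate(1.0, 1.0)
    return a, b

def init_sampler_state(phrase_contexts, num_categories):
    """One reading of the initialization: each context of a phrase is
    assigned, with all of its occurrences, to a single random category."""
    state = {}
    for phrase, contexts in phrase_contexts.items():
        for context in contexts:
            state[(phrase, context)] = random.randrange(num_categories)
    return state
```

From this state, a sampler would then resample one assignment at a time conditioned on all others, with the PYP parameters integrated out.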
@@ -135,6 +135,30 @@ $K=50$ & 56.2 & \\
\label{tab:npbaselines}
\end{table}%
+
+\subsection{Context types}
+
+\begin{table}[h]
+\caption{Effect of varying the context type (source vs.\ target language, one- vs.\ two-word windows), uniform $\phi_0$, hierarchical $\theta_0$.}
+\begin{center}
+\begin{tabular}{r|c|c}
+& BTEC & Urdu \\
+\hline
+Single category (baseline) & 57.0 & 21.1 \\
+\hline
+1-word target & & \\
+1-word source & & \\
+2-word target & & \\
+2-word source & & \\
+\end{tabular}
+\end{center}
+\label{tab:npcontexttypes}
+\end{table}%
+
+
+
+\section{Discussion}
+
\subsection{Qualitative analysis of an example grammar}
Tables~\ref{tab:npexample1} and \ref{tab:npexample2} show a fragment of a 25-category Urdu-English grammar learned using the nonparametric phrase clustering. Rules were selected that maximize the relative frequency $p(\textrm{RHS}|\textrm{LHS})$, skipping the top 25 per category (to minimize the appearance of frequent words) and keeping only rules whose right-hand sides consist entirely of terminal symbols (for clarity). The frequency of each rule type in a grammar filtered for the development set is also given.
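The rule-selection filter described above can be sketched as follows. This is a hypothetical reconstruction: the input format `{(lhs, rhs_tuple): count}`, the `skip_top` parameter, and the `'['`-prefixed nonterminal convention are all assumptions for illustration.

```python
from collections import Counter, defaultdict

def select_example_rules(rule_counts, skip_top=25):
    """Rank rules by relative frequency p(RHS|LHS), skip the `skip_top`
    highest-scoring rules per LHS category, and keep only rules whose
    RHS contains terminal symbols only (here, nonterminals are assumed
    to be marked with a leading '[')."""
    lhs_totals = Counter()
    for (lhs, rhs), n in rule_counts.items():
        lhs_totals[lhs] += n

    by_lhs = defaultdict(list)
    for (lhs, rhs), n in rule_counts.items():
        by_lhs[lhs].append((n / lhs_totals[lhs], rhs))

    selected = {}
    for lhs, scored in by_lhs.items():
        scored.sort(reverse=True)  # highest p(RHS|LHS) first
        selected[lhs] = [(p, rhs) for p, rhs in scored[skip_top:]
                         if not any(tok.startswith('[') for tok in rhs)]
    return selected
```

With `skip_top=25` this mirrors the text's filter; smaller values are useful for testing on toy grammars.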
@@ -239,6 +263,3 @@ Tables~\ref{tab:npexample1} and \ref{tab:npexample2} show a fragment of a 25-cat
\end{table}%
-\subsection{Context types}
-
-