-rw-r--r-- | report/np_clustering.tex | 29 |
1 files changed, 25 insertions, 4 deletions
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index bb11f5e5..84cd857a 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -56,7 +56,7 @@ c_{-1}c_1 |& z & \sim \phi^{\textrm{\emph{inner}}}_z \\
 \paragraph{Hyperparameter priors.} The hyperparameters of the PYPs in our models are treated as random variables whose values are inferred from the data and the priors used to characterize the values we expect them to take on. Since we have only a poor prior understanding about what their appropriate values should be, we use vague priors: discount parameters, $a_{(\cdot)}$, are drawn from a uniform Beta distribution ($a_{(\cdot)} \sim \textrm{Beta}(1,1)$) and concentration parameters, $b_{(\cdot)}$, are drawn from a Gamma distribution ($b_{(\cdot)} \sim \textrm{Gamma}(1,1)$).
-\subsection{Inference}
+\section{Inference}
 Inference in the nonparametric clustering models was performed using Gibbs sampling \citep{geman:1984}, with the continuous parameters ($\theta_{\p}$, $\phi_z$, etc.) integrated out \citep{blunsom:2009}. For the experiments reported below, we sampled for 1,000 iterations. The initial state of the sampler was created by assigning every context in a phrase entirely to a random category.
@@ -135,6 +135,30 @@ $K=50$ & 56.2 & \\
 \label{tab:npbaselines}
 \end{table}%
+
+\subsection{Context types}
+
+\begin{table}[h]
+\caption{Effect of varying $K$, single word left and right target language context, uniform $\phi_0$, hierarchical $\theta_0$.}
+\begin{center}
+\begin{tabular}{r|c|c}
+& BTEC & Urdu \\
+\hline
+Single category (baseline) & 57.0 & 21.1 \\
+\hline
+1-word target & & \\
+1-word source & & \\
+2-words target & & \\
+2-words source & & \\
+\end{tabular}
+\end{center}
+\label{tab:npbaselines}
+\end{table}%
+
+
+
+\section{Discussion}
+
 \subsection{Qualitative analysis of an example grammar}
 Tables~\ref{tab:npexample1} and \ref{tab:npexample2} show a fragment of a 25-category Urdu-English grammar learned using the nonparametric phrase clustering. Rules were selected that maximized the relative frequency $p(\textrm{RHS}|\textrm{LHS})$, filtering out the top 25 (to minimize the appearance of frequent words), and showing only rules consisting of terminal symbols in their right hand side (for clarity). The frequency of each rule type in a grammar filtered for the development set is also given.
@@ -239,6 +263,3 @@ Tables~\ref{tab:npexample1} and \ref{tab:npexample2} show a fragment of a 25-cat
 \end{table}%
-\subsection{Context types}
-
-