Diffstat (limited to 'report')
-rw-r--r--  report/np_clustering.tex | 20
1 files changed, 11 insertions, 9 deletions
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 0926a84e..1d36d6f5 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -14,30 +14,32 @@ To encode these prior beliefs, we make use of Pitman-Yor processes \citep{pitman
 Our models assume a fixed number of categories, $K$. The category type, $z \in \{ 1 , 2 , \ldots , K \}$, is generated from a PYP with a uniform base distribution:
 \begin{align*}
 z &| \p & \sim \theta_{\p} \\
-\theta_p &| a_{\p},b_{\p},K & \sim \textrm{PYP}(a_{\p},b_{\p},\textrm{Uniform}(K))
+\theta_{\p} &| a_{\p},b_{\p},K & \sim \textrm{PYP}(a_{\p},b_{\p},\frac{1}{K})
 \end{align*}

 \noindent Alternatively, we used hierarchical PYP process which shares statistics about the use of categories across phrases:
 \begin{align*}
 z &| \p & \sim \theta_{\p} \\
 \theta_{\p} &| a_{\p},b_{\p} & \sim \textrm{PYP}(a_{\p},b_{\p},\theta_0) \\
-\theta_0 &| a_0,b_0,K & \sim \textrm{PYP}(a_0,b_0,\textrm{Uniform}(K))
+\theta_0 &| a_0,b_0,K & \sim \textrm{PYP}(a_0,b_0,\frac{1}{K})
 \end{align*}

 \noindent Each category $z$ token then generates the context $\textbf{c}_i$. We again model this using a PYP, which will tend to cluster commonly used contexts across phrases into a single category. Additionally, by using hierarchical PYPs, we can smooth highly specific contexts by backing off to less specific contexts (e.g., composed of fewer words or word classes).

-The most basic version of our model uses a uniform prior base distribution over contexts:
+The most basic version of our model uses a uniform base distribution over contexts. This model was most useful when generating contexts consisting of a single word or word class (i.e., $\textbf{c}=c_{-1}c_1$) in either the source or target language on either side.
 \begin{align*}
-\textbf{c} |& z & \sim \phi_z \\
-\phi_z |& a_z,b_z & \sim \textrm{PYP}(a_z,b_z,\textrm{Uniform}(|V|^2))
+c_{-1}c_1 |& z & \sim \phi_z \\
+\phi_z |& a_z,b_z & \sim \textrm{PYP}(a_z,b_z,\frac{1}{|V|^2})
 \end{align*}

-\noindent TODO. For contexts with more than a single word on either side, we typically backed off from a
+\noindent When larger contexts were used, the space of these contexts becomes very sparse, so another variant of our model uses a non-uniform base distribution to back off to the probability of generating a smaller context (i.e., $c_{-1}c_1$) as above and then generating the outer context
 \begin{align*}
-\textbf{c} |& z & \sim \phi_z \\
-\phi_z |& a_z,b_z, \phi_0 & \sim \textrm{PYP}(a_z,b_z,\phi_0(\cdot|z) \times \textrm{Uniform}(|V|^2)) \\
-\phi_0 |& a_0,b_0 & \sim \textrm{PYP}(a_z,b_z,\phi_0(\cdot|z))
+c_{-2}c_{-1}c_1c_2 |& z & \sim \phi_z \\
+\phi_z |& a_z,b_z & \sim \textrm{PYP}(a_z,b_z,P_1(\cdot|z)) \\
+&P_1(c_{-2}c_{-1}c_1c_2|z)& = \phi^{\textrm{\emph{inner}}}_z(c_{-1}c_1|z) \times \frac{1}{|V|^2} \\
+c_{-1}c_1 |& z & \sim \phi^{\textrm{\emph{inner}}}_z \\
+\phi^{\textrm{\emph{inner}}}_z |& a^{\textrm{\emph{inner}}}_z,b^{\textrm{\emph{inner}}}_z & \sim \textrm{PYP}(a^{\textrm{\emph{inner}}}_z,b^{\textrm{\emph{inner}}}_z,\frac{1}{|V|^2})
 \end{align*}

 \noindent Figure~\ref{fig:np_plate} shows a plate diagram for the model.
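The models introduced in this diff can be illustrated with a small Chinese-restaurant sketch of a PYP. This is a minimal Python sketch, not the code behind the report. It makes several assumptions. First, it assumes a is the discount and b the concentration, following the usual PYP(a, b, P0) convention. Second, it covers only prediction and seating: customers are added but never removed, so there is no Gibbs resampling. Third, the class name PYPRestaurant, the vocabulary size V = 3 and the toy contexts are hypothetical. The first part mirrors theta_p ~ PYP(a, b, 1/K). The second mirrors the back-off model. There, P_1 multiplies the inner restaurant's predictive probability of (c_{-1}, c_1) by 1/|V|^2, and each new table opened in the outer restaurant seats one customer in the inner one. This is the standard hierarchical-CRP way of sharing counts, assumed here rather than taken from the report.

```python
import random
from collections import defaultdict


class PYPRestaurant:
    """Chinese-restaurant view of a Pitman-Yor process PYP(a, b, P0).

    Assumption: a is the discount (0 <= a < 1) and b the concentration
    (b > -a). Hypothetical helper for illustration only.
    """

    def __init__(self, a, b, base_prob, rng, on_new_table=None):
        self.a, self.b = a, b
        self.base_prob = base_prob        # callable: x -> P0(x)
        self.rng = rng                    # random.Random used for seating
        self.on_new_table = on_new_table  # called when x opens a new table
        self.tables = defaultdict(list)   # x -> list of table sizes
        self.n = 0                        # total customers
        self.t = 0                        # total tables

    def prob(self, x):
        """Predictive probability of x given the current seating."""
        sizes = self.tables.get(x, [])
        new_table_mass = (self.b + self.a * self.t) * self.base_prob(x)
        return (sum(sizes) - self.a * len(sizes) + new_table_mass) / (self.n + self.b)

    def add(self, x):
        """Seat one customer for x: existing table k with weight size_k - a,
        or a new table with weight (b + a*T) * P0(x)."""
        sizes = self.tables[x]
        weights = [s - self.a for s in sizes]
        weights.append((self.b + self.a * self.t) * self.base_prob(x))
        k = self.rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
            self.t += 1
            if self.on_new_table is not None:
                self.on_new_table(x)  # e.g. send a customer to the parent PYP
        else:
            sizes[k] += 1
        self.n += 1


rng = random.Random(0)

# theta_p ~ PYP(a, b, 1/K): category distribution with a uniform base.
K = 5
theta = PYPRestaurant(0.5, 1.0, lambda z: 1.0 / K, rng)
for _ in range(3):
    theta.add(0)
print(theta.prob(0), theta.prob(1))  # category 0 now above 1/K, category 1 below

# Back-off model for contexts (c_{-2}, c_{-1}, c_1, c_2) under one category z.
V = 3  # hypothetical vocabulary size
inner = PYPRestaurant(0.5, 1.0, lambda pair: 1.0 / V**2, rng)


def p1(ctx):
    # P_1(c_{-2} c_{-1} c_1 c_2 | z) = phi_inner(c_{-1} c_1) * 1/|V|^2
    return inner.prob((ctx[1], ctx[2])) / V**2


outer = PYPRestaurant(0.5, 1.0, p1, rng,
                      on_new_table=lambda ctx: inner.add((ctx[1], ctx[2])))
for ctx in [(0, 1, 2, 0), (1, 1, 2, 2), (0, 1, 2, 0)]:
    outer.add(ctx)
```

Because P_1 is itself a normalized distribution over four-word contexts, the outer predictive probabilities still sum to one. Contexts sharing the inner pair (c_{-1}, c_1) = (1, 2) receive extra mass even when their outer words were never observed.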