author    redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-13 17:59:37 +0000
committer redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-13 17:59:37 +0000
commit    af38acf3c24273bdc533967fe1260fd109b2df98 (patch)
tree      bbba6cdb829fc92cf74c907c4bd5699b2860b42e /report/np_clustering.tex
parent    12a546fcd6a48eeb5e1574a1e1b01843fe0a5d7b (diff)
model complete
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@543 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'report/np_clustering.tex')
-rw-r--r--  report/np_clustering.tex  20
1 file changed, 11 insertions, 9 deletions
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 0926a84e..1d36d6f5 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -14,30 +14,32 @@ To encode these prior beliefs, we make use of Pitman-Yor processes \citep{pitman
Our models assume a fixed number of categories, $K$. The category type, $z \in \{ 1 , 2 , \ldots , K \}$, is generated from a PYP with a uniform base distribution:
\begin{align*}
z &| \p & \sim \theta_{\p} \\
-\theta_p &| a_{\p},b_{\p},K & \sim \textrm{PYP}(a_{\p},b_{\p},\textrm{Uniform}(K))
+\theta_{\p} &| a_{\p},b_{\p},K & \sim \textrm{PYP}(a_{\p},b_{\p},\frac{1}{K})
\end{align*}
\noindent Alternatively, we used a hierarchical PYP, which shares statistics about the use of categories across phrases:
\begin{align*}
z &| \p & \sim \theta_{\p} \\
\theta_{\p} &| a_{\p},b_{\p} & \sim \textrm{PYP}(a_{\p},b_{\p},\theta_0) \\
-\theta_0 &| a_0,b_0,K & \sim \textrm{PYP}(a_0,b_0,\textrm{Uniform}(K))
+\theta_0 &| a_0,b_0,K & \sim \textrm{PYP}(a_0,b_0,\frac{1}{K})
\end{align*}
\noindent Each category token $z$ then generates the context $\textbf{c}_i$. We again model this using a PYP, which will tend to cluster commonly used contexts across phrases into a single category. Additionally, by using hierarchical PYPs, we can smooth highly specific contexts by backing off to less specific contexts (e.g., composed of fewer words or word classes).
-The most basic version of our model uses a uniform prior base distribution over contexts:
+The most basic version of our model uses a uniform base distribution over contexts. This model was most useful when generating contexts consisting of a single word or word class on either side of the phrase (i.e., $\textbf{c}=c_{-1}c_1$), in either the source or target language.
\begin{align*}
-\textbf{c} |& z & \sim \phi_z \\
-\phi_z |& a_z,b_z & \sim \textrm{PYP}(a_z,b_z,\textrm{Uniform}(|V|^2))
+c_{-1}c_1 |& z & \sim \phi_z \\
+\phi_z |& a_z,b_z & \sim \textrm{PYP}(a_z,b_z,\frac{1}{|V|^2})
\end{align*}
-\noindent TODO. For contexts with more than a single word on either side, we typically backed off from a
+\noindent For larger contexts, the space of possible contexts becomes very sparse, so another variant of our model uses a non-uniform base distribution that backs off to the probability of generating the smaller context (i.e., $c_{-1}c_1$) as above and then generating the outer context:
\begin{align*}
-\textbf{c} |& z & \sim \phi_z \\
-\phi_z |& a_z,b_z, \phi_0 & \sim \textrm{PYP}(a_z,b_z,\phi_0(\cdot|z) \times \textrm{Uniform}(|V|^2)) \\
-\phi_0 |& a_0,b_0 & \sim \textrm{PYP}(a_z,b_z,\phi_0(\cdot|z))
+c_{-2}c_{-1}c_1c_2 |& z & \sim \phi_z \\
+\phi_z |& a_z,b_z & \sim \textrm{PYP}(a_z,b_z,P_1(\cdot|z)) \\
+&P_1(c_{-2}c_{-1}c_1c_2|z)& = \phi^{\textrm{\emph{inner}}}_z(c_{-1}c_1|z) \times \frac{1}{|V|^2} \\
+c_{-1}c_1 |& z & \sim \phi^{\textrm{\emph{inner}}}_z \\
+\phi^{\textrm{\emph{inner}}}_z |& a^{\textrm{\emph{inner}}}_z,b^{\textrm{\emph{inner}}}_z & \sim \textrm{PYP}(a^{\textrm{\emph{inner}}}_z,b^{\textrm{\emph{inner}}}_z,\frac{1}{|V|^2})
\end{align*}
\noindent Figure~\ref{fig:np_plate} shows a plate diagram for the model.
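
As a point of reference for the $\textrm{PYP}(a,b,\cdot)$ notation in the patch (this note is not part of the report): under the standard Chinese-restaurant-process representation of the Pitman-Yor process, the category distribution $\theta_{\p} \sim \textrm{PYP}(a_{\p},b_{\p},\frac{1}{K})$ implies the usual predictive rule

\begin{align*}
P(z_{\textrm{new}} = k \mid \textrm{previous draws}) &= \frac{n_k - a_{\p} t_k}{N + b_{\p}} + \frac{b_{\p} + a_{\p} T}{N + b_{\p}} \cdot \frac{1}{K}
\end{align*}

where $n_k$ and $t_k$ are the numbers of customers and tables labeled $k$ for phrase $\p$, and $N$ and $T$ are their totals. These count symbols are introduced here only for illustration and do not appear in the report; the hierarchical variant simply replaces the uniform base $\frac{1}{K}$ with the shared distribution $\theta_0$.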
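
The following is a minimal illustrative sketch (in Python, not taken from the ws10smt codebase) of how one phrase's category tokens could be generated from $\textrm{PYP}(a_{\p},b_{\p},\textrm{Uniform}(K))$ via the Chinese restaurant process described above; all function and variable names are hypothetical.

import random

def sample_category(tables, a, b, K):
    # Illustrative sketch only, not the project's sampler.
    # Draw the next category token z from PYP(a, b, Uniform(K)) using the
    # Chinese-restaurant-process representation.  `tables` is a mutable list of
    # [label, customer_count] pairs for the tables opened so far for one phrase.
    n = sum(count for _, count in tables)      # customers seated so far
    t = len(tables)                            # tables opened so far
    # Unnormalised seating weights: one per existing table, plus one for a new table.
    weights = [count - a for _, count in tables] + [b + a * t]
    r = random.uniform(0.0, sum(weights))
    for i, w in enumerate(weights):
        r -= w
        if r <= 0.0:
            break
    if i < t:                                  # joined an existing table
        tables[i][1] += 1
        return tables[i][0]
    label = random.randint(1, K)               # new table: label drawn from the uniform base 1/K
    tables.append([label, 1])
    return label

# Example: generate 10 category tokens for a single phrase with K = 25 categories.
tables_for_phrase = []
print([sample_category(tables_for_phrase, a=0.5, b=1.0, K=25) for _ in range(10)])

Because each phrase keeps its own table list, categories that have already been used for a phrase are preferentially reused, which is exactly the clustering pressure the model relies on.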