Diffstat (limited to 'report/pr-clustering/posterior.tex')
-rw-r--r-- | report/pr-clustering/posterior.tex | 30 |
1 files changed, 24 insertions, 6 deletions
diff --git a/report/pr-clustering/posterior.tex b/report/pr-clustering/posterior.tex
index ebb05211..c4076cc3 100644
--- a/report/pr-clustering/posterior.tex
+++ b/report/pr-clustering/posterior.tex
@@ -52,7 +52,7 @@ P(z|\textbf{p},\textbf{c})=\frac{P(z,\textbf{c}|\textbf{p})}
 \]
 With the mechanisms to compute the posterior probabilities, we can
 apply EM to learn all the probabilities.
-\section{Sparsity Constraints}
+\section{Sparsity Constraints}\label{sec:pr-sparse}
 A common linguistic intuition we have about the phrase
 clustering problem is that a phrase should be put into very
 few categories, e.g. a verb phrase is unlikely to be used as
@@ -115,8 +115,11 @@ The objective we want to optimize becomes:
 \]
 \[
 \text{ s.t. }\forall \textbf{p},z,
-E_q[\phi_{\textbf{p}iz}]\leq c_{\textbf{p}z}.
+E_q[\phi_{\textbf{p}iz}]\leq c_{\textbf{p}z},
 \]
+where $\sigma$ is a constant to control
+how strongly the soft constraint should
+be enforced.
 Using Lagrange Multipliers, this objective can
 be optimized in its dual form,
 for each phrase $\textbf{p}$:
@@ -137,7 +140,7 @@ q_i(z)\propto P_{\theta}(z|\textbf{p},\textbf{c}_i)
 \exp(\lambda_{\textbf{p}iz}).
 \]
 M-step can be performed as usual.
-\section{Agreement Models}
+\section{Agreement Models}\label{sec:pr-agree}
 Another type of constraint we used is agreement between
 different models. We came up with a similar generative
 model in the reverse direction to agree with
@@ -248,9 +251,9 @@ phrases that don't correspond any constituents.
 	hiero+POS & & \\
 	SAMT & & \\
 	EM & & \\
-	pr100 & & \\
-	agree-language & & \\
-	agree direction & &\\
+	PR $\sigma=100$ & & \\
+	agree language & & \\
+	agree direction & &\\
 	non-parametric & & \\
 	\hline
 	\end{tabular}
@@ -263,3 +266,18 @@ phrases that don't correspond any constituents.
 }
 \label{tab:results}
 \end{table}
+
+In Table \ref{tab:results}, hiero is a hierarchical phrase-based
+model with 1 category in all of its SCFG rules. Hiero+POS
+is hiero with all words labelled with their POS tags.
+SAMT is a syntax-based system with a supervised
+parser trained on Treebank. EM is the first model mentioned
+at the beginning of this chapter. PR $\sigma=100$ is the
+posterior regularization model with the sparsity constraint
+explained in Section \ref{sec:pr-sparse}.
+$\sigma$ is the constant that controls the strength of the constraint.
+Agree language and agree direction are models with the agreement
+constraints mentioned in Section \ref{sec:pr-agree}. Non-parametric
+is the non-parametric model introduced in the previous chapter.
+
+\section{Conclusion}