author    | desaicwtf <desaicwtf@ec762483-ff6d-05da-a07a-a48fb63a330f> | 2010-10-30 21:01:03 +0000
committer | desaicwtf <desaicwtf@ec762483-ff6d-05da-a07a-a48fb63a330f> | 2010-10-30 21:01:03 +0000
commit    | cd7562fde01771d461350cf91b383021754ea27b (patch)
tree      | 3b563ac38fce8d0cce3107e6aa10640be049c8e8 /report
parent    | 0da3df9b23f8fe588e8662f5be4ba4101bf0a8d4 (diff)
added conclusion
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@704 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'report')
-rw-r--r-- | report/pr-clustering/posterior.tex | 20 |
1 files changed, 19 insertions, 1 deletions
diff --git a/report/pr-clustering/posterior.tex b/report/pr-clustering/posterior.tex
index ea8560c1..8a199e63 100644
--- a/report/pr-clustering/posterior.tex
+++ b/report/pr-clustering/posterior.tex
@@ -183,7 +183,12 @@ for each phrase $\textbf{p}$:
 \sum_i \lambda_{\textbf{p}iz}\leq \sigma.
 \]
 This dual objective can be optimized with projected gradient
-descent.
+descent. Notice that each phrase has its own objective and
+constraint; the $\lambda$s are not shared across
+phrases, so we can optimize the objective for each phrase
+separately. This makes the algorithm convenient to parallelize
+and each per-phrase objective easier to optimize.
+
 The $q$ distribution we are looking for is then
 \[
 q_i(z)\propto P_{\theta}(z|\textbf{p},\textbf{c}_i)
@@ -382,3 +387,16 @@ Agree language and agree direction are models with agreement
 constraints mentioned in Section \ref{sec:pr-agree}.
 Non-parametric is non-parametric model introduced in the previous chapter.
 \section{Conclusion}