From d523a48ff2a7097ec5c33054af82f9395774d2d2 Mon Sep 17 00:00:00 2001
From: desaicwtf
Date: Mon, 16 Aug 2010 02:28:29 +0000
Subject: git-svn-id: https://ws10smt.googlecode.com/svn/trunk@559 ec762483-ff6d-05da-a07a-a48fb63a330f

---
 report/pr-clustering/EMVSPR.pdf    | Bin 0 -> 58054 bytes
 report/pr-clustering/posterior.tex |  31 +++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)
 create mode 100644 report/pr-clustering/EMVSPR.pdf

(limited to 'report')

diff --git a/report/pr-clustering/EMVSPR.pdf b/report/pr-clustering/EMVSPR.pdf
new file mode 100644
index 00000000..c03b41f2
Binary files /dev/null and b/report/pr-clustering/EMVSPR.pdf differ
diff --git a/report/pr-clustering/posterior.tex b/report/pr-clustering/posterior.tex
index 73c15dba..7597c8e1 100644
--- a/report/pr-clustering/posterior.tex
+++ b/report/pr-clustering/posterior.tex
@@ -191,3 +191,34 @@ where $\mathcal{L}_1$ and $\mathcal{L}_2$ are log-likelihood of two models.
 \section{Experiments}
+As a sanity check, we looked at a few examples produced by
+the basic model (EM)
+and the posterior regularization (PR) model
+with sparsity constraints. Table \ref{tab:EMVSPR}
+lists them.
+
+\begin{table}[h]
+ \centering
+ \includegraphics[width=3.5in]{pr-clustering/EMVSPR}
+ \caption[A few examples comparing EM and PR]
+ {A few examples comparing EM and PR.
+ Count of most frequent category shows how
+ many instances of a phrase are concentrated on
+ the single most frequent tag.
+ Number of categories shows how many categories
+ a phrase is labelled with. As mentioned before,
+ we want each phrase to use fewer categories.
+ These numbers are fair indicators of sparsity.
+ }
+ \label{tab:EMVSPR}
+\end{table}
+
+The models are formally evaluated with two kinds
+of metrics. We feed the clustering output
+through the whole translation pipeline
+to obtain a BLEU score. We also devised
+an intrinsic evaluation of clustering quality
+by comparing against a supervised CFG parser trained on the
+treebank.
+
+
-- 
cgit v1.2.3