author     redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-26 19:20:41 +0000
committer  redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-26 19:20:41 +0000
commit     a90dfcca64e1e9efe651e0318a808d38b16c0579 (patch)
tree       98859e17017d304a5322d57d9f268918967f52a6
parent     f5dbba7e5e17ccb0b1f00e44e8262d8eed43647a (diff)
fixes
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@623 ec762483-ff6d-05da-a07a-a48fb63a330f
-rw-r--r--  report/np_clustering.tex  4
-rw-r--r--  utils/sampler.h  3
2 files changed, 7 insertions, 0 deletions
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 3ff605bd..c5992efd 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -195,6 +195,8 @@ Target POS-only ({\emph supervised}) & 1 & uniform & 22.2 & 1.85 \\
\subsection{Correlating the intrinsic metric}
+The experiments discussed in the previous section offer an opportunity to correlate the proposed conditional entropy metric, which measures how well the induced grammar predicts the labels found in the supervised labeling of the corpus, with downstream translation quality. Figure~\ref{fig:intr_correl} shows that there is a reasonable (inverse) correlation between the entropy of the predictive distribution and the \textsc{bleu} score on the test set.
+
\begin{figure}
\begin{center}
\includegraphics[scale=0.5]{pyp_clustering/correl.pdf}
@@ -207,6 +209,8 @@ Target POS-only ({\emph supervised}) & 1 & uniform & 22.2 & 1.85 \\
\section{Discussion}
+We now turn to an analysis of the grammars learned using the nonparametric clustering models.
+
\subsection{Qualitative analysis of an example grammar}
Tables~\ref{tab:npexample1} and \ref{tab:npexample2} show a fragment of a 25-category Urdu-English grammar learned using the nonparametric phrase clustering. Rules were selected that maximized the relative frequency $p(\textrm{RHS}|\textrm{LHS})$, filtering out the top 25 (to minimize the appearance of frequent words), and showing only rules consisting of terminal symbols in their right hand side (for clarity). The frequency of each rule type in a grammar filtered for the development set is also given.
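To make the rule-selection procedure described above concrete, here is a small, hypothetical C++ sketch of one way to implement it. The Rule struct, the bracketed-nonterminal convention, and the per-category reading of "filtering out the top 25" are assumptions made for illustration; this is not the code actually used to prepare the tables in the report.

// Hypothetical sketch: rank rules by relative frequency p(RHS|LHS), skip the
// most frequent ones per category (one reading of "filtering out the top 25"),
// and keep only rules whose right-hand side consists solely of terminals.
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Rule {
  std::string lhs;               // induced category label, e.g. "X12"
  std::vector<std::string> rhs;  // terminals and (bracketed) nonterminals
  int count;                     // frequency in the filtered grammar
};

// Assumes nonterminals are written in brackets, e.g. "[X3]"; this notation is
// an assumption made for the sketch, not taken from the repository.
bool AllTerminals(const Rule& r) {
  for (const std::string& tok : r.rhs)
    if (!tok.empty() && tok[0] == '[') return false;
  return true;
}

std::vector<Rule> SelectExampleRules(std::vector<Rule> rules, int skip_top = 25) {
  std::map<std::string, double> lhs_total;  // total count per LHS
  for (const Rule& r : rules) lhs_total[r.lhs] += r.count;

  // Sort by p(RHS|LHS) = count(LHS -> RHS) / count(LHS), highest first.
  std::sort(rules.begin(), rules.end(), [&](const Rule& a, const Rule& b) {
    return a.count / lhs_total[a.lhs] > b.count / lhs_total[b.lhs];
  });

  std::vector<Rule> selected;
  std::map<std::string, int> skipped;  // how many top rules passed over per LHS
  for (const Rule& r : rules) {
    if (skipped[r.lhs]++ < skip_top) continue;   // drop the very frequent rules
    if (AllTerminals(r)) selected.push_back(r);  // terminal-only RHS, for clarity
  }
  return selected;
}

int main() {
  std::vector<Rule> g = {{"X1", {"house"}, 40}, {"X1", {"[X2]", "of"}, 90},
                         {"X1", {"building"}, 12}};
  for (const Rule& r : SelectExampleRules(g, /*skip_top=*/1))
    std::cout << r.lhs << " -> " << r.rhs[0] << " (" << r.count << ")\n";
}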
diff --git a/utils/sampler.h b/utils/sampler.h
index f75d96b6..a14f6e2f 100644
--- a/utils/sampler.h
+++ b/utils/sampler.h
@@ -61,6 +61,9 @@ struct RandomNumberGenerator {
// draw a value from U(0,1)
double next() {return m_random();}
+ // draw a value from U(0,1), same as next() but via function-call syntax
+ double operator()() { return m_random(); }
+
// draw a value from N(mean,var)
double NextNormal(double mean, double var) {
return boost::normal_distribution<double>(mean, var)(m_random);
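The added operator() makes the sampler object itself callable with the same semantics as next(), so an instance can be handed to generic code that only expects a callable source of U(0,1) draws. Below is a minimal, hypothetical sketch of that usage pattern; the UniformSource stand-in and the EstimatePi helper are illustrative and not part of the repository.

// Hypothetical usage sketch: any object whose operator() returns a U(0,1)
// double -- as RandomNumberGenerator now does -- can drive generic code.
#include <iostream>
#include <random>

// Stand-in with the same call contract as the operator() added above:
// each call returns a double drawn from U(0,1).
struct UniformSource {
  std::mt19937 engine{42};
  std::uniform_real_distribution<double> dist{0.0, 1.0};
  double operator()() { return dist(engine); }
};

// Generic consumer: Monte Carlo estimate of pi from any U(0,1) callable.
template <class Draw>
double EstimatePi(Draw& draw, int samples) {
  int inside = 0;
  for (int i = 0; i < samples; ++i) {
    const double x = draw();
    const double y = draw();
    if (x * x + y * y <= 1.0) ++inside;
  }
  return 4.0 * inside / samples;
}

int main() {
  UniformSource rng;  // a RandomNumberGenerator instance would be used the same way
  std::cout << EstimatePi(rng, 1000000) << std::endl;  // prints roughly 3.14
}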