feats

git-svn-id: https://ws10smt.googlecode.com/svn/trunk@603 ec762483-ff6d-05da-a07a-a48fb63a330f
author: redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f> 2010-08-20 04:29:15 +0000
committer: redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f> 2010-08-20 04:29:15 +0000
commit: cf9f61182b9c6d8d984f8f473c9a2b80feba597d
tree: edcef17d4ec1195f65958830124e4fb8d6c3d073 /report/np_clustering.tex
parent: 4fb10dde2b81233b15d3476444d1fc7abed83c29
1 files changed, 16 insertions, 0 deletions
diff --git a/report/np_clustering.tex b/report/np_clustering.tex
index 55910b53..770a7da3 100644
--- a/report/np_clustering.tex
+++ b/report/np_clustering.tex
@@ -124,6 +124,22 @@ POS-only & 56.2 & 22.3 \\
 
 Because the margin of improvement from the 1-category baseline to the supervised condition is much more substantial in the Urdu-English condition than in the BTEC condition, some experiments were only carried out on Urdu.
 
+\subsection{Features in the multi-category systems}
+
+The features that the baseline system uses to score translation hypotheses were generalized to exploit the category labels.  In addition to the language model and word penalty, each rule $\textrm{Y} \rightarrow \langle \textbf{f},\textbf{e} \rangle$ in a derivation is scored with the following features.
+
+\begin{enumerate}
+\item The lexical translation probability of the phrase pair, $\mathit{lex}(\textbf{e}|\textbf{f})$, as defined in \cite{Koehn2003}.
+\item The inverse lexical translation probability, $\mathit{lex}(\textbf{f}|\textbf{e})$.
+\item The frequency of occurrence of the left-hand-side (LHS) category, $f(\textrm{Y})$.
+\item The relative frequency of \textbf{e} given \textbf{f} with all non-terminal labels collapsed to the single symbol X, $f_{\textbf{X}}(\textbf{e}|\textbf{f})$. This equals the relative frequency of the rule in the 1-category `Hiero' grammar.
+\item The inverse relative frequency, $f_{\textbf{X}}(\textbf{f}|\textbf{e})$.
+\item The relative frequency of $\langle \textbf{f}, \textbf{e} \rangle$ given $\textrm{Y}$, $f(\textbf{f}, \textbf{e} | \textrm{Y})$.
+\item The log rule count, $\log C(\textrm{Y} \rightarrow \langle \textbf{f},\textbf{e} \rangle)$.
+\item A constant feature with value 1 for every rule; summed over a derivation, it counts the number of rules used.
+\end{enumerate}
+
+
 \subsection{Number of categories}
 
 \begin{table}[h]
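The count-based features (items 3 to 8) in the new "Features in the multi-category systems" subsection can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the report's implementation. The rule notation `[CAT,k]` for a non-terminal with category CAT and co-index k is hypothetical, as are the function names `collapse` and `rule_features`. Reading $f(\textrm{Y})$ as the share of all rule occurrences whose LHS is Y is also an assumption. The lexical weights (items 1 and 2) need word alignments and a word translation table, so they are omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical notation (assumption): a non-terminal token is written "[CAT,k]",
# where CAT is the category label and k links the source and target occurrences.
NT = re.compile(r"^\[([^,\]]+),(\d+)\]$")


def collapse(side):
    """Rewrite every non-terminal label to X, keeping its co-index."""
    out = []
    for tok in side:
        m = NT.match(tok)
        out.append(f"[X,{m.group(2)}]" if m else tok)
    return tuple(out)


def rule_features(counts):
    """counts: Counter mapping (Y, f, e) -> C(Y -> <f, e>).

    Returns a dict mapping each rule to its count-based features.
    """
    c_y = Counter()     # C(Y): total count of rules with LHS Y
    c_x_fe = Counter()  # C_X(f, e): counts after collapsing labels to X
    c_x_f = Counter()   # C_X(f)
    c_x_e = Counter()   # C_X(e)
    for (y, f, e), c in counts.items():
        c_y[y] += c
        fx, ex = collapse(f), collapse(e)
        c_x_fe[fx, ex] += c
        c_x_f[fx] += c
        c_x_e[ex] += c
    total = sum(c_y.values())

    feats = {}
    for (y, f, e), c in counts.items():
        fx, ex = collapse(f), collapse(e)
        feats[y, f, e] = {
            # Assumption: "frequency of the LHS category" as a relative frequency.
            "f(Y)": c_y[y] / total,
            "f_X(e|f)": c_x_fe[fx, ex] / c_x_f[fx],  # 1-category (Hiero) estimate
            "f_X(f|e)": c_x_fe[fx, ex] / c_x_e[ex],
            "f(f,e|Y)": c / c_y[y],
            "log C": math.log(c),
            "rule": 1.0,  # constant; sums to the rule count of a derivation
        }
    return feats


# Toy counts invented for illustration, not data from the report.
r1 = ("A", ("le", "[B,1]"), ("the", "[B,1]"))
r2 = ("C", ("le", "[D,1]"), ("the", "[D,1]"))
r3 = ("A", ("le", "[B,1]"), ("a", "[B,1]"))
feats = rule_features(Counter({r1: 3, r2: 1, r3: 4}))
print(feats[r1])
```

In this toy table, r1 and r2 differ only in their category labels, so they collapse to the same 1-category rule `le [X,1]` / `the [X,1]` with count 4. The collapsed source side occurs 8 times in total, which gives $f_{\textbf{X}}(\textbf{e}|\textbf{f}) = 0.5$ for both. Conditioning on the label instead gives $f(\textbf{f},\textbf{e}|\textrm{Y}) = 3/7$ for r1 and $1$ for r2. The contrast shows why the label-specific and collapsed estimates can act as separate features.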