author     redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-11 23:27:00 +0000
committer  redpony <redpony@ec762483-ff6d-05da-a07a-a48fb63a330f>  2010-08-11 23:27:00 +0000
commit     d4b52e01c1ad69ba69931be5af2300a4e63039a0 (patch)
tree       5ce9dfb0abe06e0b8d46c1bb478e3d0fccd641bb /report/training.tex
parent     23c362c8d0952b9fa40e559f5045744a6b289d25 (diff)
some writing
git-svn-id: https://ws10smt.googlecode.com/svn/trunk@530 ec762483-ff6d-05da-a07a-a48fb63a330f
Diffstat (limited to 'report/training.tex')
-rw-r--r--  report/training.tex  5
1 files changed, 4 insertions, 1 deletions
diff --git a/report/training.tex b/report/training.tex
index 96ee70b7..1f07db54 100644
--- a/report/training.tex
+++ b/report/training.tex
@@ -3,7 +3,10 @@
An integral part of constructing a state-of-the-art machine translation system is the training procedure. The goal of training is to optimize the model parameters so as to maximize translation quality under some metric; in our case, the parameters are the weights associated with the features used in our model, and the metric is BLEU.
The most common approach to training is Minimum Error Rate Training (MERT), which tunes the parameters to minimize error according to an arbitrary error function; in our case this is equivalent to maximizing the BLEU score of the 1-best translation. MERT operates on a log-linear model that combines different features in order to find the best target translation $e^*$ for an input source $f$:
-$$e* = \argmax_e p(e|f) = argmax_e \sum_{k=1}^K \w_k\h_k(e,f)$$
+
+\begin{equation}
+e^* = \arg\max_e p(e|f) = \arg\max_e \sum_{k=1}^K w_k h_k(e,f)
+\end{equation}
where $h_k(e,f)$ is a feature associated with the translation of $f$ to $e$, and $w_k$ is the weight associated with that feature. Unfortunately, MERT has empirically proven unable to scale beyond optimizing a handful of features, thus necessitating dense features. These features typically include:
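As a point of reference, the full log-linear model behind this decision rule is commonly written with an explicit normalizer; since the normalization term is constant with respect to $e$, it drops out of the $\arg\max$, so maximizing the unnormalized weighted feature sum above is equivalent. A sketch of this standard formulation:

\begin{equation}
p(e|f) = \frac{\exp \left( \sum_{k=1}^K w_k h_k(e,f) \right)}{\sum_{e'} \exp \left( \sum_{k=1}^K w_k h_k(e',f) \right)}
\end{equation}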