Diffstat (limited to 'report/introduction.tex')
-rw-r--r-- report/introduction.tex 53
1 files changed, 31 insertions, 22 deletions
diff --git a/report/introduction.tex b/report/introduction.tex
index 3b673c8e..12cc2705 100644
--- a/report/introduction.tex
+++ b/report/introduction.tex
@@ -81,7 +81,7 @@ We structured the workshop into three parallel but interdependent streams:
\begin{figure}
\centering
- \subfigure{\includegraphics[scale=0.5]{intro_slides/JeNeVeuxPasTravailler-hiero-labelled.pdf}}
+ \includegraphics[scale=0.5]{intro_slides/JeNeVeuxPasTravailler-hiero-labelled.pdf}
\caption{Example derivation using the Hiero grammar extraction heuristics where non-terminals have been clustered into unsupervised syntactic categories denoted by $X?$.}
\label{fig:intro_labelled_hiero}
\end{figure}
@@ -124,31 +124,24 @@ Chapter \ref{chap:training} describes this work.
The remainder of this introductory chapter provides a formal definition of SCFGs and describes the language pairs that we experimented with.
\section{Synchronous context free grammar} \label{sec:scfg}
+{\em This section will be moved to the start of Chapter 2}
+
+%\subsubsection*{Synchronous context free grammar} \label{sec:scfg}
+\begin{figure}[t]
+\begin{center}
+\includegraphics[width=0.6\columnwidth]{example_derivation2.pdf}
+\end{center}
+\caption[Derivation]{An example SCFG derivation from a Chinese source sentence, which yields the English sentence {\em ``Brown arrived in Shanghai from Beijing late last night.''} The non-terminal alignment $\mathbf{a}$ is specified by the variable subscripts.}
+\label{fig:intro_example_derivation}
+\end{figure}
+
+The translation models explored in this workshop are based on synchronous grammars.
+Here we provide a short definition of the formalism we employ: synchronous context free grammar (SCFG).
A synchronous context free grammar (SCFG, \cite{lewis68scfg}) generalizes context-free grammars to generate strings concurrently in two (or more) languages. A string pair is generated by applying a series of paired rewrite rules of the form, $X \rightarrow \langle \mathbf{e}, \mathbf{f}, \mathbf{a} \rangle$, where $X$ is a non-terminal, $\mathbf{e}$ and $\mathbf{f}$ are strings of terminals and non-terminals and $\mathbf{a}$ specifies a one-to-one alignment between non-terminals in $\mathbf{e}$ and $\mathbf{f}$.
In the context of SMT, by assigning the source and target languages to the respective sides of a probabilistic SCFG it is possible to describe translation as the process of parsing the source sentence, which induces a parallel tree structure and translation in the target language \cite{chiang07hierarchical}.
Terminals are rewritten as pairs of strings of terminal symbols in the source and target languages. Additionally, one side of a terminal expansion may be the special symbol $\epsilon$, indicating a null alignment that permits arbitrary insertions and deletions.
-Figure \ref{fig:scfg} shows an example derivation for Japanese to English translation using an SCFG.
-
-\begin{figure}Grammar fragment:
-\begin{eqnarray*}
-\label{rule:discont}X & \rightarrow & \langle \nt{X}{1}\ \nt{X}{2}\ \nt{X}{3},\ \nt{X}{1}\ \nt{X}{3}\ \nt{X}{2} \rangle \\
-X & \rightarrow & \langle \textrm{\emph{John-ga}},\ \textrm{\emph{John}} \rangle \\
-X & \rightarrow & \langle \textrm{\emph{ringo-o}},\ \textrm{\emph{an apple}} \rangle \\
-X & \rightarrow & \langle \textrm{\emph{tabeta}},\ \textrm{\emph{ate}} \rangle
-\end{eqnarray*}
-Sample derivation:
-\begin{eqnarray*}
-\label{derivationt}
-& &\langle \nt{S}{1},\nt{S}{1} \rangle \Rightarrow \langle \nt{X}{2},\ \nt{X}{2} \rangle \\
- & \Rightarrow& \langle \nt{X}{3}\ \nt{X}{4}\ \nt{X}{5},\ \nt{X}{3}\ \nt{X}{5}\ \nt{X}{4} \rangle \\
- & \Rightarrow &\langle \textrm{\emph{John-ga}}\ \nt{X}{4}\ \nt{X}{5},\ \textrm{\emph{John}}\ \nt{X}{5}\ \nt{X}{4} \rangle \\
- & \Rightarrow &\langle \textrm{\emph{John-ga}}\ \textrm{\emph{ringo-o}}\ \nt{X}{5},\ \textrm{\emph{John}}\ \nt{X}{5}\ \textrm{\emph{an apple}} \rangle \\
- & \Rightarrow &\langle \textrm{\emph{John-ga ringo-o tabeta}},\ \textrm{\emph{John ate an apple}} \rangle
-\end{eqnarray*}
-\caption{A fragment of an SCFG with a ternary non-terminal expansion and three terminal rules.}
-\label{fig:scfg}
-\end{figure}
+Figure \ref{fig:intro_example_derivation} shows an example derivation for Chinese to English translation using an SCFG of the form that we propose to learn using non-parametric Bayesian models.
The generative story is as follows.
In the beginning was the grammar, in which we allow two types of rules: \emph{non-terminal} and \emph{terminal} expansions.
@@ -159,4 +152,20 @@ Rewrite each frontier non-terminal, $c$, using a rule chosen from our grammar ex
Repeat until there are no remaining frontier non-terminals.
The sentences in both languages can then be read off the leaves, using the rules' alignments to find the right ordering.
+\begin{figure}[t]
+ \centering
+ \subfigure{\includegraphics[scale=0.7]{intro_slides/PhraseExtraction1.pdf}}
+ \subfigure{\includegraphics[scale=0.7]{intro_slides/HieroExtraction2.pdf}}
+\caption{Extracting translation rules from aligned sentences. The left figure depicts all the phrases obtained using the standard phrase extraction heuristics: $\langle$ Je, I $\rangle$, $\langle$ veux, want to $\rangle$, $\langle$ travailler, work $\rangle$, $\langle$ ne veux pas, do not want to $\rangle$, $\langle$ ne veux pas travailler, do not want to work $\rangle$, $\langle$ Je ne veux pas, I do not want to $\rangle$, $\langle$ Je ne veux pas travailler, I do not want to work $\rangle$. The right figure shows how a discontiguous SCFG rule is created by generalising a phrase embedded in another phrase; the extracted rule is X $\rightarrow$ $\langle$ ne X$_1$ pas, do not X$_1$ $\rangle$.}
+\label{fig:intro_rule_extraction}
+\end{figure}
+
+The process for extracting SCFG rules is based on that used to extract translation phrases in phrase-based translation systems.
+The phrase based approach \cite{koehn03} uses heuristics to extract phrase translation pairs from a word-aligned corpus.
+The phrase extraction heuristic is illustrated in Figure \ref{fig:intro_rule_extraction}.
+This heuristic extracts all phrases whose words are either not aligned, or aligned with only other words in the same phrase.
+The phrase translation probabilities are then calculated using maximum likelihood estimation.
+The Hiero \cite{chiang07hierarchical} SCFG extraction heuristic starts from a grammar consisting of the set of contiguous phrases. Wherever a phrase is wholly embedded within another, a new rule is added with the embedded phrase replaced by the non-terminal X.
+This process continues until all possible rules have been extracted, subject to the constraints that every rule must contain a terminal on the source side, that a rule may contain at most two non-terminals on its right-hand side, and that those non-terminals may not be adjacent.
+The right example in Figure \ref{fig:intro_rule_extraction} depicts this rule generalisation process.
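
The extraction procedure described in the added paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the workshop: the function names `extract_phrases` and `generalise` are hypothetical, the word alignment is an assumption chosen so that the extracted phrases match those listed in the figure caption (the figure itself is not visible in this diff), and only single-gap rules are produced, so the two-non-terminal and adjacency constraints are satisfied trivially.

```python
def extract_phrases(src, tgt, alignment):
    """Return span pairs ((s1, s2), (t1, t2)) consistent with the alignment.

    A pair is kept if it contains at least one alignment link and no link
    joins a word inside the pair to a word outside it.
    """
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, len(src)):
            for t1 in range(len(tgt)):
                for t2 in range(t1, len(tgt)):
                    inside = crossing = False
                    for i, j in alignment:
                        in_src = s1 <= i <= s2
                        in_tgt = t1 <= j <= t2
                        if in_src and in_tgt:
                            inside = True
                        elif in_src or in_tgt:
                            crossing = True
                    if inside and not crossing:
                        pairs.add(((s1, s2), (t1, t2)))
    return pairs


def generalise(src, tgt, pairs):
    """Create single-gap rules by replacing an embedded phrase with X_1.

    Simplification: one non-terminal per rule only. The embedded source span
    must be strictly shorter than the outer one, so every rule keeps at least
    one source-side terminal.
    """
    rules = set()
    for (s1, s2), (t1, t2) in pairs:
        for (a1, a2), (b1, b2) in pairs:
            nested = s1 <= a1 and a2 <= s2 and t1 <= b1 and b2 <= t2
            if not nested or (a2 - a1) >= (s2 - s1):
                continue
            src_side = src[s1:a1] + ["X_1"] + src[a2 + 1:s2 + 1]
            tgt_side = tgt[t1:b1] + ["X_1"] + tgt[b2 + 1:t2 + 1]
            rules.add((" ".join(src_side), " ".join(tgt_side)))
    return rules


# Example sentence pair from the figure caption.
SRC = "Je ne veux pas travailler".split()
TGT = "I do not want to work".split()
# Assumed alignment (source index, target index): "ne" and "pas" are both
# linked to "do" and "not"; "veux" is linked to "want" and "to".
ALIGNMENT = {(0, 0), (1, 1), (1, 2), (3, 1), (3, 2), (2, 3), (2, 4), (4, 5)}

PHRASES = extract_phrases(SRC, TGT, ALIGNMENT)
RULES = generalise(SRC, TGT, PHRASES)
```

Under the assumed alignment, `PHRASES` contains the seven phrase pairs listed in the caption, and `RULES` includes the caption's rule with source side `ne X_1 pas` and target side `do not X_1`. A fuller version would apply the generalisation step repeatedly to allow a second non-terminal and would enforce the adjacency constraint explicitly.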