Diffstat (limited to 'report/SCFGs.tex')
-rw-r--r-- report/SCFGs.tex | 51
1 files changed, 40 insertions, 11 deletions
diff --git a/report/SCFGs.tex b/report/SCFGs.tex
index 0002405d..3441c7db 100644
--- a/report/SCFGs.tex
+++ b/report/SCFGs.tex
@@ -1,22 +1,51 @@
\chapter{Synchronous context free grammars} \label{sec:scfg}
-%\subsubsection*{Synchronous context free grammar} \label{sec:scfg}
-\begin{figure}[t]
+
+The translation models used in this workshop are synchronous context free grammars (SCFGs).
+SCFGs \cite{lewis68scfg} generalize context-free grammars to generate strings concurrently in two (or more) languages. A string pair is generated by applying a series of paired rewrite rules of the form $X \rightarrow \langle \mathbf{e}, \mathbf{f}, \mathbf{a} \rangle$, where $X$ is a non-terminal, $\mathbf{e}$ and $\mathbf{f}$ are strings of terminals and non-terminals, and $\mathbf{a}$ specifies a one-to-one alignment between the non-terminals in $\mathbf{e}$ and $\mathbf{f}$.
+In statistical machine translation, the two right-hand sides of an SCFG rule represent the source and target languages. Translation proceeds by parsing the source sentence, which induces a parallel tree structure and hence a translation in the target language \cite{chiang07hierarchical}.
+Terminal expansions rewrite a non-terminal as a pair of strings of terminal symbols in the source and target languages. Additionally, one side of a terminal expansion may be the special symbol $\epsilon$, which indicates a null alignment and thereby permits arbitrary insertions and deletions.
+Figure \ref{fig:toy-scfg} gives an example SCFG between Urdu and English. Figure \ref{fig:toy-scfg-parse} shows how the SCFG is used to derive the translation of an input Urdu sentence.
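+
+To make the rule form $X \rightarrow \langle \mathbf{e}, \mathbf{f}, \mathbf{a} \rangle$ concrete, consider the following sketch. The rule is hypothetical and is not taken from Figure \ref{fig:toy-scfg}; it assumes that $\mathbf{e}$ is the English side, that $\mathbf{f}$ is the Urdu side, and that $\mathbf{a}$ is written as a set of pairs (position in $\mathbf{e}$, position in $\mathbf{f}$). Under those assumptions, a rule that pairs an English prepositional phrase with an Urdu postpositional phrase might look like
+% Illustrative only: the labels PP, P, NP and the pair notation for a are assumptions.
+\[
+\mathrm{PP} \rightarrow \langle\, \mathrm{P}\ \mathrm{NP},\ \ \mathrm{NP}\ \mathrm{P},\ \ \{(1,2),\,(2,1)\} \,\rangle
+\]
+Here the alignment states that the first non-terminal of $\mathbf{e}$ corresponds to the second non-terminal of $\mathbf{f}$ and vice versa, so each aligned pair of non-terminals is rewritten together while its position differs between the two languages.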
+
+
+
+
+\begin{figure}
\begin{center}
-\includegraphics[width=0.6\columnwidth]{example_derivation2.pdf}
+\includegraphics[width=.6\linewidth]{SCFGs/example-scfg}
\end{center}
-\caption[Derivation]{An example SCFG derivation from a Chinese source sentence which yields the English sentence: {\em ``Brown arrived in Shanghai from Beijing late last night.''}. The non-terminal alignment $\mathbf{a}$ is specified by the variable subscripts.}
-\label{fig:intro_example_derivation}
+\caption{A toy example of an SCFG that can translate one (romanized) Urdu sentence into English.}\label{fig:toy-scfg}
\end{figure}
-The translation models discussed explored in this workshop are based on synchronous grammars.
-Here we provide a short definition of the formalism we've employed: synchronous context free grammar (SCFG).
-A synchronous context free grammar (SCFG, \cite{lewis68scfg}) generalizes context-free grammars to generate strings concurrently in two (or more) languages. A string pair is generated by applying a series of paired rewrite rules of the form, $X \rightarrow \langle \mathbf{e}, \mathbf{f}, \mathbf{a} \rangle$, where $X$ is a non-terminal, $\mathbf{e}$ and $\mathbf{f}$ are strings of terminals and non-terminals and $\mathbf{a}$ specifies a one-to-one alignment between non-terminals in $\mathbf{e}$ and $\mathbf{f}$.
-In the context of SMT, by assigning the source and target languages to the respective sides of a probabilistic SCFG it is possible to describe translation as the process of parsing the source sentence, which induces a parallel tree structure and translation in the target language \cite{chiang07hierarchical}.
-Terminal are rewritten as pairs of strings of terminal symbols in the source and target languages. Additionally, one side of a terminal expansion may be the special symbol $\epsilon$, which indicates a null alignment which permits arbitrary insertions and deletions.
-Figure \ref{fig:intro_example_derivation} is an example derivation for Chinese to English translation using an SCFG of the form that I propose to learn using non-parametric Bayesian models.
+
+\begin{figure}
+\begin{tabular}{lll}
+\multicolumn{3}{>{\columncolor[rgb]{0.95,0.95,0.75}}c}{The input is an Urdu sentence which is initially unanalyzed.}\\
+\includegraphics[width=.45\linewidth]{SCFGs/urdu-input} & &
+\\ \hline
+\multicolumn{3}{>{\columncolor[rgb]{0.95,0.95,0.75}}c}{Here all of the terminal symbols receive non-terminal labels. The English words are in Urdu order.}\\
+\includegraphics[width=.45\linewidth]{SCFGs/urdu-step0} & &
+\includegraphics[width=.45\linewidth]{SCFGs/english-step0} \\ \hline
+\multicolumn{3}{>{\columncolor[rgb]{0.95,0.95,0.75}}c}{The PP rule reorders the Urdu postpositional phrase to be a prepositional phrase on the English side.}\\\includegraphics[width=.45\linewidth]{SCFGs/urdu-step1} & &
+\includegraphics[width=.45\linewidth]{SCFGs/english-step1} \\ \hline\multicolumn{3}{>{\columncolor[rgb]{0.95,0.95,0.75}}c}{The English auxiliary verb and main verb get reordered with the application of the VP rule.}\\
+\includegraphics[width=.45\linewidth]{SCFGs/urdu-step2} & &
+\includegraphics[width=.45\linewidth]{SCFGs/english-step2} \\ \hline
+\multicolumn{3}{>{\columncolor[rgb]{0.95,0.95,0.75}}c}{This VP rule moves the English verb from the Urdu verb-final position to its correct place before the PP.}\\
+\includegraphics[width=.45\linewidth]{SCFGs/urdu-step3} & & \includegraphics[width=.45\linewidth]{SCFGs/english-step3} \\ \hline
+\multicolumn{3}{>{\columncolor[rgb]{0.95,0.95,0.75}}c}{Applying the S rule completes the translation of the Urdu sentence.}\\
+\includegraphics[width=.45\linewidth]{SCFGs/urdu-step4} & & \includegraphics[width=.45\linewidth]{SCFGs/english-step4}
+\end{tabular}
+\caption{Using SCFGs as the underlying formalism means that the process of translation is one of parsing. This figure shows how an English sentence can be generated by parsing the Urdu sentence using the rules given in Figure \ref{fig:toy-scfg}.}\label{fig:toy-scfg-parse}
+\end{figure}
+
+
+
+
+
+
+% of the form that I propose to learn using non-parametric Bayesian models.
The generative story is as follows.
In the beginning was the grammar, in which we allow two types of rules: \emph{non-terminal} and \emph{terminal} expansions.