Diffstat (limited to 'report')
 report/introduction.tex | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/report/introduction.tex b/report/introduction.tex
index aa37fff3..adcd15b0 100644
--- a/report/introduction.tex
+++ b/report/introduction.tex
@@ -79,16 +79,45 @@ Previous research has focussed on structured learning approaches requiring costl
 In this workshop we adopted a pragmatic approach of embracing existing algorithms for inducing unlabelled SCFGs (e.g. the popular Hiero model \cite{chiang07hierarchical}), and then used state-of-the-art probabilistic models to independently learn syntactic classes for translation rules in the grammar.
 We structured the workshop into three parallel but interdependent streams:
+\begin{figure}
+  \centering
+  \subfigure{\includegraphics[scale=0.5]{intro_slides/JeNeVeuxPasTravailler-hiero-labelled.pdf}}
+\caption{Example derivation using the Hiero grammar extraction heuristics where non-terminals have been clustered into unsupervised syntactic categories denoted by $X?$.}
+\label{fig:intro_labelled_hiero}
+\end{figure}
+
 \paragraph{1) Unsupervised induction of labelled SCFGs}
 Inspired by work in monolingual PCFG learning, we have investigated generative models which describe the production of phrase translations in terms of sequences of tokens (or word classes) and their observed contexts.
+We simplify the grammar induction problem by first clustering monolingual phrases based upon their distribution over contexts (preceding and following words), and then intersecting these labelled phrases with the Hiero \cite{chiang07hierarchical} SCFG rule extraction heuristics.
+The resulting grammars produce derivations like the one in Figure \ref{fig:intro_labelled_hiero}, where the labels are the unsupervised clusters induced by our context-based induction algorithm.
+
+We explored two approaches to the clustering of phrases.
+The first was inspired by research in topic modelling, in particular Latent Dirichlet Allocation (LDA).
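The context-based phrase clustering added in this hunk can be illustrated with a small sketch. This is a toy stand-in, not the workshop's method: the report describes probabilistic (LDA-style) models, whereas this sketch just collects (preceding word, following word) contexts and tags phrases whose most frequent context matches with the same `X?`-style label; all function names are illustrative.

```python
from collections import Counter, defaultdict

def context_counts(corpus, max_len=2):
    """Collect (preceding word, following word) contexts for every
    phrase of up to max_len tokens in a list of sentences."""
    counts = defaultdict(Counter)
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            for j in range(i + 1, min(i + max_len, len(toks) - 1) + 1):
                counts[" ".join(toks[i:j])][(toks[i - 1], toks[j])] += 1
    return counts

def label_phrases(counts):
    """Toy labelling: phrases sharing a most-frequent context share a
    cluster label X0, X1, ... (the workshop used probabilistic models)."""
    labels, assignment = {}, {}
    for phrase in sorted(counts):
        top = counts[phrase].most_common(1)[0][0]
        assignment[phrase] = labels.setdefault(top, "X%d" % len(labels))
    return assignment
```

For instance, on the corpus `["the cat sat", "the dog sat"]` the phrases "cat" and "dog" both occur in the context `("the", "sat")` and therefore receive the same label, which is the distributional intuition behind the clustering.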
+We formulated a model in which each phrase type maintains a distribution over labels, and each context in which a phrase appears is generated by first choosing a label, and then generating the context from that label's distribution over contexts.
+We extended the basic LDA model to incorporate hierarchical Pitman-Yor distributions on the two generating components of the model.
+In the second approach we optimised the same model as described above, but instead of using a non-parametric Bayesian approach to encourage sparsity in the model, we used the direct Posterior Regularisation technique of \cite{someone}.
+
+The labelled SCFGs produced by these algorithms generate a subset of the translations licensed by the original Hiero grammars.
+While we hope that this restriction guides the model towards more acceptable translations, the inevitable noise present in all automatically induced translation models makes it advantageous to allow our models to degrade gracefully to less restrictive grammars.
+As such we also explored hierarchical cascades of grammars, each of which was induced with a different number of labels, allowing the translation model to switch between these while incurring a penalty for doing so.
+
+This work on inducing labels for SCFGs formed the core component of the workshop and the basis for the subsequent work exploiting these labellings.
+By the conclusion of the workshop we were able to demonstrate that applying these induction techniques leads to improved translations for both Chinese$\rightarrow$English and Urdu$\rightarrow$English translation systems. Chapter \ref{chap:grammar_induction} describes this work.
 \paragraph{2) Decoding with labelled SCFGs}
+The second major component of the workshop involved the investigation of improved decoding algorithms for the type of labelled SCFG produced by our induction methods.
+Decoding complexity scales quadratically with the number of labels in the grammar.
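The phrase-labelling model described in this hunk can be written out as a generative story. The notation below ($\theta$, $\phi$, $\alpha$, $\beta$) is standard LDA notation supplied as a gloss, not taken from the report, and it shows the basic Dirichlet version, omitting the hierarchical Pitman-Yor extension the hunk mentions:

```latex
% Phrase types play the role of LDA documents; contexts play the role of words.
\begin{align*}
  \theta_p &\sim \mathrm{Dirichlet}(\alpha) && \text{label distribution for phrase type } p\\
  \phi_z   &\sim \mathrm{Dirichlet}(\beta)  && \text{context distribution for label } z\\
  z_{p,i}  &\sim \theta_p                   && \text{label for the $i$-th occurrence of } p\\
  c_{p,i}  &\sim \phi_{z_{p,i}}             && \text{observed (preceding, following) context}
\end{align*}
```

Under this analogy, inferring each phrase type's dominant label $z$ yields the cluster identities used to relabel the Hiero non-terminals.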
+As such, inducing grammars with more than one label significantly increases the time and resources required for decoding.
+We explored a number of avenues for reducing this computational burden, including early pruning of items from the search space before language model integration, and coarse-to-fine techniques in which decoding is performed with grammars with progressively more labels, pruning the search space at each step.
+We were able to show that each of these techniques could lead to faster decoding without compromising translation performance. Chapter \ref{chap:decoding} describes this work.
 \paragraph{3) Discriminative training of labelled SCFG translation models}
 Chapter \ref{chap:training} describes this work.
+The remainder of this introductory chapter provides a formal definition of SCFGs and describes the language pairs that we experimented with.
 \section{Synchronous context free grammar}
 \label{sec:scfg}
@@ -127,10 +156,3 @@ Repeat until there are no remaining frontier non-terminals.
 The sentences in both languages can then be read off the leaves, using the rules' alignments to find the right ordering.
-\begin{figure}
-  \centering
-  \subfigure{\includegraphics[scale=0.5]{intro_slides/JeNeVeuxPasTravailler-hiero-labelled.pdf}}
-\caption{Example derivation using the Hiero grammar extraction heuristics where non-terminals have been clustered into unsupervised syntactic categories denoted by $X?$.}
-\label{fig:intro_labelled_hiero}
-\end{figure}
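The coarse-to-fine pruning strategy described in the decoding hunk above can be sketched abstractly. This is a schematic illustration under assumed interfaces (a per-item score, a threshold, and a `refine` map from coarse to fine labels), not the workshop's actual decoder:

```python
def coarse_to_fine(items, refine, score, threshold):
    """One coarse-to-fine pass: prune chart items under the coarse
    (few-label) grammar, then split only the survivors into their
    finer-grained labels, so the label-count blow-up hits fewer items."""
    survivors = [it for it in items if score(it) >= threshold]
    return [fine for it in survivors for fine in refine(it)]
```

Iterating such a pass over a cascade of grammars with progressively more labels mirrors the multi-pass scheme the hunk describes, with each pass's pruning shrinking the search space handed to the next, finer grammar.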