Linguistic Knowledge for Statistical Machine Translation
Course description
Degree programme | Module code | Credits (LP)
BA-2010 | AS-CL | 8 LP
NBA | AS-CL | 8 LP
Master | SS-CL, SS-TAC | 8 LP
Magister | - | -
Instructor | Alexander Fraser
Course type | Advanced seminar (Hauptseminar)
First session | 22.04.2014
Time and place | Tue, 11:15–12:45, INF 325 / SR 23 (SR)
Commitment period | 16.06.–13.07.2014
Prerequisites
"Statistical Machine Translation"
Assessment
- regular and active participation
- presentation
- term paper or project
Content
Phrase-based statistical machine translation (PBSMT) is the state of the art for machine
translation of some language pairs. PBSMT is surprisingly free of explicit linguistic
knowledge, yet it can be very effective. It does not work well in every setting, however.
When translating into a morphologically rich language, for instance, translation quality
suffers, particularly when there is also significant syntactic divergence between the two
languages. PBSMT performs poorly in this case because its translation model makes
independence assumptions about morphology and syntax that do not reflect linguistic
reality.
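As a rough illustration of the independence assumption described above, the following Python sketch estimates translation probabilities by relative frequency, first with surface word forms as atomic units and then with target words mapped to lemmas. The word pairs, the `toy_lemmas` table, and the `translation_probs` helper are invented for this example and are not taken from any system or paper covered in the course.

```python
from collections import Counter

# Toy English-German word pairs (invented for illustration only).
# Each German token is an inflected form of the lemma "klein" (small).
aligned_pairs = [
    ("small", "kleine"),
    ("small", "kleinen"),
    ("small", "kleiner"),
    ("small", "kleines"),
    ("small", "kleine"),
]

# Hypothetical lemma lookup; a real system would use a morphological analyzer.
toy_lemmas = {
    "kleine": "klein",
    "kleinen": "klein",
    "kleiner": "klein",
    "kleines": "klein",
}


def translation_probs(pairs):
    """Relative-frequency estimate of p(target | source) from aligned pairs."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    return {
        (src, tgt): count / source_counts[src]
        for (src, tgt), count in pair_counts.items()
    }


# Surface forms as atomic units: probability mass is split across inflections.
surface_probs = translation_probs(aligned_pairs)

# Lemmatized targets: all evidence is pooled into a single entry.
lemma_pairs = [(src, toy_lemmas[tgt]) for src, tgt in aligned_pairs]
lemma_probs = translation_probs(lemma_pairs)

for (src, tgt), p in sorted(surface_probs.items()):
    print(f"p({tgt} | {src}) = {p:.2f}")
print(f"p(klein | small) = {lemma_probs[('small', 'klein')]:.2f}")
```

With surface forms, the evidence for "small" is split across four unrelated entries; after lemmatization it is pooled into one. Pooling alone is not a solution, since a system translating into German must still produce the correctly inflected form.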
In this course we will read papers that try to address this problem by adding linguistic
knowledge to the translation process in a wide variety of ways. We will start with
an intensive focus on morphology. We will then move on to syntax, semantic roles and
beyond. Participants will be encouraged to examine actual translation system output
for problems, and we will connect these observations with the work that we discuss.
To take part in this course, please fill out this questionnaire.
Course overview
Seminar schedule
Date | Material | Presenter
2014-04-22 | Introduction to Course and Research Area | Fraser
2014-04-29 | Empirical Methods for Compound Splitting. Philipp Koehn, Kevin Knight. EACL 2003; Improving Statistical MT Through Morphological Analysis. Sharon Goldwater, David McClosky. EMNLP 2005 | Kiem
2014-05-06 | Enriching morphologically poor languages for statistical machine translation. Eleftherios Avramidis, Philipp Koehn. ACL 2008; Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish. R Yeniterzi, K Oflazer. ACL 2010 | reading group
2014-05-13 | Arabic preprocessing schemes for statistical machine translation. Nizar Habash, Fatiha Sadat. NAACL 2006; Unsupervised morphology rivals supervised morphology for Arabic MT. D Stallard, J Devlin, M Kayser, YK Lee. ACL 2012 | reading group
2014-05-20 | Dependency Treelet Translation: Syntactically Informed Phrasal SMT. Chris Quirk, Arul Menezes, Colin Cherry. ACL 2005; A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT. Andreas Zollmann, Ashish Venugopal, Franz Och, Jay Ponte. COLING 2008 | Bylinovich
2014-05-27 | Applying morphology generation models to machine translation. Kristina Toutanova, Hisami Suzuki, Achim Ruopp. ACL 2008; Combining morpheme-based machine translation with post-processing morpheme prediction. Ann Clifton, Anoop Sarkar. ACL 2011 | Mayer reading group
2014-06-03 | Optimizing Chinese Word Segmentation for Machine Translation Performance. Pi-Chuan Chang, Michel Galley, Christopher D. Manning. ACL WMT 2008; Unsupervised Tokenization for Machine Translation. Tagyoung Chung, Daniel Gildea. EMNLP 2009 | Placzek
2014-06-10 | Chinese Syntactic Reordering for Statistical Machine Translation. C Wang, M Collins, P Koehn. EMNLP-CoNLL 2007; Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation. Dmitriy Genzel. COLING 2010 | Li
2014-06-17 | What's in a translation rule? Michel Galley, Mark Hopkins, Kevin Knight, Daniel Marcu. NAACL 2004; A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. Libin Shen, Jinxi Xu, Ralph Weischedel. ACL 2008 | Schneider
2014-07-01 | A Hierarchical Phrase-Based Model for Statistical Machine Translation. David Chiang. ACL 2005; Tree-to-String Alignment Template for Statistical Machine Translation. Yang Liu, Qun Liu, Shouxun Lin. ACL 2006 | Claus
2014-07-08 | Unsupervised Multilingual Learning for Morphological Segmentation. Benjamin Snyder, Regina Barzilay. ACL 2008; Unsupervised bilingual morpheme segmentation and alignment with context-rich hidden semi-Markov models. J Naradowsky, K Toutanova. ACL 2011 | Nakryyko reading group
2014-07-15 | Semantic roles for SMT: a hybrid two-pass model. Dekai Wu, Pascale Fung. NAACL 2009; Semantic role features for machine translation. Ding Liu, Daniel Gildea. COLING 2010 | Haider
2014-07-22 | Bilingual Sentiment Consistency for Statistical Machine Translation. Chen and Zhu. EACL 2014; Applying the semantics of negation to SMT through n-best list re-ranking. Fancellu and Webber. EACL 2014 | Haas
Literature
- Philipp Koehn's textbook Statistical Machine Translation.