Exploiting Linguistic and Statistical Knowledge in a Text Alignment System

Please use this identifier to cite or link to this resource: https://repositorium.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009022517
Title: Exploiting Linguistic and Statistical Knowledge in a Text Alignment System
Author(s): Schrader, Bettina
First reviewer: Prof. Dr. Peter Bosch
Dr. habil. Helmar Gust
Prof. Dr. Stefan Evert
Second reviewer: Prof. Dr. Peter Bosch
Dr. habil. Helmar Gust
Prof. Dr. Stefan Evert
Prof. Dr. Martin Volk
Abstract: In machine translation, the alignment of corpora has evolved into a mature research area, aimed at providing training data for statistical or example-based machine translation systems. Moreover, the alignment information can be used for a variety of other purposes, including lexicography and the induction of tools for natural language processing. The alignment techniques used for these purposes fall roughly into two classes: sentence alignment approaches that often combine statistical and linguistic information, and word alignment models that are dominated by the statistical machine translation paradigm. Alignment approaches that use linguistic knowledge provided by corpus annotation are rare, as are non-statistical word alignment strategies. Furthermore, parallel corpora are typically not aligned at all text levels simultaneously. Rather, a corpus is first sentence aligned, and in a subsequent step, the alignment information is refined to go below the sentence level. In this thesis, the distinction between the two alignment classes is abandoned. Instead, a system is introduced that can simultaneously align at the paragraph, sentence, word, and phrase level. Furthermore, linguistic as well as statistical information can be combined. This combination of alignment cues from different knowledge sources, as well as the combination of the sentence and word alignment tasks, is made possible by the development of a modular alignment platform. Its main features are that it supports different kinds of linguistic corpus annotation and that it aligns a corpus hierarchically, such that sentence and word alignments are cohesive. Alignment cues are not used within a global alignment model. Rather, different sub-models can be implemented and allowed to interact.
Most of the alignment modules of the system have been implemented on the basis of empirical corpus studies, aimed at showing how the most common types of corpus annotation can be exploited for the alignment task.
URL: https://repositorium.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009022517
Keywords: Computational linguistics; Machine translation; Corpus linguistics; Word alignment; Sentence alignment
Publication date: 20-Feb-2009
Appears in collections: FB08 - E-Dissertationen

Files in this resource:
File                      Description          Size     Format
E-Diss853_thesis.tar.gz   -                    1.53 MB  GZIP
E-Diss853_thesis.pdf      Presentation format  1.27 MB  Adobe PDF

All resources in the repOSitorium are protected by copyright unless otherwise indicated (rightsstatements.org).