Exploiting Linguistic and Statistical Knowledge in a Text Alignment System

Please use this identifier to cite or link to this resource:
https://osnadocs.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009022517
Full metadata record
DC Element | Value | Language
dc.contributor.advisor | Prof. Dr. Peter Bosch |
dc.contributor.advisor | Dr. habil. Helmar Gust |
dc.contributor.advisor | Prof. Dr. Stefan Evert |
dc.creator | Schrader, Bettina |
dc.date.accessioned | 2010-01-30T14:56:39Z |
dc.date.available | 2010-01-30T14:56:39Z |
dc.date.issued | 2009-02-20T11:58:25Z |
dc.date.submitted | 2009-02-20T11:58:25Z |
dc.identifier.uri | https://osnadocs.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009022517 | -
dc.description.abstract | In machine translation, the alignment of corpora has evolved into a mature research area, aimed at providing training data for statistical or example-based machine translation systems. Moreover, the alignment information can be used for a variety of other purposes, including lexicography and the induction of tools for natural language processing. The alignment techniques used for these purposes fall roughly into two separate classes: sentence alignment approaches that often combine statistical and linguistic information, and word alignment models that are dominated by the statistical machine translation paradigm. Alignment approaches that use linguistic knowledge provided by corpus annotation are rare, as are non-statistical word alignment strategies. Furthermore, parallel corpora are typically not aligned at all text levels simultaneously. Rather, a corpus is first sentence-aligned, and in a subsequent step, the alignment information is refined to go below the sentence level. In this thesis, the distinction between the two alignment classes is abandoned. Instead, a system is introduced that can simultaneously align at the paragraph, sentence, word, and phrase level. Furthermore, linguistic as well as statistical information can be combined. This combination of alignment cues from different knowledge sources, as well as the combination of the sentence and word alignment tasks, is made possible by the development of a modular alignment platform. Its main features are that it supports different kinds of linguistic corpus annotation and that it aligns a corpus hierarchically, such that sentence and word alignments are cohesive. Alignment cues are not used within a global alignment model. Rather, different sub-models can be implemented and allowed to interact. Most of the alignment modules of the system have been implemented using empirical corpus studies, aimed at showing how the most common types of corpus annotation can be exploited for the alignment task. | eng
dc.language.iso | eng |
dc.subject | Computerlinguistik |
dc.subject | Maschinelle Übersetzung |
dc.subject | Korpuslinguistik |
dc.subject | Wortalignment |
dc.subject | Satzalignment |
dc.subject.ddc | 000 - Informatik, Wissen, Systeme |
dc.title | Exploiting Linguistic and Statistical Knowledge in a Text Alignment System | eng
dc.type | Dissertation oder Habilitation [doctoralThesis] | -
thesis.location | Osnabrück | -
thesis.institution | Universität | -
thesis.type | Dissertation [thesis.doctoral] | -
thesis.date | 2007-10-09T12:00:00Z | -
elib.elibid | 853 | -
elib.marc.edt | fangmeier | -
elib.dct.accessRights | a | -
elib.dct.created | 2008-12-20T15:58:07Z | -
elib.dct.modified | 2009-02-20T11:58:25Z | -
dc.contributor.referee | Prof. Dr. Peter Bosch |
dc.contributor.referee | Dr. habil. Helmar Gust |
dc.contributor.referee | Prof. Dr. Stefan Evert |
dc.contributor.referee | Prof. Dr. Martin Volk |
dc.subject.dnb | 28 - Informatik, Datenverarbeitung | ger
dc.subject.ccs | I.2.7 - Natural Language Processing | eng
vCard.ORG | FB8 | ger
Appears in collections: FB08 - E-Dissertationen

Files in this resource:
File | Description | Size | Format
E-Diss853_thesis.tar.gz | | 1.53 MB | GZIP
E-Diss853_thesis.pdf | Presentation format | 1.27 MB | Adobe PDF


All resources in the osnaDocs repository are protected by copyright unless otherwise indicated. rightsstatements.org