Title: Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
Authors: TURCHI MARCO; EHRMANN MAUD
Citation: "POLIBITS": An open access research journal on Computer Science and Computer Engineering with Applications, no. 43, pp. 37-43
Publisher: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Mexico
Publication Year: 2011
JRC N°: JRC66253
ISSN: 1870-9044
URI: http://polibits.gelbukh.com/2011_43/Polibits_43_2011.pdf
http://publications.jrc.ec.europa.eu/repository/handle/JRC66253
Type: Articles in Journals
Abstract: The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on its parallel training data: phrases that do not appear in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each new association corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.
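The abstract does not spell out the expansion algorithm, so the following is only a minimal, hypothetical Python sketch of the general idea: deriving a new phrase-table entry from a morphological variation of both sides of an existing entry. Everything in it is an assumption made for illustration, not the authors' method: the toy lexicons `FR_LEXICON` and `EN_LEXICON`, the helper names `inflect` and `expand_entry`, the strategy of swapping one morphosyntactic tag (e.g. `Sing` to `Plur`) on every affected word, and the scoring, which uses character-level `difflib` similarity as a stand-in for the paper's morphosyntactically informed string similarity.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicons: surface form -> (lemma, morphosyntactic tags).
FR_LEXICON = {
    "chat": ("chat", frozenset({"Noun", "Sing"})),
    "chats": ("chat", frozenset({"Noun", "Plur"})),
    "noir": ("noir", frozenset({"Adj", "Sing"})),
    "noirs": ("noir", frozenset({"Adj", "Plur"})),
}
EN_LEXICON = {
    "cat": ("cat", frozenset({"Noun", "Sing"})),
    "cats": ("cat", frozenset({"Noun", "Plur"})),
    "black": ("black", frozenset({"Adj"})),
}


def inflect(phrase, lexicon, old_tag, new_tag):
    """Swap old_tag for new_tag on every word that carries it.

    Returns the rewritten phrase, or None if a word carries old_tag
    but the lexicon has no form of the same lemma with the new tags.
    """
    out = []
    for word in phrase.split():
        entry = lexicon.get(word)
        if entry is None or old_tag not in entry[1]:
            out.append(word)  # unknown or unaffected words are kept as-is
            continue
        lemma, tags = entry
        wanted = (tags - {old_tag}) | {new_tag}
        matches = [f for f, (l, t) in lexicon.items() if l == lemma and t == wanted]
        if not matches:
            return None
        out.append(matches[0])
    return " ".join(out)


def similarity(a, b):
    """Character-level similarity in [0, 1] (difflib ratio; an assumed choice)."""
    return SequenceMatcher(None, a, b).ratio()


def expand_entry(src, tgt, prob, old_tag, new_tag):
    """Derive a new (source, target, score) entry from an existing one, or None."""
    new_src = inflect(src, FR_LEXICON, old_tag, new_tag)
    new_tgt = inflect(tgt, EN_LEXICON, old_tag, new_tag)
    if new_src is None or new_tgt is None or (new_src, new_tgt) == (src, tgt):
        return None  # no usable variant could be built
    # Hypothetical scoring: discount the original probability by how far
    # the variant strings drift from the original ones.
    score = (similarity(src, new_src) + similarity(tgt, new_tgt)) / 2
    return new_src, new_tgt, prob * score
```

For instance, `expand_entry("chat noir", "black cat", 0.4, "Sing", "Plur")` yields the new association `("chats noirs", "black cats", ...)` with a score slightly below 0.4, covering plural forms that may never have been seen in the parallel data. Changing both sides with the same tag swap keeps the pair consistent (here, plural on both sides); a real system would also need to handle agreement, reordering entries, and many-to-many morphological mappings.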
JRC Institute: Institute for the Protection and Security of the Citizen
