
Automatic Construction of Multilingual Name Dictionaries

This chapter is a contribution to the forthcoming book 'Learning Machine Translation', MIT Press, to be published in 2008. ABSTRACT: Machine Translation and other Natural Language Processing systems often lose performance when they have to process texts containing unknown words, such as proper names. Proper name dictionaries are rare and can never be complete because new names are being made up all the time. One way to overcome this performance loss is to recognise and mark named entities in a text before translating it, and to carry each named entity over untranslated. This would also help avoid the accidental translation of a name such as 'Bill Black', e.g. into French as 'Facture Noire'. An even better translation would be achieved if the target-language spelling of the name were used, and this seems crucial when translating from languages with a different script, such as Chinese, Arabic or Cyrillic. We show that multilingual name dictionaries are also helpful for a number of other text analysis applications, including information retrieval, topic detection and tracking, relation and event extraction, and more. We then present a method and a system to recognise named entities of the types 'person' and, to some extent, 'organisation' in multilingual text collections, and to automatically identify which of the newly identified names are variants of a known name. By doing this over the course of several years, currently for nineteen languages, we have built up a multilingual name dictionary that to date contains over 630,000 names plus over 135,000 known variants, with up to 170 multilingual variants for a single name. The automatically generated name dictionary is used daily, for various purposes, in the publicly accessible multilingual news aggregation and analysis system NewsExplorer.
2009-02-02
MIT Press
JRC41746
978-0-262-07297-7
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=11753
https://publications.jrc.ec.europa.eu/repository/handle/JRC41746
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice