|Title:||Building a Multilingual Named Entity-Annotated Corpus Using Annotation Projection|
|Authors:||EHRMANN Maud; TURCHI Marco; STEINBERGER Ralf|
|Citation:||Proceedings of Recent Advances in Natural Language Processing p. 118-124|
|Type:||Articles in periodicals and books|
|Abstract:||As developers of a highly multilingual named entity recognition (NER) system, we face an evaluation resource bottleneck problem: we need evaluation data in many languages, the annotation should not be too time-consuming, and the evaluation results across languages should be comparable. We solve the problem by automatically annotating the English version of a multi-parallel corpus and by projecting the annotations into all the other language versions. For the translation of English entities, we use a phrase-based statistical machine translation system as well as a lookup of known names from a multilingual name database. For the projection, we incrementally apply different methods: perfect string matching, perfect consonant signature matching and edit distance similarity. The resulting annotated parallel corpus will be made available for reuse.|
|JRC Directorate:||Space, Security and Migration|
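The abstract names three projection methods applied incrementally: perfect string matching, perfect consonant signature matching, and edit distance similarity. The record gives no implementation details, so the following is only a minimal sketch of how such a cascade might look. The function names, the definition of the consonant signature (lowercase, diacritics stripped, vowels including "y" removed), the normalised Levenshtein similarity, and the 0.8 threshold are all assumptions made for illustration, not the authors' actual method.

```python
import unicodedata

VOWELS = set("aeiouy")  # assumption: "y" is treated as a vowel


def normalize(text):
    """Lowercase the text and strip diacritics."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def consonant_signature(text):
    """Keep only the consonant letters of the normalised text (hypothetical definition)."""
    return "".join(ch for ch in normalize(text) if ch.isalpha() and ch not in VOWELS)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def project_entity(candidate, target_tokens, threshold=0.8):
    """Locate a translated entity in a tokenised target sentence.

    Tries each method in turn and returns (start, end, method) for the first
    one that succeeds, or None. The threshold value is an assumption.
    """
    n = len(candidate.split())
    windows = [(i, " ".join(target_tokens[i:i + n]))
               for i in range(len(target_tokens) - n + 1)]

    # Step 1: perfect string match.
    for i, window in windows:
        if window == candidate:
            return (i, i + n, "string")

    # Step 2: perfect consonant signature match.
    signature = consonant_signature(candidate)
    for i, window in windows:
        if signature and consonant_signature(window) == signature:
            return (i, i + n, "consonant")

    # Step 3: best window by edit distance similarity, if above the threshold.
    target = normalize(candidate)
    best = max(windows, key=lambda iw: similarity(target, normalize(iw[1])), default=None)
    if best is not None and similarity(target, normalize(best[1])) >= threshold:
        return (best[0], best[0] + n, "edit_distance")
    return None
```

Under these assumptions, an inflected form such as "Baracka Obamy" would be caught by the consonant signature step, while a form with an added consonant such as "Sarkozym" would fall through to the edit distance step. Ordering the methods from strictest to loosest keeps the fuzzy step from overriding an exact match.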