Please use this identifier to cite or link to this item:
http://publications.jrc.ec.europa.eu/repository/handle/JRC83677
Title: | Acronym recognition and processing in 22 languages |
Authors: | EHRMANN Maud; DELLA ROCCA Leonida; STEINBERGER Ralf; TANEV Hristo |
Citation: | Proceedings of Recent Advances in Natural Language Processing, pp. 237-244 |
Publisher: | Incoma Ltd. |
Publication Year: | 2013 |
JRC N°: | JRC83677 |
ISSN: | 1313-8502 |
URI: | http://lml.bas.bg/ranlp2013/docs/RANLP_main.pdf (proceedings); http://publications.jrc.ec.europa.eu/repository/handle/JRC83677 |
Type: | Articles in periodicals and books |
Abstract: | We present work on recognising acronyms of the form Long-Form (Short-Form), such as “International Monetary Fund (IMF)”, in millions of news articles in twenty-two languages. This is part of our more general effort to recognise entities and their variants in news text and to use them for the automatic analysis of the news, including the linking of related news across languages. We show how the acronym recognition patterns, initially developed for medical terms, needed to be adapted to the more general news domain, and we present evaluation results. We describe our effort to automatically merge the numerous long-form variants referring to the same short form, while keeping unrelated long forms separate. Finally, we provide extensive statistics on the frequency and distribution of short-form/long-form pairs across languages. |
JRC Directorate: | Space, Security and Migration |
Items in repository are protected by copyright, with all rights reserved, unless otherwise indicated.