
Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic

We present a working Arabic information extraction (IE) system that analyzes large volumes of news text every day to extract the named entity types person, organization, location, date and number, as well as quotations (reported speech) by and about people. The named entity recognition (NER) system was not developed specifically for Arabic; instead, a highly multilingual, almost language-independent NER system was adapted to cover Arabic as well. Arabic differs substantially from the Indo-European and Finno-Ugric languages currently covered. This paper therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the otherwise language-independent rule set to make it applicable to Arabic. The evaluation results are generally satisfactory, but could be improved for certain entity types.
2010-06-07
European Language Resources Agency (ELRA)
JRC57133
http://www.lrec-conf.org/proceedings/lrec2010/index.html, http://www.lrec-conf.org/proceedings/lrec2010/pdf/669_Paper.pdf, https://publications.jrc.ec.europa.eu/repository/handle/JRC57133
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice