|Title:||Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic|
|Authors:||ZAGHOUANI Wajdi; POULIQUEN Bruno; EBRAHIM Mohamed; STEINBERGER Ralf|
|Citation:||Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC) - ISBN 2-9517408-6-7 p. 563-567|
|Publisher:||European Language Resources Agency (ELRA)|
|JRC Publication N°:||JRC57133|
|Type:||Contributions to Conferences|
|Abstract:||We present a working Arabic information extraction (IE) system that analyzes large volumes of news text every day to extract the named entity types person, organization, location, date and number, as well as quotations (reported speech) by and about people. The named entity recognition (NER) system was not developed specifically for Arabic; instead, a highly multilingual, almost language-independent NER system was adapted to cover Arabic as well. Arabic differs substantially from the Indo-European and Finno-Ugric languages currently covered. This paper therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the otherwise language-independent rule set to make it applicable to Arabic. The evaluation results are generally satisfactory but could be improved for certain entity types.|
|JRC Institute:||Institute for the Protection and Security of the Citizen|