JRC Publications Repository

Out-of-context Fine-grained Multi-word Entity Classification: exploring token, character n-gram and NN-based models for multilingual entity classification

In this paper, we present a number of experiments on the construction of fine-grained, out-of-context multi-word entity classification models. These models exploit a large BabelNet-derived multilingual Named Entity corpus covering 49 languages in 7 different scripts, which is also presented in this work. In particular, we compare SVM-based character and token n-gram models with neural network-based ones, and we also compare language-specific variants against multilingual models. The models have been evaluated on additional external Named Entity resources to gain further insight into the quality and re-usability of the trained models. The language-independent character n-gram SVM-based model outperforms the corresponding token n-gram SVM-based model for a large majority of the tested languages and obtains an average precision of 95.7%. When applied to a number of external resources, performance drops slightly, but the model still achieves an average precision of 84.7% across all experiments, demonstrating the applicability of the proposed model in a range of contexts. Finally, the experiments applying a neural network model show comparable results for the language-specific and language-independent approaches.
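To illustrate the difference between the two feature families compared in the abstract, the following is a minimal sketch of character versus token n-gram extraction for an entity name given without context. It is not the authors' implementation: the function names, the n-gram ranges, the `^`/`$` boundary markers and the example names are all assumptions made for illustration, and the SVM training step is omitted.

```python
from collections import Counter


def char_ngrams(name, n_min=2, n_max=4):
    """Count character n-grams of an entity name.

    The '^' and '$' boundary markers and the default range 2..4 are
    illustrative assumptions, not values taken from the paper.
    """
    padded = f"^{name.lower()}$"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def token_ngrams(name, n_min=1, n_max=2):
    """Count whitespace-token n-grams of an entity name (assumed range 1..2)."""
    tokens = name.lower().split()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def shared_features(a, b):
    """Return the set of features that occur in both feature bags."""
    return set(a) & set(b)


# Hypothetical example: one name seen in training, one unseen name.
seen = "Universität Wien"
unseen = "Universitätsklinikum Graz"

char_overlap = shared_features(char_ngrams(seen), char_ngrams(unseen))
token_overlap = shared_features(token_ngrams(seen), token_ngrams(unseen))

print("shared character n-grams:", len(char_overlap))
print("shared token n-grams:", len(token_overlap))
```

In this example the two names share no token n-grams but do share sub-word character n-grams, which is one plausible reason a character-based model can generalise to unseen multi-word entities. Either feature bag could then be vectorised and passed to a linear SVM.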
2019-07-25
SPECIAL INTEREST GROUP ON COMPUTER GRAPHICS, ASSOCIATION FOR COMPUTING MACHINERY
JRC109005
https://dl.acm.org/citation.cfm?doid=3297280.3297379, https://publications.jrc.ec.europa.eu/repository/handle/JRC109005
10.1145/3297280.3297379 (online)
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice