Out-of-context Fine-grained Multi-word Entity Classification: exploring token, character n-gram and NN-based models for multilingual entity classification
In this paper, we present a number of experiments on the construction of fine-grained and out-of-context multi-word entity classification models. These models exploit a large BabelNet-derived multilingual Named Entity corpus of 49 languages from 7 different scripts, which is also presented in this work. In particular, we compare SVM-based character and token n-gram models with neural network-based ones and also explore language-specific variants against multilingual models. The various models have been evaluated on additional external Named Entity resources to gain further insight into the quality and re-usability of the trained models.
The language-independent character n-gram SVM-based model outperforms the corresponding token n-gram SVM-based model for a large majority of the tested languages and obtained a 95.7% average precision. When applied to a number of external resources, we observed a slight drop in performance, but still achieved an average precision of 84.7% across all experiments, demonstrating the applicability of the proposed model in a range of contexts. Finally, the experiments applying a neural network model show comparable results for the language-specific and language-independent approaches.
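The character n-gram SVM setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the entity mentions, coarse labels, n-gram range, and use of scikit-learn's `TfidfVectorizer`/`LinearSVC` are all assumptions for demonstration, not details taken from the paper or its BabelNet-derived corpus.

```python
# Sketch of out-of-context entity-type classification with a character
# n-gram SVM. Toy data and labels are assumptions, not the paper's corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy multilingual entity mentions with coarse types (hypothetical labels).
train_entities = [
    "Guillaume Jacquet", "Angela Merkel", "Haruki Murakami",   # PER
    "European Commission", "United Nations", "Deutsche Bank",  # ORG
    "Rio de Janeiro", "New York City", "Addis Ababa",          # LOC
]
train_labels = ["PER"] * 3 + ["ORG"] * 3 + ["LOC"] * 3

# Character n-grams (here 2-4) need no tokenizer or word lists, which is
# what makes such a model largely language-independent.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(train_entities, train_labels)

print(model.predict(["World Health Organization"]))
```

A token n-gram variant would differ only in the vectorizer (`analyzer="word"`), which makes the two model families easy to compare under identical training conditions.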
JACQUET Guillaume;
PISKORSKI Jakub;
CHESNEY Sophie;
2019-07-25
SPECIAL INTEREST GROUP ON COMPUTER GRAPHICS, ASSOCIATION FOR COMPUTING MACHINERY
JRC109005
https://dl.acm.org/citation.cfm?doid=3297280.3297379
https://publications.jrc.ec.europa.eu/repository/handle/JRC109005
10.1145/3297280.3297379 (online)
Additional supporting files
| File name | Description | File type |
| --- | --- | --- |