|Title:||Multi-word Entity Classification in a Highly Multilingual Environment|
|Authors:||CHESNEY SOPHIE; JACQUET GUILLAUME; STEINBERGER RALF; PISKORSKI JAKUB|
|Publisher:||The Association for Computational Linguistics (ACL)|
|Type:||Articles in periodicals and books|
|Abstract:||This paper describes an approach for the classification of millions of existing multiword entities (MWEntities), such as organisation or event names, into thirteen category types, based only on the tokens they contain. In order to classify our very large in-house collection of multilingual MWEntities into an application-oriented set of entity categories, we trained distantly-supervised classifiers in 43 languages based on MWEntities extracted from BabelNet. The best-performing classifier was the multi-class SVM using a TF.IDF-weighted data representation. Interestingly, a single classifier trained on a mix of all languages consistently performed better than classifiers trained for individual languages, reaching an averaged F1-value of 88.8%. In this paper, we present the training and test data, including a human evaluation of its accuracy, describe the methods used to train the classifiers, and discuss the results.|
|JRC Directorate:||Joint Research Centre Corporate Activities|
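The abstract mentions a TF.IDF-weighted representation built only from the tokens an entity name contains. As a rough illustration of that idea, and not the authors' implementation, the sketch below computes TF.IDF weights for a few made-up entity names. The function name `tfidf_vectors`, the whitespace tokenisation, the lowercasing, and the particular TF.IDF variant (relative term frequency times log inverse document frequency) are all assumptions; the paper may define these differently. In a full pipeline, vectors of this kind would be passed to a multi-class SVM, which is omitted here.

```python
import math
from collections import Counter


def tfidf_vectors(entities):
    """Return one {token: weight} dict per entity name (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace to obtain tokens.
    docs = [name.lower().split() for name in entities]
    n_docs = len(docs)

    # Document frequency: number of entity names that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))

    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return vectors


# Toy, invented examples; not taken from the paper's data.
names = [
    "University of Oxford",
    "University of Warsaw",
    "Battle of Hastings",
]
vectors = tfidf_vectors(names)
```

In this toy input, a token that occurs in every name (such as "of") receives weight zero, while tokens that occur in only one name receive the highest weights, so the rarer tokens carry more of the signal a classifier would see.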
Items in repository are protected by copyright, with all rights reserved, unless otherwise indicated.