Creation and use of multilingual named entity variant dictionaries
The highly multilingual media analysis application Europe Media Monitor (EMM) makes extensive use of name dictionaries, including not only large lists of person, organisation and location names, but also many spelling variants for the same named entity, both within the same language and across languages and scripts. As EMM could not operate without these non-traditional dictionaries, we wish to make a strong case in their favour. In this chapter, we will explain how such vocabulary lists are used within EMM and how they were produced automatically by analysing over 100,000 news articles per day in over twenty languages. A large part of EMM’s vocabulary lists is made publicly available for download as part of JRC-Names.
STEINBERGER Ralf;
JACQUET Guillaume;
DELLA ROCCA Leonida;
2015-04-07
Editions Modulaires Européennes
JRC91623
978-2-8066-1144-4
0771-6524