Combining various text analysis tools for multilingual media monitoring
There is ample evidence that the information contained in media reports is complementary across countries and languages. This holds for both facts and opinions. Monitoring multilingual and multinational media therefore gives a more complete picture of the world than monitoring the media of a single language, even a world language such as English. Wide coverage and highly multilingual text processing are thus important. The Europe Media Monitor (EMM) family of applications, developed by the JRC, gathers about 100,000 media reports per day in 50 languages from the internet, groups related articles, classifies them, detects and follows trends, produces statistics and issues automatic alerts. For a subset of 20 languages, it also extracts and disambiguates entities (persons, organisations and locations) and reported speech, links related news over time and across languages, gathers historical information about entities and produces various types of social networks. More recent R&D efforts focus on event scenario template filling, opinion mining, multi-document summarisation and machine translation. This extended abstract gives an overview of EMM from a functionality point of view rather than providing technical detail.
2012-01-30
Hamburger Zentrum für Sprachkorpora
JRC66331
0176-559X