Title: Automatic Epidemiological Surveillance from On-line News in MedISys and PULS
Citation: Book of Abstract of the International Meeting on Emerging Diseases and Surveillance 2009 p. 62-63
Publisher: Pro-MED-Mail
Publication Year: 2009
JRC N°: JRC51837
URI: http://publications.jrc.ec.europa.eu/repository/handle/JRC51837
Type: Articles in periodicals and books
Abstract: Information extraction and content analysis is a mature research area with strong potential for real-world applications in the domain of Health Informatics. We focus on the task of tracking news about outbreaks of epidemics. Several systems use public news sources for tracking epidemics: some rely on human analysts, some are automatic, and some combine both. We describe a fully automatic, distributed system for tracking the spread of infectious disease by extracting information from Web-based news sources. The system follows thousands of news sites in real time, extracts textual content from relevant Web pages, analyses the facts reported in the text, and accumulates the extracted facts in a database. The system provides functionality for aggregating and visualizing results, as well as alerting capability. The users we target are epidemic intelligence officers in Health Authorities. Surveillance on a global scale is essential for health threat analysis because infectious agents spread easily across national borders. The methodology rests on the combination of Information Retrieval technology in MedISys, the news-tracking system developed at the EC's Joint Research Centre, and linguistic analysis in PULS, the fact extraction system developed at the University of Helsinki. MedISys uses keyword-based search queries to identify mentions of infectious disease in potentially relevant contexts. It clusters related stories based on textual content and collects statistics on hits over time to spot unexpected spikes in the number of mentions of certain disease-location combinations. MedISys currently processes on average 50,000 news articles daily, in 40 languages. PULS receives potentially relevant articles from MedISys, on average 10,000 per month, and analyses their text content using language technology. PULS uses syntactic and semantic patterns to distinguish mentions of disease in the context of outbreak reports from mentions in other, unrelated contexts. This enables PULS to identify the attributes of the outbreak cases: the disease, location, time, number of victims, severity, etc. In this way, MedISys and PULS complement each other's functionality to yield a whole that is greater than the sum of its parts. The systems are accessible to the public on-line. Our current estimates put the precision of event detection at 75% and the false negative rate at 14%. Current challenges include improving performance in terms of precision and recall, and expanding the number of languages that PULS handles.
JRC Directorate: Space, Security and Migration
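
Illustrative note: the abstract says that MedISys collects statistics on hits over time to spot unexpected spikes in mentions of disease-location combinations, but it does not say how a spike is decided. The sketch below shows one common way such a check could work, assuming a simple rule: flag a combination when the latest daily count exceeds the mean of the preceding days by a chosen number of standard deviations. The function name find_spikes, its parameters (window, threshold, min_count), and the sample counts are all hypothetical and are not taken from MedISys or PULS.

```python
from statistics import mean, pstdev


def find_spikes(daily_counts, window=7, threshold=3.0, min_count=5):
    """Return (disease, location) keys whose latest count is unusually high.

    daily_counts maps (disease, location) to a list of daily mention
    counts, oldest first. The last entry is the day being tested.
    All parameter defaults are illustrative assumptions.
    """
    spikes = []
    for key, counts in daily_counts.items():
        if len(counts) < window + 1:
            continue  # not enough history to form a baseline
        baseline = counts[-(window + 1):-1]  # the `window` days before today
        today = counts[-1]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        # Floor sigma at 1.0 so a nearly flat baseline does not
        # turn every small increase into an alert.
        limit = mu + threshold * max(sigma, 1.0)
        if today >= min_count and today > limit:
            spikes.append(key)
    return sorted(spikes)


# Made-up counts, for illustration only.
counts = {
    ("cholera", "Zimbabwe"): [2, 3, 2, 4, 3, 2, 3, 25],
    ("influenza", "Finland"): [10, 12, 11, 9, 10, 12, 11, 12],
}
print(find_spikes(counts))
```

With these made-up counts only the first combination is flagged, because its latest value sits far above its recent baseline while the second stays within its usual range. A production system would likely also need to account for weekly cycles and differences in source volume, which this sketch ignores.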