Automatic Epidemiological Surveillance from On-line News in MedISys and PULS

Information extraction and content analysis is a mature research area with strong potential for real-world applications in the domain of Health Informatics. We focus on the task of tracking news about outbreaks of epidemics. Several systems use public news sources for tracking epidemics: some employ human analysts, some are automatic, and some combine both. We describe a fully automatic, distributed system for tracking the spread of infectious disease by extracting information from Web-based news sources. The system follows thousands of news sites in real time, extracts textual content from relevant Web pages, analyses the facts reported in the text, and accumulates the extracted facts in a database. It provides functionality for aggregating and visualizing results, as well as alerting capability. The users we target are epidemic intelligence officers in Health Authorities. Surveillance on a global scale is essential for health threat analysis because infectious agents spread easily across national borders.

The methodology rests on the combination of Information Retrieval technology in MedISys, the news-tracking system developed at the EC's Joint Research Centre, and linguistic analysis in PULS, the fact extraction system developed at the University of Helsinki. MedISys uses keyword-based search queries to identify mentions of infectious disease in potentially relevant contexts. It clusters related stories based on textual content and collects statistics on hits over time, to spot unexpected spikes in the number of mentions of certain disease-location combinations. MedISys currently processes on average 50,000 news articles daily, in 40 languages. PULS receives on average 10,000 potentially relevant articles per month from MedISys and analyses their text content using language technology. PULS uses syntactic and semantic patterns to distinguish mentions of disease in the context of outbreak reports from mentions in other, unrelated contexts. This enables PULS to identify the attributes of the outbreak cases: the disease, location, time, number of victims, severity, etc.

In this way, MedISys and PULS complement each other's functionality to yield a whole that is greater than the sum of its parts. The systems are accessible to the public on-line. Our current estimates put the precision of event detection at 75%, and the false negative rate at 14%. Current challenges include improving performance in terms of precision and recall, and expanding the number of languages that PULS handles.
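The abstract mentions collecting statistics on hits over time to spot unexpected spikes in mentions of disease-location combinations, but does not say how a spike is defined. As a rough illustration only, the sketch below flags a day whose mention count exceeds the mean of the preceding days by a multiple of their standard deviation. The function name `find_spikes`, the 7-day window, the threshold of 3, and the sample counts are all assumptions made for this example; they are not taken from MedISys.

```python
from statistics import mean, stdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is unusually high.

    A day is flagged when its count exceeds the mean of the preceding
    `window` days by more than `threshold` standard deviations.
    This rule is an assumption for illustration, not the MedISys method.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1.0 so a nearly flat history does not
        # turn every small fluctuation into an alert.
        sigma = max(stdev(history), 1.0)
        if daily_counts[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes


# Made-up daily mention counts for one hypothetical
# (disease, location) pair; not real surveillance data.
mentions = {("disease-A", "location-X"): [2, 3, 2, 4, 3, 2, 3, 15, 4]}

for pair, counts in mentions.items():
    print(pair, find_spikes(counts))
```

A trailing window is used so that each day is compared only against earlier days, which is what a system processing news in real time would have available. A production system would likely also need to handle weekly seasonality and very sparse counts.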
2009-06-04
Pro-MED-Mail
JRC51837
https://publications.jrc.ec.europa.eu/repository/handle/JRC51837
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice