We present a dataset of over 3,000 global disaster events from 2014 to 2024, derived from EM-DAT. These events are extracted from multilingual news articles using a pipeline that combines Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), enabling semantic extraction from unstructured text. The source corpus originates from the European Media Monitor (EMM), which aggregates content from thousands of verified news outlets across languages and regions. For each event, we automatically generate structured storylines summarizing hazard characteristics, drivers, impacts, and responses. These storylines are then transformed into knowledge graphs to enable systematic analysis of relationships, inter-hazard dynamics, and human–environment interactions often missed in traditional disaster records. A subset of the graphs was validated through independent expert review, confirming the accuracy and relevance of the extracted information. The dataset supports retrospective disaster analysis and multi-hazard risk assessment, and complements existing resources such as the UNDRR Hazard Information Profiles (HIPs). All data, code, and processing workflows are openly available, accompanied by an interactive dashboard for exploration. This resource advances data-driven approaches to disaster scenario modeling, impact analysis, and decision support in disaster risk management.
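The storyline-to-knowledge-graph step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the field names (`event`, `drivers`, `impacts`, `responses`), the relation labels, and the example values are all hypothetical, chosen only to show how a structured storyline might be flattened into subject-relation-object triples and indexed as a graph.

```python
from collections import defaultdict

# Hypothetical storyline record. Field names and values are illustrative
# assumptions, not taken from the published dataset.
storyline = {
    "event": "Example flood",
    "drivers": ["heavy rainfall"],
    "impacts": ["displacement", "crop loss"],
    "responses": ["evacuation"],
}

# Assumed mapping from storyline field to a relation label.
RELATIONS = {
    "drivers": "driven_by",
    "impacts": "caused",
    "responses": "prompted",
}


def storyline_to_triples(record):
    """Flatten one storyline into (subject, relation, object) triples."""
    event = record["event"]
    triples = []
    for field, relation in RELATIONS.items():
        # Missing fields are treated as empty lists.
        for value in record.get(field, []):
            triples.append((event, relation, value))
    return triples


def build_graph(triples):
    """Index triples as an adjacency map: subject -> [(relation, object)]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


triples = storyline_to_triples(storyline)
graph = build_graph(triples)
```

Under these assumptions, each event becomes a node whose outgoing edges name its drivers, impacts, and responses. Shared object nodes across events (for example, the same driver appearing in two storylines) are what would let relationships between hazards be queried.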
2026-05-14
NATURE PORTFOLIO
JRC143554
2052-4463 (online)
https://www.nature.com/articles/s41597-026-07036-2
https://publications.jrc.ec.europa.eu/repository/handle/JRC143554
10.1038/s41597-026-07036-2 (online)