We present a dataset of over 3,000 global disaster events from 2014 to 2024, derived from EM-DAT. These events are extracted from multilingual news articles using a pipeline that combines Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), enabling semantic extraction from unstructured text. The source corpus originates from the European Media Monitor (EMM), which aggregates content from thousands of verified news outlets across languages and regions. For each event, we automatically generate structured storylines summarizing hazard characteristics, drivers, impacts, and responses. The storylines are then transformed into knowledge graphs to enable systematic analysis of relationships, inter-hazard dynamics, and human–environment interactions often missed in traditional disaster records. A subset of the graphs was validated through independent expert review, confirming the accuracy and relevance of the extracted information. The dataset supports retrospective disaster analysis and multi-hazard risk assessment, and complements existing resources such as the UNDRR Hazard Information Profiles (HIPs). All data, code, and processing workflows are openly available, accompanied by an interactive dashboard for exploration. This resource advances data-driven approaches to disaster scenario modeling, impact analysis, and decision support in disaster risk management.
RONCO Michele;
BANDELLI Luca;
BERTOLINI Lorenzo;
CONSOLI Sergio;
DELFORGE Damien;
SPADARO Alessio;
VERILE Marco;
CORBANE Christina;
2026-05-14
NATURE PORTFOLIO
JRC143554
2052-4463 (online),
https://www.nature.com/articles/s41597-026-07036-2,
https://publications.jrc.ec.europa.eu/repository/handle/JRC143554,
10.1038/s41597-026-07036-2 (online),
This is a public document. You can share this publication.