Collection of External Scientific Studies on General-Purpose AI Models under the EU AI Act
This report proposes a scientific methodology for identifying high-impact capabilities in General-Purpose AI (GPAI) models, which the EU AI Act defines as the capabilities of the most advanced GPAI models. High-impact capabilities matter under the EU AI Act because GPAI models that have them are classified as GPAI models with systemic risk. The approach builds on observational scaling laws: Principal Component Analysis (PCA) is applied to scores from a set of existing benchmarks to extract a low-dimensional capability measure, which can then be used to identify models with high-impact capabilities. The proposed method selects a diverse set of benchmarks that measure general capabilities, such as MMLU-Pro, GPQA-diamond, MATH-level-5, and HumanEval, and aggregates their scores using a weighted threshold-based metric. The weights are determined by the PCA approach, and the threshold is based on a reference model, to be set by the enforcement authority based on legal, policy, and risk considerations. The report also discusses additional considerations, including the need for a multi-disciplinary expert group to oversee benchmark selection, the importance of updating the approach every six months to keep pace with rapid developments in AI, and mitigation measures to prevent companies from strategically underperforming on benchmarks. By providing a practical and robust way to assess high-impact capabilities, the methodology aims to contribute to a more comprehensive approach to evaluating GPAI models.
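The aggregation described in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the report's implementation: the score matrix is invented, the names `pca_weights`, `capability`, and `REFERENCE_MODEL` are hypothetical, and the choices to use only the first principal component and to normalise its loadings to sum to 1 are assumptions made here for illustration.

```python
import numpy as np

# Benchmarks named in the abstract; one column per benchmark.
BENCHMARKS = ["MMLU-Pro", "GPQA-diamond", "MATH-level-5", "HumanEval"]

# Hypothetical scores in [0, 1], one row per model.
# These values are invented for illustration and are NOT from the report.
scores = np.array([
    [0.30, 0.25, 0.10, 0.35],
    [0.45, 0.32, 0.22, 0.50],
    [0.58, 0.40, 0.38, 0.68],
    [0.70, 0.52, 0.55, 0.82],
    [0.78, 0.63, 0.71, 0.90],
])


def pca_weights(x):
    """Return first-principal-component loadings, normalised to sum to 1.

    Assumption: a single component is used as the capability axis.
    """
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # higher benchmark scores increase the capability measure.
    if pc1.sum() < 0:
        pc1 = -pc1
    return pc1 / pc1.sum()


def capability(x, weights):
    """Weighted aggregate of benchmark scores for each model (row)."""
    return x @ weights


weights = pca_weights(scores)
cap = capability(scores, weights)

# Hypothetical choice of reference model (row index). In the report, the
# reference model is to be set by the enforcement authority.
REFERENCE_MODEL = 3
threshold = cap[REFERENCE_MODEL]

# A model is flagged when its aggregate score reaches the reference level.
flagged = cap >= threshold

for name, w in zip(BENCHMARKS, weights):
    print(f"{name}: weight {w:.3f}")
print("flagged models (row indices):", np.flatnonzero(flagged).tolist())
```

In this sketch the threshold is simply the aggregate score of the chosen reference model, so the reference model itself and any model scoring at or above it are flagged. Changing `REFERENCE_MODEL` moves the threshold without touching the weights.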
HOBBHAHN Marius;
HOVY Dirk;
VANSCHOREN Joaquin;
FERNANDEZ LLORCA David;
ERIKSSON Maria;
GOMEZ Emilia;
2025-10-10
Publications Office of the European Union
JRC143258
978-92-68-31572-9 (online)
OP KJ-01-25-469-EN-N (online)
https://publications.jrc.ec.europa.eu/repository/handle/JRC143258
10.2760/8206407 (online)