JRC Publications Repository

Strengths and Limitations of Word-Based Task Explainability in Vision Language Models: a Case Study on Biological Sex Biases in the Medical Domain

Vision-language models (VLMs) can achieve high accuracy in medical applications but can retain demographic biases from training data. While multiple works have identified the presence of these biases in many VLMs, it remains unclear how strongly they affect inference. In this work, we study how well a task-level explainability method based on linear combinations of words can detect multiple types of biases, with a focus on medical image classification. By manipulating the training datasets with demographic and non-demographic biases, we show that the adopted approach can detect explicitly encoded biases but fails with implicitly encoded ones, particularly biological sex. Our results suggest that this failure likely stems from misalignment between sex-describing features in the image and text modalities. Our findings highlight limitations of the evaluated explainability method for detecting implicit biases in medical VLMs.
2025-09-01
Association for Computational Linguistics
JRC142218
979-8-89176-277-0 (online)
https://aclanthology.org/2025.gebnlp-1.12/, https://publications.jrc.ec.europa.eu/repository/handle/JRC142218
10.18653/v1/2025.gebnlp-1.12 (online)
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice