Graph and Embedding based Approach for Text Clustering: Topic Detection in a Large Multilingual Public Consultation

We present a novel algorithm for multilingual text clustering built upon two well-studied techniques: multilingual aligned embeddings and community detection in graphs. The aim of our algorithm is to discover the underlying topics in a multilingual dataset through clustering. We present both a numerical evaluation using the silhouette and V-measure metrics, and a qualitative evaluation for which we propose a new systematic approach. Our algorithm shows robust overall performance, and its results were empirically evaluated by an analyst. This work was carried out in the context of a large multilingual public consultation, for which our new algorithm was deployed and used on a daily basis.
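The general idea described in the abstract (embed texts, connect similar texts in a graph, detect communities, score the result) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the paper: the toy 3-dimensional vectors stand in for multilingual aligned embeddings, the cosine threshold of 0.8 is arbitrary, the gold topic labels are hypothetical, and a simple deterministic label propagation is swapped in for whichever community detection method the authors actually use. Only the V-measure is computed; the silhouette score is omitted for brevity.

```python
import math
from collections import Counter, defaultdict

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_graph(vectors, threshold=0.8):
    """Weighted graph with an edge i-j when cosine similarity >= threshold (assumed value)."""
    graph = defaultdict(dict)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                graph[i][j] = s
                graph[j][i] = s
    return graph

def label_propagation(graph, n, max_iter=20):
    """Stand-in community detection: each node adopts the label with the
    largest total edge weight among its neighbours (ties go to the smallest label)."""
    labels = list(range(n))
    for _ in range(max_iter):
        changed = False
        for node in range(n):
            if not graph[node]:
                continue  # isolated node keeps its own label
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    """H(a | b) computed from the joint label counts."""
    n = len(a)
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bv])
                for (_, bv), c in Counter(zip(a, b)).items())

def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    homogeneity = 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred, truth) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)

# Hypothetical toy "embeddings": three texts per topic, imagined to come from
# different languages but mapped into one shared (aligned) vector space.
vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.95, 0.0, 0.1],   # topic A
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.2, 0.95],   # topic B
]
gold = [0, 0, 0, 1, 1, 1]  # hypothetical reference topics

graph = build_graph(vectors)
labels = label_propagation(graph, len(vectors))
score = v_measure(gold, labels)
print(labels, score)
```

Because the embeddings are aligned across languages, texts on the same topic end up close together regardless of language, so the graph step needs no language-specific handling. On real data the threshold and the community detection method would need tuning against the evaluation metrics.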
2025-12-22
Association for Computing Machinery
JRC132590
https://dl.acm.org/doi/fullHtml/10.1145/3543873.3587627
https://publications.jrc.ec.europa.eu/repository/handle/JRC132590
10.1145/3543873.3587627 (online)
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice