Multi-Property Multi-Label Documents Metadata Recommendation based on Encoder Embeddings
The task of document classification, particularly multi-label classification, presents a significant challenge due to the complexity of assigning multiple relevant labels to each document. This complexity is further amplified in multi-property multi-label classification tasks, where documents must be categorized across several distinct sets of labels. In this research, we introduce an encoder-embedding-driven approach to multi-property multi-label document classification that leverages semantic text similarity and the reuse of pre-existing annotated data to enhance the efficiency and accuracy of the document annotation process. Our method requires only a single model for text similarity, eliminating the need for multiple property-specific classifiers and thereby reducing computational demands and simplifying deployment. We evaluate our approach through a prototype deployed at the European Commission for daily operations, which demonstrates superior performance over existing classification systems. Our contributions include improved accuracy without additional training, increased efficiency, and demonstrated effectiveness in practical applications. The results of our study indicate that our approach can be applied across various domains requiring multi-property multi-label document classification, offering a scalable and adaptable solution for metadata annotation tasks.
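The general idea in the abstract, recommending labels for several properties at once by comparing one embedding against already annotated documents, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the three-dimensional toy vectors stand in for real encoder embeddings, the property names `topic` and `doc_type` are hypothetical, and the similarity-weighted voting with a `min_score` threshold is one plausible aggregation rule, not necessarily the authors' method.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query_vec, annotated, k=2, min_score=0.6):
    """Recommend labels per property from the k most similar annotated documents.

    annotated: list of (embedding, {property: [labels]}) pairs.
    A label is kept when its share of the neighbours' total similarity
    is at least min_score (an illustrative rule, not the paper's).
    """
    # Score every annotated document once, then keep the k nearest.
    scored = [(max(cosine(query_vec, vec), 0.0), props) for vec, props in annotated]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)

    # Accumulate similarity-weighted votes separately for each property.
    votes = defaultdict(lambda: defaultdict(float))
    for sim, props in neighbours:
        for prop, labels in props.items():
            for label in labels:
                votes[prop][label] += sim

    return {
        prop: sorted(l for l, s in labels.items() if total and s / total >= min_score)
        for prop, labels in votes.items()
    }


# Toy annotated collection: hypothetical embeddings and metadata.
ANNOTATED = [
    ([0.9, 0.1, 0.0], {"topic": ["agriculture"], "doc_type": ["regulation"]}),
    ([0.8, 0.2, 0.1], {"topic": ["agriculture", "trade"], "doc_type": ["regulation"]}),
    ([0.0, 0.1, 0.9], {"topic": ["energy"], "doc_type": ["report"]}),
]
QUERY = [0.85, 0.15, 0.05]  # embedding of a new, unannotated document

recommendation = recommend(QUERY, ANNOTATED)
```

Note that a single similarity computation serves every property: adding a new property only means storing more labels with the annotated documents, not training another classifier.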
CHENIKI Nasredine;
DAUDARAVICIUS Vidas;
FELIACHI Abdelfettah;
HARDY Didier;
KUESTER Marc Wilhelm;
2024-11-13
Association for Computational Linguistics
JRC139298
979-8-89176-183-4 (online)
https://aclanthology.org/2024.nllp-1.19/
https://publications.jrc.ec.europa.eu/repository/handle/JRC139298