
Representation de Textes a l'Aide d'Etiquettes Semantiques dans le Cadre de la Classification Automatique (Text Representation Using Semantic Tags in the Context of Automatic Classification)

This paper describes an algorithm that represents documents in a reduced vector space through feature extraction. The algorithm is evaluated on the supervised classification of news articles. Each document is represented by a profile made up of semantic tags drawn from a machine-readable dictionary. Synonymy is handled by thematic conflation, and polysemy by a statistical method for word-sense disambiguation that we developed. We propose four variants of profile generation, depending on whether a recursive system is used and whether a corrective factor for polysemous words is applied. We evaluated 32 variants in total, defined by the algorithm type and three further parameters: grammatical category selection, 15% reduction of the profile, and a stop-list of semantic tags. Some parameters, such as profile reduction, have little influence on classifier performance, while others (the corrective factor for ambiguous words and the stop-list) improve performance noticeably.
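The profile construction and the count of 32 variants can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy dictionary, the function names `build_profile` and `enumerate_variants`, the corrective factor modelled as 1 / (number of candidate tags), and the reading of "15% reduction" as dropping the lowest-weighted 15% of tags. It does not reproduce the paper's statistical word-sense disambiguation or its recursive system.

```python
from collections import Counter
from itertools import product

# Hypothetical toy dictionary mapping words to semantic tags (not from the paper).
TOY_DICTIONARY = {
    "bank": ["FINANCE", "GEOGRAPHY"],  # polysemous: two candidate tags
    "loan": ["FINANCE"],
    "credit": ["FINANCE"],             # synonym-like words share one tag (conflation)
    "river": ["GEOGRAPHY"],
    "said": ["COMMUNICATION"],
}


def build_profile(words, dictionary, correct_polysemy=True,
                  tag_stoplist=frozenset(), reduction=0.0):
    """Map words to weighted semantic tags and return the document profile.

    Assumptions (illustrative only):
    - the corrective factor splits a word's weight evenly over its tags;
    - `reduction` drops that fraction of the lowest-weighted tags.
    """
    profile = Counter()
    for word in words:
        tags = dictionary.get(word, [])
        if not tags:
            continue  # word not covered by the dictionary
        weight = 1.0 / len(tags) if correct_polysemy else 1.0
        for tag in tags:
            if tag not in tag_stoplist:
                profile[tag] += weight
    ranked = profile.most_common()  # highest-weighted tags first
    keep = len(ranked) - int(len(ranked) * reduction)
    return dict(ranked[:keep])


def enumerate_variants():
    """List the on/off combinations of five settings, treating each as binary.

    Two settings define the four algorithm variants (recursive system,
    corrective factor); three more are the additional parameters.
    """
    names = ("recursive", "corrective_factor", "pos_selection",
             "profile_reduction", "tag_stoplist")
    return [dict(zip(names, flags))
            for flags in product([False, True], repeat=len(names))]


words = ["bank", "loan", "credit", "river", "said"]
profile = build_profile(words, TOY_DICTIONARY,
                        tag_stoplist=frozenset({"COMMUNICATION"}))
variants = enumerate_variants()
```

Treating each setting as binary gives 4 x 2 x 2 x 2 = 32 combinations, which matches the number of variants reported in the abstract.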
2008-05-30
Romanian Academy
JRC40906
0035-3957
https://publications.jrc.ec.europa.eu/repository/handle/JRC40906
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice