JRC Publications Repository

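An Algorithm for Generating Document Profiles and Its Evaluation in the Context of Thematic Classification
(Original French title: Un Algorithme de Génération de Profil de Document Et Son Evaluation dans le Contexte de la Classification Thématique)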

This paper describes an algorithm that represents documents in a reduced vector space through feature extraction. The algorithm is applied and evaluated in the context of supervised classification of news articles from the Le Monde newspaper collection for the years 2003 and 2004. We generate a document representation (or profile) in an 800-dimensional space whose dimensions are semantic tags taken from a machine-readable dictionary. Two issues are addressed: synonymy, handled by thematic conflation, and polysemy, for which we developed a statistical word-sense disambiguation method. We propose four variants of document profile generation, depending on whether or not a recursive system is used and whether or not a corrective factor for polysemous words is applied. To determine the best classifier our algorithm can provide, we evaluated 32 variants (4 × 2 × 2 × 2 = 32), obtained by combining the algorithm type (the four variants above) with three further parameters that influence the document representation: grammatical category selection, a 15% reduction of the profile, and a stop-list of semantic tags. The evaluation is carried out on documents from six categories by computing precision, recall and F-measure, in order to identify the best algorithm with respect to threshold detection. Some parameters (such as profile reduction) have little influence on classifier performance, while others (the corrective factor for ambiguous words and the stop-list) improve it noticeably.
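The profile-generation steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, its tag names, and the function `generate_profile` are hypothetical, and three details are assumptions because the abstract does not specify them. The corrective factor is modeled as splitting a polysemous word's unit weight as 1/k across its k candidate tags. The recursive variant is modeled as re-weighting each word's senses in proportion to the current profile. The 15% reduction is modeled as dropping the lowest-weighted fraction of non-zero tags. Synonym conflation follows from the dictionary lookup itself, since synonyms map to the same tag. The profile is kept as a sparse dictionary; a dense 800-dimensional vector would simply index these tags.

```python
from collections import defaultdict

# Hypothetical toy dictionary: (lemma, POS) -> candidate semantic tags.
# Synonyms that share a tag (e.g. "loan", "credit") are conflated into one dimension.
TOY_DICTIONARY = {
    ("bank", "NOUN"): ["FINANCE", "GEOGRAPHY"],
    ("loan", "NOUN"): ["FINANCE"],
    ("credit", "NOUN"): ["FINANCE"],
    ("borrow", "VERB"): ["FINANCE"],
    ("river", "NOUN"): ["GEOGRAPHY"],
}


def generate_profile(tokens, dictionary, kept_pos=("NOUN", "VERB", "ADJ"),
                     tag_stoplist=(), corrective=True, iterations=1,
                     reduction=0.0):
    """Build a sparse tag -> weight profile (illustrative sketch, not the paper's code)."""
    # Grammatical category selection (kept_pos=None disables it),
    # dictionary lookup, and removal of stop-listed tags.
    words = []
    for lemma, pos in tokens:
        if kept_pos is not None and pos not in kept_pos:
            continue
        tags = [t for t in dictionary.get((lemma, pos), []) if t not in tag_stoplist]
        if tags:
            words.append(tags)

    # First pass. Assumed corrective factor: a word with k candidate tags
    # contributes 1/k to each; without it, each tag receives a full 1.
    profile = defaultdict(float)
    for tags in words:
        weight = 1.0 / len(tags) if corrective else 1.0
        for t in tags:
            profile[t] += weight

    # Assumed recursive variant: redistribute each word's unit weight over its
    # tags in proportion to the current profile, so the document context
    # favours the coherent sense. Without the corrective factor this is a no-op.
    for _ in range(iterations - 1):
        updated = defaultdict(float)
        for tags in words:
            support = [profile[t] for t in tags]
            total = sum(support)
            for t, s in zip(tags, support):
                updated[t] += s / total if corrective else 1.0
        profile = updated

    # Assumed reading of "profile reduction": drop the lowest-weighted
    # fraction of non-zero tags (e.g. reduction=0.15 for 15%).
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    keep = len(ranked) - int(len(ranked) * reduction)
    return dict(ranked[:keep])


EXAMPLE_TOKENS = [("bank", "NOUN"), ("loan", "NOUN"), ("borrow", "VERB"),
                  ("the", "DET"), ("river", "NOUN")]

if __name__ == "__main__":
    print(generate_profile(EXAMPLE_TOKENS, TOY_DICTIONARY))
    # {'FINANCE': 2.5, 'GEOGRAPHY': 1.5}
```

In this sketch each of the five switches (recursion, corrective factor, category selection, reduction, stop-list) is binary, so enumerating them gives 2^5 = 32 configurations, the same count as the 32 variants evaluated in the paper. With `iterations=2`, the ambiguous "bank" shifts weight toward FINANCE because its financial context dominates the first-pass profile.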
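The evaluation step can be sketched as follows, under assumptions the abstract leaves open: the classifier is taken to assign each document a score per category (for instance a similarity between the document profile and a category profile), and threshold detection is taken to mean choosing the score cut-off that maximizes the F-measure. The function names `precision_recall_f1` and `best_threshold`, the balanced F-measure (β = 1) and the exhaustive sweep over observed scores are all hypothetical choices for illustration, not the paper's procedure.

```python
def precision_recall_f1(scores, relevant, threshold):
    """Precision, recall and balanced F-measure (F1) for one category.

    scores: document id -> classifier score for this category (assumed input)
    relevant: set of document ids that truly belong to the category
    """
    # Documents scoring at or above the threshold are assigned to the category.
    predicted = {doc for doc, s in scores.items() if s >= threshold}
    tp = len(predicted & relevant)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_threshold(scores, relevant):
    """Try every observed score as a threshold and keep the one with the highest F1."""
    candidates = sorted(set(scores.values()))
    return max(candidates, key=lambda t: precision_recall_f1(scores, relevant, t)[2])


if __name__ == "__main__":
    scores = {"d1": 0.9, "d2": 0.8, "d3": 0.4, "d4": 0.3}
    relevant = {"d1", "d2", "d3"}
    t = best_threshold(scores, relevant)
    print(t, precision_recall_f1(scores, relevant, t))
    # 0.4 (1.0, 1.0, 1.0)
```

In the setting described in the abstract, such a sweep would be repeated for each of the six categories and each of the 32 variants; how per-category results are aggregated is not stated in the abstract.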
2006-11-27
JRC32638
https://publications.jrc.ec.europa.eu/repository/handle/JRC32638
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice