Représentation de Textes à l'Aide d'Étiquettes Sémantiques dans le Cadre de la Classification Automatique
This paper describes an algorithm for representing documents in a reduced vector space through a process of feature extraction. The algorithm is evaluated in the context of the supervised classification of news articles.
We generate a document representation (profile) composed of semantic tags drawn from a machine-readable dictionary. Synonymy is handled by thematic conflation; for polysemy, we have developed a statistical method for word-sense disambiguation.
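As a rough illustration of the profile idea, the sketch below maps words to semantic tags and counts the tags. Everything in it is an assumption made for illustration: the toy `TAG_DICTIONARY`, the tag names, and the function `build_profile` are hypothetical, and the disambiguation step is a simple context-voting heuristic, not the statistical method developed in the paper.

```python
from collections import Counter

# Hypothetical toy dictionary: word -> semantic tags, one per sense.
# A word with several tags is polysemous; words sharing a tag are conflated.
TAG_DICTIONARY = {
    "bank": ["FINANCE", "GEOGRAPHY"],  # polysemous
    "loan": ["FINANCE"],
    "credit": ["FINANCE"],
    "river": ["GEOGRAPHY"],
}


def build_profile(tokens, dictionary=TAG_DICTIONARY):
    """Return a tag -> weight profile for a list of word tokens."""
    profile = Counter()
    ambiguous = []

    # First pass: unambiguous words contribute directly to their tag.
    for token in tokens:
        tags = dictionary.get(token.lower())
        if not tags:
            continue  # word not covered by the dictionary
        if len(tags) == 1:
            profile[tags[0]] += 1.0
        else:
            ambiguous.append(tags)

    # Second pass (assumed heuristic): each ambiguous word votes for the
    # candidate tag best supported by the unambiguous context.
    context = Counter(profile)  # snapshot, so results do not depend on order
    for tags in ambiguous:
        best = max(tags, key=lambda tag: context[tag])
        profile[best] += 1.0

    return dict(profile)


profile = build_profile(["bank", "loan", "credit", "river"])
```

In this toy input, "loan" and "credit" collapse onto the same tag (conflation of near-synonyms), and "bank" is assigned to whichever of its candidate tags the surrounding words support most.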
We propose four variants of profile generation, depending on whether a recursive system is used and whether a corrective factor for polysemous words is applied. We evaluated 32 configurations, combining the algorithm variant with three other parameters: grammatical category selection, 15% reduction of the profile, and a stop-list of semantic tags. Some parameters (such as profile reduction) have little influence on classifier performance, while others (the corrective factor for ambiguous words, the stop-list) improve performance noticeably.
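The count of 32 is consistent with four algorithm variants crossed with three on/off parameters (4 x 2 x 2 x 2). The sketch below enumerates such a grid; treating each parameter as a binary switch is an assumption, and the option names and the function `enumerate_configurations` are hypothetical.

```python
from itertools import product

# Assumed binary switches; the names are illustrative only.
OPTIONS = {
    "recursive": (False, True),            # recursive system or not
    "polysemy_correction": (False, True),  # corrective factor for polysemous words
    "category_selection": (False, True),   # grammatical category selection
    "profile_reduction": (False, True),    # reduction of the profile
    "tag_stop_list": (False, True),        # stop-list of semantic tags
}


def enumerate_configurations(options=OPTIONS):
    """Return every combination of option values as a list of dicts."""
    names = list(options)
    return [dict(zip(names, values)) for values in product(*options.values())]


configurations = enumerate_configurations()
print(len(configurations))
```

The first two switches together give the four profile-generation variants; the remaining three give the eight parameter settings applied to each.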
IGNAT Camelia;
ROUSSELOT François;
2008-05-30
Romanian Academy
JRC40906
0035-3957
https://publications.jrc.ec.europa.eu/repository/handle/JRC40906