Title: Représentation de Textes à l'Aide d'Étiquettes Sémantiques dans le Cadre de la Classification Automatique
Authors: IGNAT Camelia; ROUSSELOT François
Citation: Revue Roumaine de Linguistique, RRL (Romanian Review of Linguistics) vol. 51 no. 3-4 p. 421-439
Publisher: Romanian Academy
Publication Year: 2006
JRC N°: JRC40906
ISSN: 0035-3957
URI: http://publications.jrc.ec.europa.eu/repository/handle/JRC40906
Type: Articles in Journals
Abstract: This paper describes an algorithm that represents documents in a reduced vector space through feature extraction. The algorithm is evaluated on the supervised classification of news articles. Each document is represented by a profile made up of semantic tags drawn from a machine-readable dictionary. Synonymy is handled by thematic conflation, and polysemy by a statistical word-sense disambiguation method developed for this purpose. We propose four variants of profile generation, depending on whether a recursive system is used and whether a corrective factor for polysemous words is applied. We evaluated 32 variants, defined by the algorithm type and three further parameters: grammatical category selection, a 15% reduction of the profile, and a stop-list of semantic tags. Some parameters (such as profile reduction) have little influence on classifier performance, while others (the corrective factor for ambiguous words, the stop-list) improve performance noticeably.
JRC Institute: Institute for the Protection and Security of the Citizen
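
Note on the variant count in the abstract: the 32 evaluated configurations are consistent with four algorithm variants (recursive or not, corrective factor or not) crossed with three further parameters, if each of those three is a simple on/off switch. That on/off reading is an assumption, as the abstract does not state it explicitly. The minimal Python sketch below enumerates the grid under that assumption; all parameter names are hypothetical labels chosen for illustration, not identifiers from the paper.

```python
from itertools import product

# Hypothetical labels for the two choices that define the four algorithm variants.
ALGORITHM_OPTIONS = {
    "recursive": (False, True),
    "corrective_factor": (False, True),
}

# Hypothetical labels for the three other parameters.
# Assumption: each one is binary (applied or not applied).
OTHER_OPTIONS = {
    "grammatical_category_selection": (False, True),
    "profile_reduction_15pct": (False, True),
    "semantic_tag_stop_list": (False, True),
}


def enumerate_variants():
    """Return every combination of the five on/off settings as a dict."""
    names = list(ALGORITHM_OPTIONS) + list(OTHER_OPTIONS)
    values = list(ALGORITHM_OPTIONS.values()) + list(OTHER_OPTIONS.values())
    return [dict(zip(names, combo)) for combo in product(*values)]


variants = enumerate_variants()

# Distinct (recursive, corrective_factor) pairs give the algorithm variants.
algorithm_variants = {(v["recursive"], v["corrective_factor"]) for v in variants}

print(len(algorithm_variants), len(variants))
```

Under these assumptions the grid has 4 algorithm variants and 4 x 2 x 2 x 2 = 32 configurations in total, which matches the figures given in the abstract.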


