|Title:||Représentation de Textes à l'Aide d'Étiquettes Sémantiques dans le Cadre de la Classification Automatique (Representing Texts with Semantic Tags for Automatic Classification)|
|Authors:||IGNAT Camelia; ROUSSELOT François|
|Citation:||Revue Roumaine de Linguistique, RRL (Romanian Review of Linguistics) vol. 51 no. 3-4 p. 421-439|
|JRC Publication N°:||JRC40906|
|Type:||Articles in Journals|
|Abstract:||This paper describes an algorithm that represents documents in a reduced vector space through feature extraction. The algorithm is evaluated on the supervised classification of news articles. Each document is represented by a profile made of semantic tags taken from a machine-readable dictionary. Synonymy is handled by thematic conflation, and polysemy by a statistical word-sense disambiguation method that we developed. We propose four variants of profile generation, depending on whether or not a recursive system is used and whether or not a corrective factor for polysemous words is applied. We evaluated 32 configurations in total: the four algorithm variants combined with each setting of three further parameters (grammatical-category selection, a 15% reduction of the profile, and a stop-list of semantic tags), i.e. 4 × 2³ = 32. Some parameters (such as profile reduction) have little influence on classifier performance, whereas others (the corrective factor for ambiguous words and the stop-list) noticeably improve it.|
|JRC Institute:||Institute for the Protection and Security of the Citizen|
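The profile-generation step summarized in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the toy dictionary, the tag names, the 1/n form of the corrective factor, and the parameter names in `VARIANTS` are all hypothetical. It shows how mapping words to dictionary tags conflates synonyms into one dimension, how a corrective factor can down-weight polysemous words, and how a tag stop-list and a 15% profile reduction shrink the vector. The statistical word-sense disambiguation, the recursive variant, and grammatical-category selection are not modeled.

```python
from collections import Counter
from itertools import product

# Hypothetical toy dictionary: word -> semantic tags, one per dictionary sense.
# Entries are invented for illustration; they are not from the paper's dictionary.
TOY_DICTIONARY = {
    "bank": ["FINANCE", "GEOGRAPHY"],
    "loan": ["FINANCE"],
    "river": ["GEOGRAPHY"],
    "interest": ["FINANCE", "EMOTION"],
}


def build_profile(tokens, dictionary, corrective=True, stop_tags=frozenset(), reduction=0.0):
    """Return a bag of semantic tags (tag -> weight) for a list of tokens.

    Words that share a tag are conflated into the same dimension.
    corrective=True splits a polysemous word's unit weight evenly over its
    n candidate tags (1/n each); this is an ASSUMED form of the corrective factor.
    reduction drops that fraction of the lowest-weighted tags (e.g. 0.15).
    """
    profile = Counter()
    for token in tokens:
        tags = dictionary.get(token, [])
        if not tags:
            continue  # words without a dictionary entry contribute nothing
        weight = 1.0 / len(tags) if corrective else 1.0
        for tag in tags:
            if tag not in stop_tags:  # hypothetical stop-list of semantic tags
                profile[tag] += weight
    if reduction > 0:
        keep = len(profile) - int(len(profile) * reduction)
        profile = Counter(dict(profile.most_common(keep)))
    return profile


# The 32 evaluated configurations: 4 algorithm variants (recursive or not,
# corrective factor or not) x 3 binary parameters. Names are hypothetical.
VARIANTS = [
    dict(recursive=r, corrective=c, pos_filter=p, reduce_15=red, tag_stoplist=s)
    for r, c, p, red, s in product((False, True), repeat=5)
]

if __name__ == "__main__":
    print(build_profile(["bank", "loan", "interest", "river"], TOY_DICTIONARY))
    # Counter({'FINANCE': 2.0, 'GEOGRAPHY': 1.5, 'EMOTION': 0.5})
    print(len(VARIANTS))  # 32
```

Splitting a word's weight evenly across its senses is only one plausible corrective factor; the paper's actual weighting, and its disambiguation step, may assign weights differently.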