JRC Publications Repository

Combiner espaces sémantiques, structure et contraintes (Combining semantic spaces, structure and constraints)

This paper presents the methods we developed for tasks 1 and 4 of the DEFT'14 Text Mining contest. In task 1, the goal was to automatically categorise short texts by literary genre. In task 4, the goal was to determine, by analysing its content, the conference session in which a scientific paper was presented. Our methods rely on a common representation of the input texts in semantic spaces built with Random Indexing. In these high-dimensional spaces, each text and each term is represented as a vector. For this edition of DEFT, we addressed the proposed tasks by designing methods that combine classical machine-learning algorithms for clustering and categorisation with (i) rule-based methods, used for instance to capture the patterns of poetic texts in task 1, and (ii) constraint-solving methods, used to take into account the information we had about the organisation of the sessions in task 4. The results obtained, NDCG = 0.4278 (rank 2) in task 1 and F-score = 1 (rank 1) in task 4, show the strong performance of these hybrid methods.
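The abstract states only that texts and terms are vectors in a semantic space built with Random Indexing. The following is a minimal sketch of that general technique, not the authors' implementation. It assumes a sliding-window variant: each term gets a sparse random index vector with a few +1/-1 entries, each term's context vector sums the index vectors of its neighbours, and a text is the sum of its terms' context vectors. Random Indexing can also use whole documents as contexts. The values of `DIM`, `NONZEROS` and `WINDOW`, the helper names and the toy corpus are all hypothetical; none of them comes from the paper.

```python
import math
import random
from collections import defaultdict

# Hypothetical parameters: the abstract does not give the values used in the paper.
DIM = 512        # dimensionality of the semantic space
NONZEROS = 8     # number of non-zero (+1/-1) entries per random index vector
WINDOW = 2       # context window half-width, in tokens


def index_vector(rng):
    """Sparse ternary random index vector, stored as {position: +1.0 or -1.0}."""
    return {pos: rng.choice((1.0, -1.0)) for pos in rng.sample(range(DIM), NONZEROS)}


def build_term_vectors(docs, seed=0):
    """Random Indexing: a term's context vector is the sum of the index
    vectors of the terms that occur within WINDOW positions of it."""
    rng = random.Random(seed)
    index = {}
    for doc in docs:
        for term in doc:
            if term not in index:
                index[term] = index_vector(rng)
    context = defaultdict(lambda: [0.0] * DIM)
    for doc in docs:
        for i, term in enumerate(doc):
            vec = context[term]  # created even if the term has no neighbours
            for j in range(max(0, i - WINDOW), min(len(doc), i + WINDOW + 1)):
                if j != i:
                    for pos, sign in index[doc[j]].items():
                        vec[pos] += sign
    return dict(context)


def text_vector(tokens, term_vectors):
    """A text is represented by the sum of the context vectors of its known terms."""
    vec = [0.0] * DIM
    for term in tokens:
        for k, value in enumerate(term_vectors.get(term, ())):
            vec[k] += value
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy corpus invented for illustration: two poetic texts, two scientific ones.
docs = [
    "the poem sings of the moon and the sea".split(),
    "the sonnet sings of love and the moon".split(),
    "the paper evaluates a clustering algorithm on text".split(),
    "we evaluate a categorisation algorithm on text corpora".split(),
]
term_vectors = build_term_vectors(docs)
doc_vectors = [text_vector(d, term_vectors) for d in docs]
query = text_vector("the poem sings of the sea".split(), term_vectors)
scores = [cosine(query, v) for v in doc_vectors]
print([round(s, 3) for s in scores])
```

In this sketch the query text scores higher against the two poetic texts than against the last scientific text, with which it shares no vocabulary. A categorisation or clustering step, for example assigning a text to the class whose centroid vector is most similar, would operate on these text vectors. The rule-based and constraint-solving components described in the abstract are not modelled here.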
2014-09-10
ATALA
JRC90537
http://deft.limsi.fr/actes/2014/pdf/deft2014_02_lutin.pdf
https://publications.jrc.ec.europa.eu/repository/handle/JRC90537
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice