Title: The Selection of Electronic Text Documents Supported by Only Positive Examples
Authors: ZIZKA Jan; HROZA Jiří; POULIQUEN Bruno; IGNAT Camelia; STEINBERGER Ralf
Citation: Proceedings of the 8th International Conference on the Statistical Analysis of Textual Data (JADT'2006) p. 993-1002
Publication Year: 2006
JRC N°: JRC32546
URI: http://publications.jrc.ec.europa.eu/repository/handle/JRC32546
Type: Contributions to Conferences
Abstract: The European Commission has a freely accessible news monitoring system called the Europe Media Monitor NewsBrief (http://press.jrc.it/), which is available for all twenty official languages of the European Union, plus some more languages. Among other things, NewsBrief categorizes articles through routing procedures and it alerts users interested in a large variety of different subject domains automatically. In the effort to improve the multilingual categorization and relevance ranking functionality for some complex interest profiles, for which only positive examples are currently available, we implemented a modified k-NN (k-nearest neighbors) algorithm and empirically detected parameters and parameter settings that produce good results for rather different subject areas (news on the EU-Constitution, on Iraq, and on Terrorism). Experiments on this real-life data yielded very satisfying results: a precision of over 90% for a recall of up to 70%. These results were then compared to others achieved with one-class SVM and with SVM that was trained on both positive and artificially generated negative example sets. Efforts are currently underway to incorporate this new functionality within NewsBrief and to make it available to the users.
JRC Institute: Institute for the Protection and Security of the Citizen
