Text Categorization Using Bibliographic Records - Beyond Document Content
This paper studies the use of different sources of information for performing
a text classification task. The growing number of digital libraries calls for
a review of the data available from those databases. Experiments applying
different base classifiers within a multi-label classifier in the domain of High Energy
Physics have been carried out on several of these possible sources. Results show
that using metadata is almost as good as using the full-text version of papers.
Keywords: text categorization, machine learning, digital libraries.
MONTEJO-RAEZ Arturo;
URENA-LOPEZ L. Alfonso;
STEINBERGER Ralf;
2006-11-24
Sociedad Española para el Procesamiento del Lenguaje Natural
JRC31101
https://publications.jrc.ec.europa.eu/repository/handle/JRC31101