JRC Publications Repository

Lasso-based variable selection methods in text regression: the case of short texts

Communication through websites is often characterised by short texts consisting of only a few words, such as image captions or tweets. This paper explores supervised learning methods for the analysis of short texts as an alternative to unsupervised methods, which are widely employed to infer topics from structured texts. The aim is to assess the effectiveness of text data in the social sciences when they are used as explanatory variables in regression models. We compare the results obtained by several variants of the lasso with those of screening-based and randomization-based methods, such as sure independence screening and stability selection. Latent Dirichlet allocation results are also considered as a benchmark. Our perspective is primarily empirical: our starting point is the analysis of two real datasets, although bootstrap replications of each dataset are also considered. The first case study aims to explain price variations based on the information contained in the descriptions of items on sale on e-commerce platforms. The second concerns open-ended questions in satisfaction-rating surveys. The two case studies differ in nature and represent different kinds of short texts: in one case the text is descriptive and objective, whereas in the other it is subjective and emotional.
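As a rough illustration of lasso-based variable selection on short texts, the stdlib-only Python sketch below builds a binary document-term matrix, fits a lasso by cyclic coordinate descent, and then computes a simplified stability-selection frequency over random half-subsamples. Everything here is an assumption for illustration, not the authors' code or data: the item descriptions and prices are hypothetical, the penalty `lam` and `threshold` are fixed by hand, the function names (`build_dtm`, `lasso_cd`, `stability_selection`) are invented, and the stability step refits a plain lasso on each subsample instead of the randomized lasso used in the original stability-selection proposal.

```python
import random


def build_dtm(texts):
    """Binary document-term matrix: rows are texts, columns are vocabulary terms."""
    # Naive lowercase whitespace tokenization (an assumption); real pipelines
    # also handle punctuation, stop words, stemming and rare-term filtering.
    docs = [set(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*docs))
    X = [[1.0 if term in d else 0.0 for term in vocab] for d in docs]
    return X, vocab


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||(y - mean(y)) - Xc b||^2 + lam * ||b||_1, where Xc is X
    with centred columns (centring acts as an unpenalized intercept).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residuals for b = 0
    sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # constant column, e.g. a term absent from this subsample
            # Correlation of term j with the partial residual that excludes term j.
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + sq[j] * beta[j]
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep residuals in sync with beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def stability_selection(X, y, lam, n_subsamples=50, threshold=0.6, seed=0):
    """Selection frequency of each term over random half-subsamples (simplified variant)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    return freqs, [j for j, f in enumerate(freqs) if f >= threshold]


# Hypothetical item descriptions and prices: "leather" adds 80 to the price,
# while colours and item types are balanced across the two price levels.
texts = [
    "black leather bag", "brown leather shoes", "red leather wallet", "black leather jacket",
    "brown leather belt", "red leather bag", "blue leather shoes", "blue leather wallet",
    "black bag", "brown shoes", "red wallet", "black jacket",
    "brown belt", "red bag", "blue shoes", "blue wallet",
]
prices = [120.0] * 8 + [40.0] * 8

X, vocab = build_dtm(texts)
beta = lasso_cd(X, prices, lam=1.0)
print({t: b for t, b in zip(vocab, beta) if b != 0.0})  # {'leather': 76.0}

freqs, selected = stability_selection(X, prices, lam=1.0)
print([vocab[j] for j in selected])
```

On this toy data the lasso keeps only `leather`, and its coefficient is shrunk from the true price difference of 80 to 76, that is, by `lam` divided by the variance of the centred leather indicator (1 / 0.25 = 4). In a real analysis `lam` would be tuned, for example by cross-validation, rather than fixed by hand.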
Date: 2024-04-05
Publisher: Springer
JRC number: JRC126124
ISSN: 1863-8171 (online)
URLs: https://link.springer.com/content/pdf/10.1007/s10182-023-00472-0.pdf?pdf=button, https://publications.jrc.ec.europa.eu/repository/handle/JRC126124
DOI: 10.1007/s10182-023-00472-0 (online)
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice