Improving Sentiment Analysis in Twitter Using Multilingual Machine Translated Data
Sentiment analysis is currently a very dynamic field in Computational Linguistics. Research in this area has concentrated on developing methods and resources for different types of texts and for various languages. Nonetheless, the implementation of a multilingual system able to classify sentiment expressed in several languages has not yet been addressed. The main challenge this paper addresses is sentiment analysis of tweets in a multilingual setting. We first build a simple sentiment analysis system for tweets in English. Next, we translate the data from English into four other languages (Italian, Spanish, French and German) using a standard machine translation system. We then manually correct the test data and create gold standards for each of the target languages. Finally, we test the performance of the sentiment analysis classifiers for the different languages. We show that jointly using training data from multiple languages, especially languages from the same family, significantly improves sentiment classification results.
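The abstract's central claim concerns the "joint use of training data from multiple languages". The sketch below shows what that pooling means mechanically. It assumes a from-scratch multinomial naive Bayes bag-of-words classifier and a handful of hypothetical toy tweets. It does not reproduce the classifier, features, or data the authors used. The names `NaiveBayes`, `tokenize` and `train` are illustrative only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization; real tweet
    # preprocessing would also handle URLs, mentions, hashtags and emoticons.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        # Total number of tokens seen per class (denominator of the likelihood).
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for English tweets and their translations.
train = {
    "en": [("i love this phone", "pos"), ("what a terrible day", "neg")],
    "es": [("me encanta este teléfono", "pos"), ("qué día tan terrible", "neg")],
    "it": [("adoro questo telefono", "pos"), ("che giornata terribile", "neg")],
}

# Monolingual model: trained on Spanish data only.
monolingual = NaiveBayes().fit(train["es"])
# Joint model: training data from all languages pooled into one set.
joint = NaiveBayes().fit([ex for lang in ("en", "es", "it") for ex in train[lang]])

print(joint.predict("un día terrible"))  # neg
```

Pooling here is simple concatenation. The joint model shares one vocabulary and one set of class statistics across languages. Cognates and shared words, such as "terrible" in English and Spanish, can therefore contribute evidence across languages. This offers one plausible intuition for why languages from the same family might help each other, but it is not offered as the paper's explanation.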
BALAHUR DOBRESCU Alexandra;
TURCHI Marco;
2013-10-31
INCOMA Ltd.
JRC83532
1313-8502
http://lml.bas.bg/ranlp2013/docs/RANLP_main.pdf
https://publications.jrc.ec.europa.eu/repository/handle/JRC83532