Improving Sentiment Analysis in Twitter Using Multilingual Machine Translated Data

BALAHUR DOBRESCU Alexandra; TURCHI Marco

Sentiment analysis is currently a very dynamic field in Computational Linguistics. Research in this area has concentrated on developing methods and resources for different types of texts and for various languages. Nonetheless, a multilingual system able to classify sentiment expressed in several languages has not yet been attempted. The main challenge this paper addresses is sentiment analysis of tweets in a multilingual setting. We first build a simple sentiment analysis system for tweets in English. Subsequently, we translate the data from English into four other languages (Italian, Spanish, French and German) using a standard machine translation system. We then manually correct the test data and create Gold Standards for each of the target languages. Finally, we test the performance of the sentiment analysis classifiers for the different languages and show that jointly using training data from multiple languages, especially languages of the same family, significantly improves sentiment classification results.
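The abstract does not name the classifier or features used. The following is a minimal sketch of the "joint use of training data from multiple languages", assuming a multinomial Naive Bayes classifier over whitespace tokens (a technique swapped in purely for illustration, not necessarily the authors' method). The training sets of the selected languages are concatenated into one pool before training, and the resulting model is evaluated on a single target-language test set. The corpora, toy tweets and helper names (`NaiveBayes`, `pool_training_data`, `accuracy`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        # Total token count per class, used in the smoothing denominator.
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def pool_training_data(corpora, languages):
    """Concatenate the (translated) training sets of the selected languages."""
    pooled = []
    for lang in languages:
        pooled.extend(corpora[lang])
    return pooled


def accuracy(model, test_set):
    # Fraction of test tweets whose predicted label matches the gold label.
    correct = sum(model.predict(text) == gold for text, gold in test_set)
    return correct / len(test_set)


# Invented toy tweets, NOT the paper's data. Each entry stands in for the
# English training set or one of its machine translations.
corpora = {
    "en": [("i love this phone", "positive"), ("i hate this phone", "negative")],
    "es": [("me encanta este telefono", "positive"), ("odio este telefono", "negative")],
    "it": [("adoro questo telefono", "positive"), ("odio questo telefono", "negative")],
}

if __name__ == "__main__":
    # Hypothetical Gold Standard test set for Italian.
    it_test = [("odio questo", "negative"), ("adoro questo", "positive")]
    # Monolingual training, then pooling with a same-family language, then all.
    for langs in (["it"], ["it", "es"], ["it", "es", "en"]):
        model = NaiveBayes().fit(pool_training_data(corpora, langs))
        print(langs, accuracy(model, it_test))
```

Comparing accuracy across language combinations mirrors the experimental design described in the abstract; with toy data this small, the printed numbers say nothing about the improvements the paper reports.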

Publication date: 2013-10-31

Publisher: INCOMA Ltd.

JRC number: JRC83532

ISSN: 1313-8502

URLs: http://lml.bas.bg/ranlp2013/docs/RANLP_main.pdf, https://publications.jrc.ec.europa.eu/repository/handle/JRC83532