Title: Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts
Authors: BALAHUR DOBRESCU Alexandra; TURCHI Marco; STEINBERGER Ralf; PEREA ORTEGA Jose Manuel; JACQUET Guillaume; KUCUK Dilek; ZAVARELLA Vanni; EL GHALI Adil
Citation: Proceedings of the 9th edition of the Language Resources and Evaluation Conference, pp. 4265-4269
Publisher: European Language Resources Association
Publication Year: 2014
JRC N°: JRC85267
ISBN: 978-2-9517408-8-4
URI: http://www.lrec-conf.org/proceedings/lrec2014/pdf/965_Paper.pdf
http://publications.jrc.ec.europa.eu/repository/handle/JRC85267
Type: Articles in periodicals and books
Abstract: Sentiment analysis (SA) is the classification of texts according to the polarity of the opinions they express. SA systems are highly relevant to many real-world applications (e.g. marketing, eGovernance, business intelligence, behavioral sciences) and to many tasks in Natural Language Processing (NLP), such as information extraction, question answering and textual entailment. The importance of the field is shown by the large number of approaches proposed in research, by the interest it has attracted from other disciplines and by the applications built on its technology. In our case, the primary focus is on using sentiment analysis in the context of media monitoring, to track global reactions to events. The main challenge we face is that tweets are written in many languages, and an unbiased system should be able to handle all of them in order to process all available data. Unfortunately, although many linguistic resources exist for processing English texts, data and tools are scarce for many other languages. Following our initial efforts described in (Balahur and Turchi, 2013), in this article we extend our study in three directions: a) classifying the sentiment expressed in tweets in various languages with a multilingual system trained on data obtained through machine translation; b) verifying the extent to which translation quality influences sentiment classification performance on highly informal texts; and c) improving multilingual sentiment classification using small amounts of data annotated in the target language. To this end, we test varying amounts of target-language training data. The languages we explore are Arabic, Turkish, Russian, Italian, Spanish, German and French.
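The experimental setup in points a) and c) can be illustrated with a minimal sketch: train a classifier on machine-translated training data, then add the first k target-language examples and measure accuracy for several values of k. This is not the authors' method. The abstract does not name the classifier, so multinomial Naive Bayes with Laplace smoothing is swapped in here for brevity. The function names (`train_nb`, `predict`, `learning_curve`) and the Spanish toy sentences are hypothetical illustrations, not the paper's data or results.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); real tweets would
    # need handling of hashtags, mentions, emoticons and spelling variants.
    return text.lower().split()


def train_nb(examples, alpha=1.0):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab, alpha


def predict(model, text):
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior plus Laplace-smoothed log likelihood of each known token.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_set):
    correct = sum(predict(model, text) == gold for text, gold in test_set)
    return correct / len(test_set)


def learning_curve(translated_train, target_train, test_set, sizes):
    """Accuracy when the first k target-language examples are added."""
    results = {}
    for k in sizes:
        model = train_nb(translated_train + target_train[:k])
        results[k] = accuracy(model, test_set)
    return results


# Hypothetical toy data: "machine-translated" training tweets lack the
# informal words ("guay", "fatal") that native target-language tweets use.
translated_train = [
    ("me gusta mucho", "pos"),
    ("es muy bueno", "pos"),
    ("no me gusta", "neg"),
    ("es muy malo", "neg"),
]
target_train = [("que guay", "pos"), ("fatal todo", "neg")]
test_set = [("guay guay", "pos"), ("fatal fatal", "neg")]

print(learning_curve(translated_train, target_train, test_set, [0, 1, 2]))
# -> {0: 0.5, 1: 0.5, 2: 1.0}
```

In this toy setting, the translated data alone cannot recognize the informal words, and accuracy only improves once native examples covering both classes are added. That mirrors, in miniature, the motivation for point c); real experiments would of course use far larger data and a stronger classifier.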
JRC Directorate: Space, Security and Migration
