|Title:||Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts|
|Authors:||BALAHUR DOBRESCU ALEXANDRA; TURCHI Marco; STEINBERGER Ralf; PEREA ORTEGA JOSE MANUEL; JACQUET GUILLAUME; KUCUK DILEK; ZAVARELLA Vanni; EL GHALI ADIL|
|Citation:||Proceedings of the 9th edition of the Language Resources and Evaluation Conference p. 4265-4269|
|Publisher:||European Language Resources Association|
|Type:||Articles in periodicals and books|
|Abstract:||Sentiment analysis (SA) concerns the classification of texts according to the polarity of the opinions they express. SA systems are highly relevant to many real-world applications (e.g. marketing, eGovernance, business intelligence, behavioral sciences) and also to many tasks in Natural Language Processing (NLP), such as information extraction, question answering and textual entailment, to name just a few. The importance of this field is shown by the large number of approaches proposed in research, as well as by the interest it has attracted from other disciplines and the applications that have been created using its technology. In our case, the primary focus is to use sentiment analysis in the context of media monitoring, to enable tracking of global reactions to events. The main challenge that we face is that tweets are written in different languages, and an unbiased system should be able to deal with all of them in order to process all available data. Unfortunately, although many linguistic resources exist for processing texts written in English, for many other languages data and tools are scarce. Following our initial efforts described in (Balahur and Turchi, 2013), in this article we extend our study on the possibility of implementing a multilingual system, with three aims: a) to classify sentiment expressed in tweets in various languages using training data obtained through machine translation; b) to verify the extent to which the quality of the translations influences the sentiment classification performance, in this case on highly informal texts; and c) to improve multilingual sentiment classification using small amounts of data annotated in the target language. To this end, varying sizes of target language data are tested. The languages we explore are Arabic, Turkish, Russian, Italian, Spanish, German and French.|
|JRC Directorate:||Space, Security and Migration|