Named Entity Recognition on Turkish Tweets
Various recent studies show that the performance of named entity recognition (NER) systems developed for well-formed text types drops significantly when applied to tweets. The only existing study for the highly inflected agglutinative language Turkish reports a drop in F-Measure from 91% to 19% when ported from news articles to tweets. In this study, we present a new named entity-annotated tweet corpus and a detailed analysis of the various tweet-specific linguistic phenomena. We perform comparative NER experiments with a rule-based multilingual NER system adapted to Turkish on three corpora: a news corpus, our new tweet corpus, and another tweet corpus. Based on the analysis and the experimentation results, we suggest system features required to improve NER results for social media like Twitter.
KUCUK Dilek;
JACQUET Guillaume;
STEINBERGER Ralf;
2014-09-24
Association for Computational Linguistics (ACL)
JRC84941
978-2-9517408-8-4
http://www.lrec-conf.org/proceedings/lrec2014/pdf/380_Paper.pdf
https://publications.jrc.ec.europa.eu/repository/handle/JRC84941