Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources
A system that recognises cross-lingual plagiarism needs to establish, among other things, whether two pieces of text written in different languages are equivalent to each other. Potthast et al. (2010) give a thorough overview of this challenging task. While the Joint Research Centre (JRC) is not specifically concerned with plagiarism, it has been working for many years on other cross-lingual functionalities that may well be useful for plagiarism detection, namely (a) cross-lingual document similarity calculation, (b) subject domain profiling of documents in many different languages according to the same multilingual subject domain categorisation scheme, and (c) the recognition of name spelling variants for the same entity, both within the same language and across different languages and scripts. The speaker will explain the algorithms behind these software tools and present a number of freely available language resources that can be used to develop software with cross-lingual functionality.
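The abstract names these functionalities without describing how they work. What follows is a minimal Python sketch under stated assumptions, not the JRC implementation. It assumes each document has already been mapped, by some upstream classifier not shown here, to weights over one shared, language-independent set of subject-domain labels. Under that assumption, documents in different languages can be compared directly with cosine similarity. It also approximates same-script name-variant matching by case-folding, stripping diacritics and measuring normalised edit distance; cross-script variants would additionally need transliteration, which is not covered. All labels, weights and function names are hypothetical.

```python
import math
import unicodedata

# Hypothetical document profile: weights over a shared, language-independent
# set of subject-domain labels (assumed to come from an upstream classifier).
DocProfile = dict[str, float]


def cosine_similarity(a: DocProfile, b: DocProfile) -> float:
    """Cosine similarity of two sparse label-weight vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalise_name(name: str) -> str:
    """Case-fold and strip diacritics so simple spelling variants converge."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1.0 for identical normalised names, decreasing with edit distance."""
    na, nb = normalise_name(a), normalise_name(b)
    longest = max(len(na), len(nb))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(na, nb) / longest


if __name__ == "__main__":
    # Hypothetical profiles for an English and a German document.
    english_doc = {"agriculture": 0.7, "trade": 0.3}
    german_doc = {"agriculture": 0.6, "trade": 0.2, "environment": 0.2}
    print(cosine_similarity(english_doc, german_doc))
    print(name_similarity("Müller", "Mueller"))
```

Because both profiles use the same label set regardless of the document language, no translation step is needed at comparison time; the quality of the result rests entirely on the assumed upstream classifier.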
STEINBERGER Ralf;
2013-08-12
CLEF
JRC73867
http://ims-sites.dei.unipd.it/documents/71612/155385/CLEF2012wn-PAN-Steinberger2012.pdf
https://publications.jrc.ec.europa.eu/repository/handle/JRC73867