JRC Publications Repository

Similarity of Names across Scripts: Edit Distance Using Learned Costs of N-Grams

Any cross-language processing application must first tackle the problem of transliteration when it faces a language written in another script. The first solution is to use existing transliteration tools, but these tools are usually not suitable for all purposes, and for some script pairs they do not even exist. Our aim is to discriminate transliterations across different scripts in a unified way, using a learning method that builds a transliteration model from a set of transliterated proper names. We compare two strings using an algorithm that computes a Levenshtein edit distance with n-gram costs. The evaluations carried out show that our similarity measure is accurate.
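To make the idea of "a Levenshtein edit distance with n-gram costs" concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the costs are learned or which n-gram lengths are used, so this sketch assumes a pre-learned table mapping (source n-gram, target n-gram) pairs to costs, with plain Levenshtein costs as the fallback for single characters. The names `ngram_edit_distance`, `op_cost`, `max_src`, `max_tgt` and the example cost values are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical learned cost table: (source n-gram, target n-gram) -> cost.
# An empty string on one side models an insertion or a deletion.
CostTable = Dict[Tuple[str, str], float]


def op_cost(src: str, tgt: str, costs: CostTable) -> float:
    """Cost of rewriting src as tgt in a single edit operation."""
    if (src, tgt) in costs:
        return costs[(src, tgt)]  # a learned n-gram cost takes priority
    # Fall back to plain Levenshtein costs for single-character operations.
    if len(src) == 1 and len(tgt) == 1:
        return 0.0 if src == tgt else 1.0  # match or substitution
    if len(src) + len(tgt) == 1:
        return 1.0  # single-character insertion or deletion
    return math.inf  # multi-character operation with no learned cost


def ngram_edit_distance(s: str, t: str, costs: CostTable,
                        max_src: int = 2, max_tgt: int = 4) -> float:
    """Edit distance in which one operation may rewrite up to max_src
    characters of s as up to max_tgt characters of t."""
    n, m = len(s), len(t)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            # Try every operation whose rewritten n-grams end at (i, j).
            for a in range(0, min(max_src, i) + 1):
                for b in range(0, min(max_tgt, j) + 1):
                    if a == 0 and b == 0:
                        continue
                    prev = d[i - a][j - b]
                    if prev == math.inf:
                        continue  # no finite path reaches this cell
                    c = op_cost(s[i - a:i], t[j - b:j], costs)
                    best = min(best, prev + c)
            d[i][j] = best
    return d[n][m]


if __name__ == "__main__":
    # Illustrative costs only; a real table would be learned from name pairs.
    learned = {("ж", "zh"): 0.1, ("у", "u"): 0.0, ("к", "k"): 0.0,
               ("о", "o"): 0.0, ("в", "v"): 0.0}
    print(ngram_edit_distance("жуков", "zhukov", learned))  # 0.1
    print(ngram_edit_distance("kitten", "sitting", {}))     # 3.0
```

With an empty cost table every multi-character operation is forbidden, so the function reduces to the ordinary Levenshtein distance; the learned entries only lower the cost of operations such as rewriting Cyrillic "ж" as the Latin bigram "zh", which the unit-cost version would charge as a substitution plus an insertion.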
Date: 2008-09-16
Publisher: Springer
JRC number: JRC47413
ISBN: 978-3-540-85286-5
ISSN: 0302-9743
URLs: http://link.springer.com/chapter/10.1007%2F978-3-540-85287-2_39, https://publications.jrc.ec.europa.eu/repository/handle/JRC47413
DOI: 10.1007/978-3-540-85287-2_39
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice