Similarity of Names across Scripts: Edit Distance Using Learned Costs of N-Grams
Any cross-language processing application must first tackle the problem of transliteration when it encounters a language written in another script. One solution is to use existing transliteration tools, but these tools are usually not suitable for all purposes, and for some script pairs they do not exist at all. Our aim is to discriminate transliterations across different scripts in a unified way, using a learning method that builds a transliteration model from a set of transliterated proper names. We compare two strings using an algorithm that computes a Levenshtein edit distance using n-gram costs. The evaluations carried out show that our similarity measure is accurate.
POULIQUEN Bruno
2008-09-16
Springer
JRC47413
978-3-540-85286-5
0302-9743
http://link.springer.com/chapter/10.1007%2F978-3-540-85287-2_39
https://publications.jrc.ec.europa.eu/repository/handle/JRC47413
10.1007/978-3-540-85287-2_39