Similarity of Names across Scripts: Edit Distance Using Learned Costs of N-Grams

POULIQUEN, Bruno

doi:10.1007/978-3-540-85287-2_39



Any cross-language processing application must first tackle the problem of transliteration when it encounters a language written in another script. The first solution is to use existing transliteration tools, but these tools are usually not suitable for all purposes, and for some script pairs they do not exist at all. Our aim is to discriminate transliterations across different scripts in a unified way, using a learning method that builds a transliteration model from a set of transliterated proper names. We compare two strings using an algorithm that computes a Levenshtein edit distance with n-gram costs. The evaluations carried out show that our similarity measure is accurate.
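
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: a Levenshtein-style dynamic program in which an edit step may rewrite a short source n-gram into a short target n-gram at a cost taken from a table. The function name `ngram_edit_distance`, the `max_n` and `default_cost` parameters, and the example cost values are all assumptions made for illustration. They are not taken from the paper, and the costs here are set by hand rather than learned.

```python
from typing import Dict, Tuple

# Hypothetical cost table: (source n-gram, target n-gram) -> cost.
# An empty string on one side stands for an insertion or a deletion.
Costs = Dict[Tuple[str, str], float]


def ngram_edit_distance(source: str, target: str, costs: Costs,
                        max_n: int = 2, default_cost: float = 1.0) -> float:
    """Edit distance where one step may rewrite up to max_n characters.

    Identical n-grams cost 0, pairs found in `costs` use the table value,
    and any other pair costs default_cost per character of the longer side.
    """
    rows, cols = len(source) + 1, len(target) + 1
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    dist[0][0] = 0.0
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best = inf
            # Align the last `a` source characters with the last `b` target ones.
            for a in range(min(max_n, i) + 1):
                for b in range(min(max_n, j) + 1):
                    if a == 0 and b == 0:
                        continue
                    src_gram = source[i - a:i]
                    tgt_gram = target[j - b:j]
                    if src_gram == tgt_gram:
                        step = 0.0
                    elif (src_gram, tgt_gram) in costs:
                        step = costs[(src_gram, tgt_gram)]
                    else:
                        step = default_cost * max(a, b)
                    best = min(best, dist[i - a][j - b] + step)
            dist[i][j] = best
    return dist[-1][-1]


# Hand-written example costs (assumed values, not learned from data).
example_costs: Costs = {("ш", "sh"): 0.1, ("а", "a"): 0.1, ("х", "h"): 0.2}
cross_script = ngram_edit_distance("шах", "shah", example_costs)
plain = ngram_edit_distance("kitten", "sitting", {})
```

With an empty cost table the sketch reduces to the ordinary Levenshtein distance, because a multi-character step at the default cost is never cheaper than the equivalent single-character steps. Entries in the table make specific cross-script correspondences, such as one Cyrillic letter against a two-letter Latin sequence, much cheaper than unrelated substitutions.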

Authors: POULIQUEN Bruno

Date: 2008-09-16

Publisher: Springer

JRC number: JRC47413

ISBN: 978-3-540-85286-5

ISSN: 0302-9743

URLs: http://link.springer.com/chapter/10.1007%2F978-3-540-85287-2_39, https://publications.jrc.ec.europa.eu/repository/handle/JRC47413

DOI: 10.1007/978-3-540-85287-2_39


Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice