Towards Person Name Matching for Inflective Languages
Web person search is one of the most common activities of Internet users. Recently, a vast amount of work has been reported on applying various NLP techniques to person name disambiguation in large web document collections, with the main focus on English and a few other major languages. This paper reports on knowledge-poor methods for tackling the person name matching task in Polish, a highly inflected language with a complex person name declension paradigm. These methods mainly apply well-established string distance metrics, some new variants thereof, automatically acquired simple suffix-based lemmatization patterns, and some combinations of these techniques. Results of numerous experiments are presented.
PISKORSKI Jakub;
WIELOCH Karol;
PIKULA Mariusz;
SYDOW Marcin;
2008-07-08
ACM
JRC45829