Title: A Validity Index for Prototype Based Clustering of Data Sets with Complex Cluster Structures
Authors: TASDEMIR Kadim; MERENYI Erzsebet
Citation: IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS vol. 41 no. 4 p. 1039-1053
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication Year: 2011
JRC N°: JRC62772
ISSN: 1083-4419
URI: http://publications.jrc.ec.europa.eu/repository/handle/JRC62772
DOI: 10.1109/TSMCB.2010.2104319
Type: Articles in Journals
Abstract: Evaluation of how well the extracted clusters fit the true partitions of a data set is one of the fundamental challenges in unsupervised clustering because the data structure and the number of clusters are unknown a priori. Cluster validity indices are commonly used to select the best partitioning from different clustering results; however, they are often inadequate unless clusters are well separated or have parametric shapes. Prototype based clustering (finding clusters by grouping the prototypes obtained by vector quantization of the data), which is becoming increasingly important for its effectiveness in the analysis of large, high-dimensional data sets, adds another dimension to this challenge. For validity assessment of prototype based clusterings, previously proposed indices (mostly devised for the evaluation of point based clusterings) usually perform poorly. The poor performance is made worse when the validity indices are applied to large data sets with complicated cluster structure. In this work we propose a new index, Conn Index, which can be applied to data sets with a wide variety of clusters of different shapes, sizes, densities or overlaps. We construct Conn Index based on inter- and intra-cluster connectivities of prototypes. Connectivities are defined through a "connectivity matrix", which is a weighted Delaunay graph where the weights indicate the local data distribution. Experiments on synthetic and real data indicate that Conn Index outperforms the existing validity indices used in this study for the evaluation of prototype based clustering results.
JRC Institute: Institute for Environment and Sustainability
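
The abstract describes the connectivity matrix only as a weighted Delaunay graph whose weights indicate the local data distribution. The following is a minimal sketch of that idea, assuming (this record does not state it) that the weight between two prototypes is the number of data vectors for which those two prototypes are the nearest and second nearest. The function names `connectivity_matrix` and `intra_inter_connectivity`, the toy data, and the cluster labels are all hypothetical. The intra- and inter-cluster sums only illustrate the kind of quantities the index is built on; they are not the Conn Index formula itself.

```python
from math import dist  # Euclidean distance, Python 3.8+


def connectivity_matrix(data, prototypes):
    """Assumed weighting: conn[i][j] counts the data vectors whose two
    nearest prototypes are i and j (in either order)."""
    n = len(prototypes)
    conn = [[0] * n for _ in range(n)]
    for x in data:
        # Rank prototypes by distance to x and keep the closest two.
        ranked = sorted(range(n), key=lambda k: dist(x, prototypes[k]))
        best, second = ranked[0], ranked[1]
        conn[best][second] += 1
        conn[second][best] += 1
    return conn


def intra_inter_connectivity(conn, labels):
    """Sum edge weights inside clusters and across clusters,
    counting each undirected edge once. Illustrative only."""
    intra = inter = 0
    n = len(conn)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                intra += conn[i][j]
            else:
                inter += conn[i][j]
    return intra, inter


# Hypothetical toy example: four prototypes grouped into two clusters.
prototypes = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
data = [(0.1, 0.0), (0.4, 0.1), (0.9, 0.0),
        (10.2, 0.0), (10.9, 0.1), (5.4, 0.0)]
labels = [0, 0, 1, 1]  # cluster label of each prototype

conn = connectivity_matrix(data, prototypes)
intra, inter = intra_inter_connectivity(conn, labels)
print(conn)
print("intra:", intra, "inter:", inter)
```

In this toy case most of the connectivity falls inside the two prototype groups and only one data vector links them, which is the kind of pattern a connectivity-based validity measure would treat as evidence of a good partition.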
