A Validity Index for Prototype Based Clustering of Data Sets with Complex Cluster Structures
Evaluating how well the extracted clusters fit the true partitions of a data set is one of the fundamental challenges in unsupervised clustering, because the data structure and the number of clusters are unknown a priori. Cluster validity indices are commonly used to select the best partitioning from different clustering results; however, they are often inadequate unless clusters are well separated or have parametric shapes. Prototype based clustering (finding clusters by grouping the prototypes obtained by vector quantization of the data), which is becoming increasingly important for its effectiveness in the analysis of large, high-dimensional data sets, adds another dimension to this challenge. For validity assessment of prototype based clusterings, previously proposed indices, mostly devised for the evaluation of point based clusterings, usually perform poorly. This poor performance worsens when the validity indices are applied to large data sets with complicated cluster structure. In this work we propose a new index, Conn Index, which can be applied to data sets with a wide variety of clusters of different shapes, sizes, densities, or overlaps. We construct Conn Index based on inter- and intra-cluster connectivities of prototypes. Connectivities are defined through a "connectivity matrix", which is a weighted Delaunay graph where the weights indicate the local data distribution. Experiments on synthetic and real data indicate that Conn Index outperforms the existing validity indices used in this study for the evaluation of prototype based clustering results.
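As a rough illustration of the connectivity matrix idea, the sketch below builds a weighted graph over prototypes from a data set. It assumes one plausible weighting that the abstract does not spell out: the weight of edge (i, j) counts the data points whose two nearest prototypes are i and j. The function name `connectivity_matrix` and the toy data are hypothetical, and the sketch is not the authors' implementation.

```python
def connectivity_matrix(data, prototypes):
    """Assumed weighting: conn[i][j] counts the data points whose two
    nearest prototypes are i and j (requires at least two prototypes)."""
    n = len(prototypes)
    conn = [[0] * n for _ in range(n)]
    for x in data:
        # Rank prototype indices by squared Euclidean distance to x.
        ranked = sorted(
            range(n),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, prototypes[k])),
        )
        i, j = ranked[0], ranked[1]
        # Keep the matrix symmetric: the edge is undirected.
        conn[i][j] += 1
        conn[j][i] += 1
    return conn


# Toy example: two nearby prototypes and one distant prototype.
prototypes = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
data = [(0.1, 0.0), (0.4, 0.0), (0.9, 0.0), (9.0, 0.0)]
conn = connectivity_matrix(data, prototypes)
```

In this toy case the two nearby prototypes share a heavily weighted edge, while the distant prototype is only weakly connected. A connectivity based index could use that kind of contrast to separate intra-cluster from inter-cluster links.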
TASDEMIR Kadim;
MERENYI Erzsebet;
2011-07-28
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
JRC62772
1083-4419
https://publications.jrc.ec.europa.eu/repository/handle/JRC62772
10.1109/TSMCB.2010.2104319