Title: Simulating mixtures of multivariate data with fixed cluster overlap in FSDA library
Authors: RIANI Marco; CERIOLI Andrea; PERROTTA Domenico; TORTI Francesca
Citation: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION vol. 9 no. 4 p. 461-481
Publisher: SPRINGER HEIDELBERG
Publication Year: 2015
JRC N°: JRC94530
ISSN: 1862-5347
URI: http://www.springer.com/-/2/AVFaxjQZAgfPWjhrte8d
http://publications.jrc.ec.europa.eu/repository/handle/JRC94530
DOI: 10.1007/s11634-015-0223-9
Type: Articles in periodicals and books
Abstract: We extend the capabilities of MixSim, a framework for evaluating the performance of clustering algorithms through measures of agreement between data partitions and through flexible methods for generating data, outliers and noise. The distinctive feature of the method is that data are simulated from normal mixture distributions on the basis of pre-specified summary statistics of an overlap measure, defined as the sum of pairwise misclassification probabilities. We provide new tools that let us control additional overlap statistics, as well as departures from homogeneity and sphericity among groups. The result of this extension is a more flexible data-generation framework that better addresses modern robust clustering scenarios. We also study the properties and implications of this new way of simulating clustering data in terms of coverage of the space, goodness of fit to theoretical distributions, and degree of convergence to nominal values. We demonstrate the new features with our MATLAB implementation, which we have integrated into the FSDA toolbox for MATLAB. With MixSim, FSDA now brings together in a single environment state-of-the-art robust clustering algorithms and principled routines for their evaluation and calibration. A spin-off of our work is a general complex routine, translated from C to MATLAB, that computes the distribution function of a mixture of non-central χ² random variables; this routine is at the core of MixSim and is of independent interest for many test statistics.
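The overlap measure described in the abstract can be made concrete with a small sketch. It is not FSDA's MixSim API, and the function names below are hypothetical. Following the usual MixSim definition, ω_{j|i} is the probability that an observation generated by component i is assigned to component j because π_j φ(x; μ_j, Σ_j) exceeds π_i φ(x; μ_i, Σ_i), and the pairwise overlap is ω_ij = ω_{j|i} + ω_{i|j}. MixSim evaluates these probabilities exactly through the non-central χ² distribution routine mentioned in the abstract. This sketch instead uses plain Monte Carlo and, for brevity, assumes diagonal covariance matrices.

```python
import math
import random

# Sketch only: Monte Carlo replaces MixSim's exact non-central chi-square
# computation, and covariances are assumed diagonal (given as variance lists).


def log_density_diag(x, mean, var):
    """Log-density at x of a Gaussian with the given mean and covariance diag(var)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def misclassification_prob(pi_i, mean_i, var_i, pi_j, mean_j, var_j,
                           n_draws=100_000, seed=0):
    """Monte Carlo estimate of omega_{j|i}: probability that a point drawn from
    component i has a larger weighted density under component j than under i."""
    rng = random.Random(seed)
    log_pi_i, log_pi_j = math.log(pi_i), math.log(pi_j)
    sd_i = [math.sqrt(v) for v in var_i]
    hits = 0
    for _ in range(n_draws):
        # Draw X ~ N(mean_i, diag(var_i)) one coordinate at a time.
        x = [rng.gauss(m, s) for m, s in zip(mean_i, sd_i)]
        score_i = log_pi_i + log_density_diag(x, mean_i, var_i)
        score_j = log_pi_j + log_density_diag(x, mean_j, var_j)
        if score_j > score_i:
            hits += 1
    return hits / n_draws


def pairwise_overlap(pi_i, mean_i, var_i, pi_j, mean_j, var_j, **kwargs):
    """Overlap omega_ij = omega_{j|i} + omega_{i|j} between components i and j."""
    return (misclassification_prob(pi_i, mean_i, var_i, pi_j, mean_j, var_j, **kwargs)
            + misclassification_prob(pi_j, mean_j, var_j, pi_i, mean_i, var_i, **kwargs))


if __name__ == "__main__":
    # Two equal-weight univariate components N(0, 1) and N(2, 1).
    omega = pairwise_overlap(0.5, [0.0], [1.0], 0.5, [2.0], [1.0])
    print(f"estimated overlap: {omega:.4f}")
    # Exact value: 2 * (1 - Phi(1)) = erfc(1 / sqrt(2)).
    print(f"exact overlap:     {math.erfc(1 / math.sqrt(2)):.4f}")
```

For two equal-weight components N(0, 1) and N(2, 1), the decision boundary is x = 1. Each misclassification probability is therefore 1 − Φ(1) ≈ 0.159, and the exact overlap is about 0.317, so the Monte Carlo estimate should land close to that value. A simulator in the MixSim spirit would search over component parameters until such overlaps match the pre-specified summary statistics.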
JRC Directorate: Space, Security and Migration
