Fitting Mixtures of Regression Lines with the Forward Search
The forward search is a powerful method for detecting unidentified subsets and masked outliers and for determining their effect on models fitted to the data. This paper describes a semi-automatic approach to outlier detection and clustering through the forward search. Its main contribution is a novel technique for identifying clusters of points coming from different regression models. The method was motivated by fraud detection in foreign trade data as reported by the Member States of the European Union. We also address the challenging issue of selecting the number of groups. The performance of the algorithm is shown through an application to a specific bivariate trade data set. The applicability of the method to larger and more complex data sets is also discussed in the paper.
RIANI Marco;
CERIOLI Andrea;
ATKINSON Anthony C.;
PERROTTA Domenico;
TORTI Francesca;
2008-11-06
IOS Press
JRC42676
978-1-58603-898-4
1874-6268
https://publications.jrc.ec.europa.eu/repository/handle/JRC42676
10.3233/978-1-58603-898-4-271