Detecting Extreme Numerical Outliers in Trade Data: A Novel Method for Highly Asymmetric Distributions
This paper introduces a novel statistical method for detecting extreme numerical outliers in highly asymmetric distributions, a common challenge in the analysis of international trade data. The European Commission's Directorate-General for Taxation and Customs Union (DG TAXUD) relies on the Surveillance database to collect import/export transactions from national authorities. However, large errors in declared values, caused by data quality issues, can significantly affect policymaking, anti-fraud measures, and the reliability of EU-wide statistics. Traditional robust statistical methods, such as the standard boxplot, often fail to identify outliers accurately in skewed distributions, leading to a high risk of false positives.
To address this, the paper builds upon the adjusted boxplot method for skewed distributions proposed by Hubert and Vandervieren (2008), extending it to accommodate the pronounced skewness characteristic of trade data. The proposed approach involves developing thresholds for detecting extreme anomalous numbers in each distribution, focusing on the right tail where the most significant outliers are expected. The method's effectiveness is assessed using real international trade data provided by DG TAXUD, with the aim of enhancing data quality checks, EU-wide statistics, and policy decisions.
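For orientation, the baseline that the paper extends can be sketched as follows. This is a minimal illustration of the adjusted boxplot fences of Hubert and Vandervieren (2008), not the extended method proposed in the paper. The function names `medcouple` and `adjusted_boxplot_fences` are hypothetical, the medcouple is computed with a naive quadratic-time loop, and ties at the median are handled only approximately (an assumption made to keep the sketch short).

```python
import math
import statistics


def medcouple(values):
    """Naive O(n^2) medcouple, a robust measure of skewness in [-1, 1].

    Assumption: pairs where both points equal the median contribute 0,
    which is exact for a single median point and approximate otherwise.
    """
    xs = sorted(values)
    m = statistics.median(xs)
    lower = [x for x in xs if x <= m]
    upper = [x for x in xs if x >= m]
    kernel = []
    for xi in lower:
        for xj in upper:
            if xi == xj:
                kernel.append(0.0)  # simplified tie handling
            else:
                kernel.append(((xj - m) - (m - xi)) / (xj - xi))
    return statistics.median(kernel)


def adjusted_boxplot_fences(values):
    """Return (lower, upper) fences of the adjusted boxplot.

    The whiskers of the standard boxplot (1.5 * IQR) are stretched or
    shrunk by an exponential function of the medcouple.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:
        lower = q1 - 1.5 * math.exp(-4 * mc) * iqr
        upper = q3 + 1.5 * math.exp(3 * mc) * iqr
    else:
        lower = q1 - 1.5 * math.exp(-3 * mc) * iqr
        upper = q3 + 1.5 * math.exp(4 * mc) * iqr
    return lower, upper


# Toy right-skewed sample (made up for illustration, not trade data).
sample = [1, 2, 3, 10, 100]
low, high = adjusted_boxplot_fences(sample)
flagged = [x for x in sample if x > high]  # right-tail candidates only
```

On this toy sample the standard boxplot would place its upper fence at 22 and flag the value 100, whereas the adjusted upper fence lies above 100, so nothing is flagged in the right tail. This shows how accounting for skewness reduces false positives; the paper's contribution is to extend such thresholds to the far stronger skewness found in trade data.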
The empirical analysis uses robust regression techniques to estimate a model that determines thresholds for flagging potential extreme outliers in the net mass of traded products. The study covers over 1.5 billion records across 7,447 products, with a focus on ensuring flexibility, statistical robustness, computational efficiency, and software simplicity. The results demonstrate the method's potential to provide customs offices with a manageable set of irrefutable errors, thereby improving the accuracy of trade data monitoring and analysis.
CERASA Andrea
2024-10-10
UNECE
JRC138858
https://unece.org/sites/default/files/2024-08/SDE2024_S4_EC_Cerasa_D.pdf
https://publications.jrc.ec.europa.eu/repository/handle/JRC138858