Gussenbauer, J. (2015). Robust statistical methods for outlier detection with application to household expenditure data [Diploma Thesis]. reposiTUm. https://doi.org/10.34726/hss.2015.25895
Outlier detection can be seen as a pre-processing step for locating data points in a sample that do not conform with the rest of the data. The literature offers various outlier detection techniques and methods for different data types. In this master's thesis, outlier detection methods are applied to household expenditure data from five countries. Based on classical estimates of the Gini coefficient, these data sets are suspected to contain outliers. To detect data points that deviate from the rest of the data, one- and multi-dimensional outlier detection methods are applied to the household expenditure data. The methods are based on robust estimates and, in some cases, incorporate sample weights. Important issues concerning the data and the methods are the number of missing values in each data set and the position of the true outliers, which is completely unknown. The main focus of this thesis lies in understanding the outlier detection methods and their influence on the estimated Gini coefficient. Apart from applying the methods to the various data sets and presenting the results, this work gives a recommendation on which methods should be preferred for household expenditure data. Giving such a recommendation requires a clearer picture of how each method performs on household expenditure data. To this end, a simulation study based on the original survey data was conducted. The simulation study and all other calculations were executed in the R programming language.
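The interplay between a classical Gini estimate and a robust outlier rule can be illustrated with a minimal sketch. It is not taken from the thesis, which used R and whose concrete methods are not specified in this abstract. The sketch is written in Python, and the function names, the median/MAD rule on log-expenditures, the cutoff of 3 and the toy data are all assumptions chosen for illustration.

```python
import math
from statistics import median


def weighted_gini(values, weights):
    """Weighted Gini coefficient via the trapezoid rule on the Lorenz curve."""
    pairs = sorted(zip(values, weights))          # sort by expenditure
    total_w = sum(w for _, w in pairs)
    total_wx = sum(x * w for x, w in pairs)
    cum_wx = 0.0
    area = 0.0                                    # twice the area under the Lorenz curve
    for x, w in pairs:
        prev_share = cum_wx / total_wx
        cum_wx += x * w
        area += (w / total_w) * (prev_share + cum_wx / total_wx)
    return 1.0 - area


def flag_outliers(values, cutoff=3.0):
    """Flag values whose log lies more than `cutoff` robust SDs from the median.

    Assumed rule for illustration: median and MAD (scaled by 1.4826)
    replace the mean and standard deviation.
    """
    logs = [math.log(v) for v in values]
    med = median(logs)
    mad = 1.4826 * median([abs(v - med) for v in logs])
    if mad == 0:
        return [False] * len(values)              # no spread: nothing can be flagged
    return [abs(v - med) / mad > cutoff for v in logs]


# Hypothetical toy data: eight households, one with an extreme expenditure.
expenditure = [100, 120, 110, 130, 90, 105, 115, 5000]
sample_weights = [1.0] * len(expenditure)

flags = flag_outliers(expenditure)
gini_all = weighted_gini(expenditure, sample_weights)
kept = [(x, w) for x, w, f in zip(expenditure, sample_weights, flags) if not f]
gini_clean = weighted_gini([x for x, _ in kept], [w for _, w in kept])

print(flags)
print(gini_all, gini_clean)
```

In this toy sample, a single extreme value drives the classical Gini estimate upward, and dropping the flagged observation lowers it again. Whether a flagged point is a true outlier remains unknown in real survey data, which is why the abstract stresses a simulation study.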