Robust statistical methods for outlier detection with application to household expenditure data

Gussenbauer, Johannes

doi:10.34726/hss.2015.25895

Record citation link:

https://doi.org/10.34726/hss.2015.25895
http://hdl.handle.net/20.500.12708/14511

Title:

Robust statistical methods for outlier detection with application to household expenditure data

Citation:

Gussenbauer, J. (2015). Robust statistical methods for outlier detection with application to household expenditure data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2015.25895

reposiTUm-DOI:

10.34726/hss.2015.25895

CatalogPlus:

AC12670635

Publication type:

Thesis - Diploma thesis

Language:

English

Authors:

Gussenbauer, Johannes

Supervisor:

Templ, Matthias

Co-supervisors:

Filzmoser, Peter

Organizational unit:

E105 - Institut für Stochastik und Wirtschaftsmathematik

Date (published):

2015

Extent:

Keywords:

Ausreissererkennung; Robustheit; Konsumdaten

Outlier Detection; Robustness; Expenditures Data

Abstract:

Outlier detection can be seen as a pre-processing step for locating data points in a data sample that do not conform to the rest of the data. Various techniques and methods for outlier detection, dealing with different data types, can be found in the literature. In this master's thesis, the data sets used for outlier detection are household expenditure data from five countries. Based on classical estimates of the Gini coefficient, these data sets are suspected to contain outliers. In order to detect data points that deviate from the rest of the data, one- and multi-dimensional outlier detection methods are applied to the household expenditure data. The outlier detection methods are based on robust estimates and, in some cases, incorporate sample weights. Important issues concerning the data and the outlier detection methods are the number of missing values in each data set and the position of the true outliers, which is completely unknown. The main focus of this thesis lies in understanding the outlier detection methods and their influence on the estimated Gini coefficient. Apart from applying the outlier detection methods to the various data sets and presenting the results, this work recommends which of the methods should be preferred for outlier detection on household expenditure data. To give such a recommendation, it is important to get a clearer picture of how each method performs on household expenditure data. To help understand the performance of the different methods, a simulation study based on the original survey data was conducted. The simulation study and all other calculations were executed using the R programming language.
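
As a rough illustration of why a classical Gini estimate can point to outliers, the following sketch computes a sample-weighted Gini coefficient and flags extreme values with a simple one-dimensional robust rule (median and MAD on log-expenditures). It is not taken from the thesis: the thesis used R, whereas this example is written in Python, and the function names, the cutoff of 3, the log transformation, and the toy expenditure values are all assumptions made for illustration only.

```python
import math
from statistics import median


def weighted_gini(values, weights=None):
    """Gini coefficient from the weighted Lorenz curve (trapezoid rule)."""
    if weights is None:
        weights = [1.0] * len(values)
    pairs = sorted(zip(values, weights))          # sort by expenditure
    total_w = sum(w for _, w in pairs)
    total_xw = sum(x * w for x, w in pairs)
    cum_p = cum_l = 0.0                           # population and expenditure shares
    area = 0.0
    for x, w in pairs:
        next_p = cum_p + w / total_w
        next_l = cum_l + x * w / total_xw
        area += (next_p - cum_p) * (next_l + cum_l)
        cum_p, cum_l = next_p, next_l
    return 1.0 - area


def flag_outliers_mad(values, cutoff=3.0):
    """Flag values whose log lies more than `cutoff` scaled MADs from the median.

    Assumes strictly positive values and a non-zero MAD (hypothetical rule,
    chosen here only as a simple example of a robust estimate).
    """
    logs = [math.log(v) for v in values]
    med = median(logs)
    mad = 1.4826 * median([abs(v - med) for v in logs])
    return [abs(v - med) / mad > cutoff for v in logs]


# Toy data (invented): eight ordinary households and one extreme value.
expenditures = [120, 135, 150, 160, 175, 180, 200, 210, 5000]
flags = flag_outliers_mad(expenditures)
gini_all = weighted_gini(expenditures)
gini_clean = weighted_gini([x for x, f in zip(expenditures, flags) if not f])
```

In this toy sample only the last value is flagged, and the Gini coefficient computed without it is smaller than the one computed on all values, which mirrors the kind of sensitivity the abstract describes. Passing survey weights as the second argument of `weighted_gini` would give the weighted estimate.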

Further information:

Alternative title as translated by the author

License:

In copyright

Appears in collections:

Thesis