Title: Robust statistical methods for outlier detection with application to household expenditure data
Other Titles: Robuste Ausreissererkennung in Konsumdaten
Language: English
Authors: Gussenbauer, Johannes 
Qualification level: Diploma
Advisor: Templ, Matthias 
Assisting Advisor: Filzmoser, Peter 
Issue Date: 2015
Number of Pages: 76
Abstract: Outlier detection can be seen as a pre-processing step for locating data points in a sample that do not conform to the rest of the data. Various techniques and methods for outlier detection, dealing with different data types, can be found in the literature. In this master's thesis, the data sets used for outlier detection are household expenditure data from five countries. Based on classical estimates of the Gini coefficient, these data sets are suspected to contain outliers. In order to detect data points that deviate from the rest of the data, one- and multi-dimensional outlier detection methods are applied to the household expenditure data. The outlier detection methods are based on robust estimates and incorporate, in some cases, the use of sample weights. Important issues concerning the data and the outlier detection methods are the number of missing values in each data set as well as the position of the true outliers, which is completely unknown. The main focus of this thesis lies in understanding the outlier detection methods and their influence on the estimated Gini coefficient. Apart from applying the outlier detection methods to the various data sets and presenting the results, this work gives a recommendation on which of the methods should be preferred for household expenditure data. Giving such a recommendation requires a clearer picture of how each outlier detection method performs on household expenditure data. To this end, a simulation study based on the original survey data was conducted. The simulation study and all other calculations were executed using the R programming language.
Keywords: Ausreissererkennung; Robustheit; Konsumdaten; Outlier Detection; Robustness; Expenditures Data
URI: https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-89066
Library ID: AC12670635
Organisation: E105 - Institut für Stochastik und Wirtschaftsmathematik 
Publication Type: Thesis
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.