Priselac, S. (2021). Outlier-robust logistic regression for imbalanced data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.93263
E105 - Institut für Stochastik und Wirtschaftsmathematik
Date (published):
2021
Number of Pages:
52
Keywords:
Generalized linear model; Bianco-Yohai estimator
en
Abstract:
Logistic regression is a widely used classification method for modeling a binary response variable. Many applications of binary logistic regression involve data sets in which the response classes are imbalanced and which often contain outliers, that is, atypical observations. Both outliers and an imbalanced class distribution can greatly reduce the predictive power of the classifier. Such data therefore require a robust method suited to imbalanced learning problems.

This thesis proposes a robust logistic regression for imbalanced data sets based on the Bianco-Yohai estimator, a highly robust method for logistic regression. The imbalanced learning problem is addressed by incorporating cost-sensitive features into the objective function for parameter estimation, which in turn requires adapting the iterative algorithm used to compute the Bianco-Yohai estimator. The thesis also proposes an additional method for detecting leverage points, as required by the weighted version of the estimator, which significantly expands the data domain in which the Bianco-Yohai estimator is applicable.

The resulting cost-sensitive forms of the Bianco-Yohai estimator, in the weighted and original versions, are compared with the corresponding non-robust and non-cost-sensitive forms. The simulation experiments and a use case with an imbalanced credit scoring data set indicate the following. For imbalanced data sets, including costs significantly improves the performance of the Bianco-Yohai estimator in both the original and weighted versions. Moreover, the methods perform better than logistic regression when the data contain bad leverage points. Thus, the cost-sensitive form of the Bianco-Yohai estimator, in both its original and weighted versions, provides a statistically reliable classifier for modeling imbalanced data containing outliers.
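To make the idea of a cost-sensitive robust objective more concrete, the following is a minimal sketch and not the formulation used in the thesis. It assumes the bounded loss function from Bianco and Yohai's original proposal, an illustrative tuning constant `c = 0.5`, and the simplest possible cost scheme, in which each observation's contribution is multiplied by a per-class cost. The names `by_term`, `cost_sensitive_objective`, `cost_pos`, and `cost_neg` are hypothetical and chosen only for this illustration.

```python
import math


def sigmoid(s):
    """Numerically stable logistic function F(s)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def deviance(score, y):
    """Negative log-likelihood of one observation: log(1 + exp(-z))."""
    z = score if y == 1 else -score  # score signed towards the observed class
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def rho(t, c=0.5):
    """Bounded loss: t - t^2 / (2c) up to c, then constant at c / 2."""
    return t - t * t / (2.0 * c) if t <= c else c / 2.0


def bias_correction(p, c=0.5):
    """G(p): integral from 0 to p of rho'(-log u) du, in closed form for this rho."""
    lower = math.exp(-c)  # rho' vanishes for u below exp(-c)
    if p <= lower:
        return 0.0
    return p + (p * math.log(p) - p) / c + lower / c


def by_term(score, y, c=0.5):
    """Contribution of one observation to the Bianco-Yohai objective."""
    p = sigmoid(score)
    return rho(deviance(score, y), c) + bias_correction(p, c) + bias_correction(1.0 - p, c)


def cost_sensitive_objective(beta, X, y, cost_pos=1.0, cost_neg=1.0, c=0.5):
    """ASSUMPTION: costs enter as per-class multipliers on each observation's term."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        score = sum(b * v for b, v in zip(beta, x_i))
        cost = cost_pos if y_i == 1 else cost_neg
        total += cost * by_term(score, y_i, c)
    return total
```

In this sketch a grossly misclassified observation, for example `by_term(20.0, 0)`, contributes a bounded amount to the objective, whereas its ordinary deviance `deviance(20.0, 0)` grows roughly linearly with the score. Setting `cost_pos` larger than `cost_neg` increases the weight of the minority (positive) class. Minimizing the objective over `beta` would still require an iterative optimizer, which is omitted here.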