Blaskó, P. (2019). Identification of credit default drivers via lasso estimation in the logistic regression model [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2020.66160
E105 - Institut für Stochastik und Wirtschaftsmathematik
Date (published):
2019
Number of Pages:
65
Keywords:
credit default; Lasso; logistic regression
Abstract:
In this work, a binary logistic regression model for two-year default probabilities is estimated on a data set covering 150,000 clients from Kaggle's "GiveMeSomeCredit" competition. The model is selected with the Lasso estimator, which chooses a subset of continuous, categorical and ordinal variables reflecting sociodemographic and behavioral properties of the clients as well as characteristics of their loans. Non-linear dependence of default probabilities on the regressors is addressed by discretizing the regressors with a version of the fused Lasso in a multivariate setting. The model fits the data very well, reaching an average out-of-sample AUC above 86% regardless of the model selection criterion (AIC, BIC or CV). This value lies in the upper range of the industry standard and is comparable to more complicated modeling approaches such as that of Wang et al. (2015). The estimator gives the strongest weights to behavioral variables such as past-due status and limit utilization, while sociodemographic variables and loan properties are much less significant.
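The core idea of the abstract, Lasso-penalized logistic regression that shrinks uninformative coefficients to exactly zero and is then judged by out-of-sample AUC, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and is not taken from the thesis: the data are synthetic (one informative regressor and two pure-noise regressors, not the Kaggle data), the solver is plain proximal gradient descent, the penalty weight `lam` is fixed by hand rather than chosen by AIC, BIC or CV, and the fused-Lasso discretization step is omitted. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t * |v|; this step produces exact zeros.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=200):
    # Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent.
    # The intercept b is left unpenalised.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(d):
                grad_w[j] += r * xi[j]
        b -= step * grad_b / n
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * lam)
             for j in range(d)]
    return w, b

def auc(scores, labels):
    # Share of (defaulter, non-defaulter) pairs ranked correctly; ties count 1/2.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_data(n, rng):
    # Synthetic clients: only the first regressor drives the default probability.
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(3)]
        p = sigmoid(2.0 * xi[0] - 1.0)
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(1000, rng)
X_test, y_test = make_data(500, rng)

w, b = fit_lasso_logistic(X_train, y_train, lam=0.1)
test_scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X_test]
test_auc = auc(test_scores, y_test)
print("coefficients:", w)
print("out-of-sample AUC:", test_auc)
```

In this toy setting the penalty removes the two noise regressors while keeping a shrunken positive weight on the informative one, which mirrors on a small scale how a Lasso fit can single out a few dominant drivers. A production analysis would more likely rely on an established solver and a data-driven choice of the penalty weight.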