Title: Identification of credit default drivers via lasso estimation in the logistic regression model
Language: English
Authors: Blaskó, Péter 
Qualification level: Diploma
Advisor: Schneider, Ulrike 
Issue Date: 2019
Number of Pages: 65
Abstract: In this work, a binary logistic regression model for two-year default probabilities is estimated on a data set of 150,000 clients from the Kaggle competition "GiveMeSomeCredit". The model is selected with the Lasso estimator, which chooses a subset of continuous, categorical and ordinal variables reflecting sociodemographic and behavioral properties of the client as well as characteristics of their loans. Non-linear dependence of the default probabilities on the regressors is addressed by discretizing the regressors with a version of the fused Lasso in a multivariate setting. The model fits the data very well, reaching an average out-of-sample AUC of over 86%, independent of the model selection criterion (AIC, BIC or CV). This value lies in the upper range of the industry standard and is comparable to that of more complicated modeling approaches such as in Wang et al. (2015). The estimator gives the strongest weights to behavioral variables such as past-due status and limit utilization, while sociodemographic variables and loan properties are much less important.
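The combination described in the abstract, an L1-penalized (Lasso) logistic regression evaluated by out-of-sample AUC, can be illustrated with a minimal sketch. This is not the thesis code. It assumes synthetic data with three standardized regressors of which only the first drives default, a fixed penalty `lam` rather than one chosen by AIC, BIC or CV, and a plain proximal gradient solver; it omits the fused-Lasso discretization step. All function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    # Proximal operator of t * |v|; this is what sets coefficients exactly to zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    # Minimise mean log-loss + lam * ||beta||_1 by proximal gradient descent.
    # The intercept b0 is left unpenalised.
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                grad[j] += r * xi[j]
        b0 -= step * g0 / n
        beta = [soft_threshold(b - step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return b0, beta

def auc(scores, labels):
    # Probability that a random defaulter is scored above a random
    # non-defaulter; ties count one half.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def simulate(n, rng):
    # Synthetic clients (assumption): only the first regressor affects default.
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(-1.0 + 2.0 * x[0]) else 0)
    return X, y

rng = random.Random(0)
X_train, y_train = simulate(400, rng)
X_test, y_test = simulate(200, rng)

b0, beta = fit_lasso_logistic(X_train, y_train, lam=0.1)
test_scores = [sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) for xi in X_test]
test_auc = auc(test_scores, y_test)

print("coefficients:", [round(b, 3) for b in beta])
print("out-of-sample AUC:", round(test_auc, 3))
```

In this sketch the L1 penalty shrinks the coefficients of the uninformative regressors toward zero, which mirrors the variable-selection role the Lasso plays in the abstract. The AUC is computed on data not used for fitting.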
Keywords: credit default; Lasso; logistic regression
URI: https://doi.org/10.34726/hss.2020.66160
DOI: 10.34726/hss.2020.66160
Library ID: AC15676004
Organisation: E105 - Institut für Stochastik und Wirtschaftsmathematik 
Publication Type: Thesis
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.