Machine learning in credit default risk

Petrov, Alexander

doi:10.34726/hss.2022.95042

Record citation link:

https://doi.org/10.34726/hss.2022.95042
http://hdl.handle.net/20.500.12708/19819

Title:

Citation:

Petrov, A. (2022). Machine learning in credit default risk [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.95042

reposiTUm-DOI:

10.34726/hss.2022.95042

CatalogPlus:

AC16486821

Publication type:

Thesis - Diploma thesis

Language:

English

Authors:

Petrov, Alexander

Supervisor:

Filzmoser, Peter

Organizational unit:

E105 - Institut für Stochastik und Wirtschaftsmathematik

Date (published):

2022

Extent:

Keywords:

Neural networks; Random forests; Logistic regression

Abstract:

As the amount of data collected by banks increases exponentially, the introduction of sophisticated machine learning models becomes inevitable in order to keep up with the times. The European Banking Authority (EBA) published a discussion paper in November 2021 which might open new possibilities for the estimation of the risk parameters by the internal ratings-based (IRB) approach.

This thesis aims to compare the performance of different machine learning algorithms in the field of credit risk and, more specifically, in the discrimination of good and bad customers as part of the probability of default (PD) estimation. The data consists of the corporate customers of a European bank and their balance sheet positions, enriched by region and industry information, with the 12-month default flag as the target variable.

The binary classification algorithms are described from a theoretical point of view and then applied using R packages. The data pre-processing pipeline, including an extensive missing data treatment as well as an outlier detection method, plays a decisive role because of a significant noise level in the sample. The pipeline also addresses the problem of imbalanced data through undersampling and overweighting. A cross-validation procedure ensures that an adequate out-of-time generalization is achieved.

The results show that some of the advanced machine learning techniques outperform the ordinary logistic regression and its regularized modifications, while others, such as the support vector machine, deliver a comparable performance. A plain neural network with one hidden layer provides the best predictions in terms of the Gini coefficient on the holdout sample using a uniform quantile transformation. Random forest achieves the best performance with the untransformed data, although the interpretation of the results and the implementation of the model in a production environment are less straightforward than in the case of logistic regression.

Further information:

Title differs from the original; translated by the author

License:

In copyright

Appears in collections:

Thesis