Bauer, P. R. (2017). Boosting classifications with imbalanced data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.45341
E105 - Institut für Stochastik und Wirtschaftsmathematik
Date (published):
2017
Number of Pages:
90
Keywords:
Statistics; Classification; Boosting
Abstract:
Boosting is an ensemble method that uses a “weak” classifier to create a “strong” one, based on the theory in Robert Schapire’s 1990 work (see Schapire 1990). It appears similar to bagging yet is fundamentally different. This thesis starts with a short introduction, followed by a chapter describing the theory and methodology behind boosting and a chapter presenting a set of boosting algorithms applicable to binary, multi-class, and regression problems. The main focus of the thesis is the performance of boosting algorithms on imbalanced data sets. The issue with these data sets is that their class distribution is significantly skewed and classifiers tend to emphasize the larger classes. An established general solution to this issue is to apply sampling methods. After introducing these, the simulations chapter demonstrates that boosting algorithms work well with minority sampling in binary classification, whereas majority sampling appears to be preferable in the multi-class setting. However, it is also shown that in the multi-class setting the built-in re-weighting of hard-to-classify observations in the boosting algorithms AdaBoost.M1 and SAMME is sufficient to handle imbalances in the data set, with no sampling necessary.
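The built-in re-weighting mentioned in the abstract can be illustrated with a minimal sketch of a single SAMME round. This is not code from the thesis: the function name `samme_reweight` and the toy data are hypothetical, and the update rule follows the commonly published form of SAMME, in which the learner weight is ln((1 - err) / err) + ln(K - 1) for K classes.

```python
import math


def samme_reweight(weights, misclassified, n_classes):
    """Return (alpha, new_weights) for one SAMME boosting round.

    weights       -- current observation weights (positive floats)
    misclassified -- booleans, True where the weak learner was wrong
    n_classes     -- number of classes K (for K = 2 the extra term vanishes)
    """
    total = sum(weights)
    # Weighted error of the weak learner on the training set.
    err = sum(w for w, miss in zip(weights, misclassified) if miss) / total
    # SAMME needs a learner better than random guessing among K classes.
    if not 0.0 < err < 1.0 - 1.0 / n_classes:
        raise ValueError("weighted error must lie in (0, 1 - 1/K)")
    # Learner weight; the log(K - 1) term is what SAMME adds for K > 2.
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Scale up the misclassified observations, then renormalise to sum 1.
    scaled = [w * math.exp(alpha) if miss else w
              for w, miss in zip(weights, misclassified)]
    norm = sum(scaled)
    return alpha, [w / norm for w in scaled]


# Toy data (made up): 8 majority observations classified correctly,
# 2 minority observations misclassified, K = 3 classes, uniform weights.
weights = [0.1] * 10
misclassified = [False] * 8 + [True] * 2
alpha, new_weights = samme_reweight(weights, misclassified, n_classes=3)
minority_share = sum(new_weights[8:])
```

In this toy case the weighted error is 0.2, so alpha equals ln 8 and the two misclassified minority observations move from 20% of the total weight to about two thirds. The next weak learner is therefore pushed toward the minority class without any resampling.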
Additional information:
Deviating title according to the author's translation