Boosting classifications with imbalanced data

Bauer, Philipp Rudolf

doi:10.34726/hss.2017.45341

Record link:

https://doi.org/10.34726/hss.2017.45341
http://hdl.handle.net/20.500.12708/5289

Citation:

Bauer, P. R. (2017). Boosting classifications with imbalanced data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.45341

CatalogPlus:

AC14500523

Publication Type:

Thesis - Diploma thesis

Language:

English

Advisor:

Filzmoser, Peter

Organisational Unit:

E105 - Institut für Stochastik und Wirtschaftsmathematik

Date (published):

2017

Keywords:

Statistics; Classification; Boosting

Abstract:

Boosting is an ensemble method that combines repeated fits of a “weak” classifier into a “strong” one, building on Robert Schapire’s 1990 work (see Schapire 1990). It appears similar to bagging but is fundamentally different. This thesis starts with a short introduction, followed by a chapter describing the theory and methodology behind boosting. The next chapter presents a set of boosting algorithms applicable to binary, multi-class and regression problems. The main focus of this thesis is to examine the performance of boosting algorithms on imbalanced data sets. The issue with such data sets is that classifiers tend to favor the larger classes, so their predictions are strongly skewed toward them. An established general solution to this issue is to apply sampling methods. After introducing these, the simulations chapter demonstrates that boosting algorithms work well with minority sampling in binary classification, whereas majority sampling appears to be preferable in the multi-class problem. However, it is also shown that in the multi-class setting the built-in re-weighting of hard-to-classify observations in the boosting algorithms AdaBoost.M1 and SAMME is sufficient to handle imbalances in the data set, without any need for sampling.
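The abstract names two remedies for imbalance: sampling, and the re-weighting built into AdaBoost.M1 and SAMME. The following is a minimal NumPy sketch, not the thesis's code, of SAMME with decision stumps as the weak learner, together with random oversampling and undersampling. All function names (`fit_stump`, `stump_predict`, `samme_fit`, `samme_predict`, `rebalance`) are hypothetical. The choice of stumps, the toy data and the reading of “minority/majority sampling” as random over-/undersampling are assumptions for illustration. In each round, misclassified observations have their weights multiplied by exp(alpha), which is the re-weighting of hard-to-classify observations the abstract refers to. For K = 2 classes the SAMME update coincides with AdaBoost.M1.

```python
# Illustrative sketch (assumption): SAMME with decision stumps plus random
# over-/undersampling; names are hypothetical, not taken from the thesis.
import numpy as np


def fit_stump(X, y, w, n_classes):
    """Weighted decision stump: one split X[:, j] <= t; each side predicts
    the class with the largest total weight on that side."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            wl = np.bincount(y[left], weights=w[left], minlength=n_classes)
            wr = np.bincount(y[~left], weights=w[~left], minlength=n_classes)
            err = w.sum() - wl.max() - wr.max()  # weight of misclassified rows
            if best is None or err < best[0]:
                best = (err, j, t, wl.argmax(), wr.argmax())
    return best[1:]


def stump_predict(stump, X):
    j, t, c_left, c_right = stump
    return np.where(X[:, j] <= t, c_left, c_right)


def samme_fit(X, y, n_rounds=50):
    """SAMME; labels must be 0..K-1. For K = 2 this matches AdaBoost.M1."""
    K = int(y.max()) + 1
    w = np.full(len(y), 1.0 / len(y))
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w, K)
        miss = stump_predict(stump, X) != y
        err = w[miss].sum() / w.sum()
        if err >= 1 - 1 / K:  # weak learner no better than random guessing
            break
        err = max(err, 1e-10)  # guard against log(0) for a perfect stump
        alpha = np.log((1 - err) / err) + np.log(K - 1)
        # Re-weighting: misclassified rows gain weight for the next round.
        w = w * np.exp(alpha * miss)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas, K


def samme_predict(model, X):
    """Weighted vote: class k collects alpha from every stump predicting k."""
    stumps, alphas, K = model
    votes = np.zeros((X.shape[0], K))
    for stump, alpha in zip(stumps, alphas):
        votes[np.arange(X.shape[0]), stump_predict(stump, X)] += alpha
    return votes.argmax(axis=1)


def rebalance(X, y, rng, mode="over"):
    """Random oversampling ("over") or undersampling ("under") to equal class sizes."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max() if mode == "over" else counts.min()
    parts = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        if mode == "over":  # keep every row, add random duplicates
            idx = np.concatenate([idx, rng.choice(idx, target - len(idx))])
        else:  # keep a random subset, drawn without replacement
            idx = rng.choice(idx, target, replace=False)
        parts.append(idx)
    idx = np.concatenate(parts)
    return X[idx], y[idx]


# Toy usage on synthetic data (hypothetical, 9:1 class imbalance).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (180, 2)), rng.normal(2.5, 1.0, (20, 2))])
y = np.array([0] * 180 + [1] * 20)
X_bal, y_bal = rebalance(X, y, rng, mode="over")
model = samme_fit(X_bal, y_bal, n_rounds=30)
pred = samme_predict(model, X)
```

Calling `samme_fit(X, y)` on the unbalanced data directly leaves the imbalance to the weight update alone, which is the setting the last sentence of the abstract describes. Swapping `mode="over"` for `mode="under"` gives the undersampling variant of this sketch.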

Additional information:

Alternative title according to the author's translation

License:

In Copyright
