Title: Boosting classifications with imbalanced data
Other Titles: Boosting Classifications with Imbalanced Data
Language: English
Authors: Bauer, Philipp Rudolf 
Qualification level: Diploma
Advisor: Filzmoser, Peter 
Issue Date: 2017
Number of Pages: 90
Abstract: 
Boosting is an ensemble method that uses a “weak” classifier to create a “strong” one, building on Robert Schapire’s work from 1990 (see Schapire 1990). It appears similar to bagging but is fundamentally different. This thesis starts with a short introduction, followed by a chapter describing the theory and methodology behind boosting and a chapter presenting a set of boosting algorithms for binary, multi-class and regression problems. The main focus of the thesis is the performance of boosting algorithms on imbalanced data sets. The problem with such data sets is that classifiers tend to favor the larger classes, so that their predictions are strongly skewed toward the majority classes. An established general remedy is to apply sampling methods. After these are introduced, the simulations chapter shows that boosting algorithms work well with minority sampling in binary classification, whereas majority sampling appears preferable in the multi-class problem. However, it is also shown that in the multi-class setting the built-in re-weighting of hard-to-classify observations in the boosting algorithms AdaBoost.M1 and SAMME is sufficient to handle imbalances in the data set, without any sampling.
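To make the “built-in re-weighting” concrete, here is a minimal pure-Python sketch of the standard SAMME update with decision stumps as the weak learner. It is an illustration under stated assumptions, not the thesis’s own implementation: the function names (`fit_stump`, `samme_fit`, `samme_predict`), the stump learner and the toy data are all hypothetical. In each round the stump’s weighted error err gives a vote weight alpha = log((1 − err)/err) + log(K − 1), and only the misclassified observations are multiplied by exp(alpha); with K = 2 the log(K − 1) term vanishes and the update reduces to the AdaBoost.M1 form (up to normalization).

```python
import math
from collections import defaultdict


def fit_stump(X, y, w, classes):
    """Weighted multi-class decision stump (hypothetical weak learner).

    Tries every feature j and every midpoint t between consecutive distinct
    values; each leaf predicts the class with the largest total weight.
    Returns (weighted_error, j, t, left_class, right_class).
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            lc = max(classes, key=lambda k: left[k])
            rc = max(classes, key=lambda k: right[k])
            err = (sum(left.values()) - left[lc]) + (sum(right.values()) - right[rc])
            if best is None or err < best[0]:
                best = (err, j, t, lc, rc)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best


def predict_stump(stump, row):
    _, j, t, left_class, right_class = stump
    return left_class if row[j] <= t else right_class


def samme_fit(X, y, n_rounds):
    """SAMME boosting with stumps; returns (classes, ensemble, weight history)."""
    classes = sorted(set(y))
    K = len(classes)
    w = [1.0 / len(y)] * len(y)          # start with uniform observation weights
    ensemble, history = [], []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w, classes)
        miss = [predict_stump(stump, row) != label for row, label in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        if err >= 1 - 1 / K:             # not better than random guessing: stop
            break
        err = max(err, 1e-10)            # avoid log(0) if the stump is perfect
        alpha = math.log((1 - err) / err) + math.log(K - 1)
        # Up-weight only the misclassified observations, then renormalize.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
        history.append(w)
    return classes, ensemble, history


def samme_predict(classes, ensemble, row):
    """Weighted vote: each stump adds its alpha to the class it predicts."""
    votes = dict.fromkeys(classes, 0.0)
    for alpha, stump in ensemble:
        votes[predict_stump(stump, row)] += alpha
    return max(classes, key=lambda k: votes[k])


if __name__ == "__main__":
    # Hypothetical toy data: one feature, three classes of sizes 20 / 5 / 3.
    X = [[float(i)] for i in range(28)]
    y = [0] * 20 + [1] * 5 + [2] * 3
    classes, ensemble, history = samme_fit(X, y, n_rounds=3)
    correct = sum(samme_predict(classes, ensemble, r) == t for r, t in zip(X, y))
    print("training accuracy:", correct / len(y))
    for m, w in enumerate(history, start=1):
        mass = {k: round(sum(wi for wi, t in zip(w, y) if t == k), 3) for k in classes}
        print(f"round {m}: total weight per class {mass}")
```

On this toy data the first stump misclassifies the three points of the smallest class; after one re-weighting step those three points carry two thirds of the total weight, so the next stump is pushed to get them right without any resampling of the data.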
Keywords: Statistics; Classification; Boosting
URI: https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-104275
http://hdl.handle.net/20.500.12708/5289
Library ID: AC14500523
Organisation: E105 - Institut für Stochastik und Wirtschaftsmathematik 
Publication Type: Thesis (German: Hochschulschrift)
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.