<div class="csl-bib-body">
<div class="csl-entry">Kircher, A. S. (2017). <i>Random forest for unbalanced multiple-class classification</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.44844</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2017.44844
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/6814
-
dc.description
Title differs from the original according to the author's own translation
-
dc.description.abstract
Random Forest is a cutting-edge method for unbalanced multiple-class classification. The main problem with unbalanced data is that the classifier tends to focus on the larger classes at the expense of the smaller ones. To overcome this skewness, three sampling methods (oversampling, undersampling, and a combination of both) are introduced and compared based on the performance of the forest on a highly unbalanced data set with eleven classes. Oversampling appears to improve the performance of the forest dramatically, while undersampling often worsens it relative to classification on the unbalanced data. For the data set analysed here, however, a combination of both seems more adequate, because the accuracy gain from oversampling is much smaller on the test data than the dramatic improvement on the training data. The danger of overfitting is lower if the data set is not only oversampled but retains its original total size, with every class oversampled or undersampled to the same number of observations. The analysis showed that the data contain many noisy variables, which justified raising the number of variables available at each split (mtry) from the default to the value midway between the default for classification, mtry = sqrt(p), and the default for regression, mtry = 2p/3.
en
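The combined sampling scheme and the mtry choice described in the abstract can be sketched as follows. This is only an illustrative sketch, not the thesis code: the function names balance_to_equal_size and midpoint_mtry are hypothetical, the per-class target of n / k observations, the choice to keep all original minority observations before duplicating, and the rounding down of mtry are assumptions, and the two mtry endpoints are taken as quoted in the abstract.

```python
import math
import random
from collections import Counter, defaultdict


def balance_to_equal_size(labels, seed=0):
    """Return row indices that bring every class to about n / k observations.

    Assumption: classes above the target are undersampled without
    replacement; classes below it keep all rows and are topped up by
    sampling with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Per-class size that keeps the total close to the original n.
    target = len(labels) // len(by_class)

    chosen = []
    for idx in by_class.values():
        if len(idx) >= target:
            chosen.extend(rng.sample(idx, target))  # undersample
        else:
            chosen.extend(idx)  # keep every original observation
            chosen.extend(rng.choices(idx, k=target - len(idx)))  # oversample
    return chosen


def midpoint_mtry(p):
    """Value midway between sqrt(p) and 2p/3 (as quoted), rounded down."""
    return max(1, math.floor((math.sqrt(p) + 2 * p / 3) / 2))


if __name__ == "__main__":
    # Toy labels with three classes of very different sizes.
    labels = ["a"] * 90 + ["b"] * 9 + ["c"] * 1
    idx = balance_to_equal_size(labels, seed=1)
    print(Counter(labels[i] for i in idx))
    print(midpoint_mtry(9))
```

The returned indices would be used to subset the training data only; the test data stay untouched so that the reported accuracy reflects the original class distribution.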
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Random Forest
en
dc.subject
Multiple-class classification
en
dc.subject
Unbalanced data
en
dc.subject
Error rates
en
dc.title
Random forest for unbalanced multiple-class classification
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2017.44844
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Anna Sofia Kircher
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E105 - Institut für Stochastik und Wirtschaftsmathematik
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC13734521
-
dc.description.numberOfPages
127
-
dc.identifier.urn
urn:nbn:at:at-ubtuw:1-100518
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.advisor.orcid
0000-0002-8014-4682
-
item.languageiso639-1
en
-
item.openairetype
master thesis
-
item.grantfulltext
open
-
item.fulltext
with Fulltext
-
item.cerifentitytype
Publications
-
item.mimetype
application/pdf
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
E370 - Institut für Energiesysteme und Elektrische Antriebe
-
crisitem.author.parentorg
E350 - Fakultät für Elektrotechnik und Informationstechnik