Studying class membership scores in machine learning classification for imbalanced binary data

Katzengruber, Matthias

doi:10.34726/hss.2020.57167

Record citation link:

https://doi.org/10.34726/hss.2020.57167
http://hdl.handle.net/20.500.12708/15638

Title:

Studying class membership scores in machine learning classification for imbalanced binary data

Citation:

Katzengruber, M. (2020). Studying class membership scores in machine learning classification for imbalanced binary data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2020.57167

reposiTUm-DOI:

10.34726/hss.2020.57167

CatalogPlus:

AC15754408

Publication type:

Thesis - Diploma Thesis

Language:

English

Authors:

Katzengruber, Matthias

Supervisor:

Zseby, Tanja

Co-supervisors:

Iglesias Vazquez, Felix

Organizational unit:

E389 - Telecommunications

Date (published):

2020

Extent:

Keywords:

anomaly detection; machine learning; classification; network traffic analysis

Abstract:

Machine learning is gaining importance, driven strongly by the rise of computational power. A paramount application of machine learning is anomaly detection, sometimes understood as one-class classification, i.e., a binary classification problem in which there is a significant imbalance between the minority class (anomalies/outliers) and the majority class (normal instances/inliers). Real-life examples of such scenarios are fraud detection and attack detection in network communications. In this work, we study whether the assumption that wrongly classified instances lie closer to decision boundaries is correct, and whether this information can help to refine classification performance. We conducted experiments on network traffic and on other imbalanced datasets and found that, as a general rule, classification algorithms are able to leverage class membership scores to improve the “average precision” metric, which is suitable for evaluating imbalanced cases. Hence, class membership scores, defined based on distances to classification thresholds, help to improve classification while preserving model explainability and keeping algorithm complexity low.
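
The idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the exact score definition, so the sketch assumes the membership score is the signed distance of a predicted minority-class probability from a threshold of 0.5. The toy labels and probabilities are invented for illustration only, and the function names (`average_precision`, `membership_score`) are hypothetical. Average precision is computed as the recall-weighted sum of precisions over distinct score thresholds, so tied scores are handled as one group.

```python
def average_precision(y_true, scores):
    """Average precision: sum over distinct thresholds of
    (recall_n - recall_{n-1}) * precision_n, scanning scores from high to low."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every instance that shares the current score (one threshold).
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


def membership_score(prob, threshold=0.5):
    """Assumed definition: signed distance of the probability from the threshold."""
    return prob - threshold


# Invented toy data: 1 = minority class (anomaly), 0 = majority class (normal).
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

scores = [membership_score(p) for p in probs]
hard_labels = [1 if s > 0 else 0 for s in scores]

# Distance to the threshold for misclassified vs. correctly classified instances.
wrong = [abs(s) for s, h, y in zip(scores, hard_labels, y_true) if h != y]
right = [abs(s) for s, h, y in zip(scores, hard_labels, y_true) if h == y]
mean_wrong = sum(wrong) / len(wrong)
mean_right = sum(right) / len(right)

ap_scores = average_precision(y_true, scores)
ap_hard = average_precision(y_true, hard_labels)

print(f"mean |distance| misclassified: {mean_wrong:.3f}, correct: {mean_right:.3f}")
print(f"average precision with scores: {ap_scores:.3f}, with hard labels: {ap_hard:.3f}")
```

In this toy case the two misclassified instances sit nearest the threshold, and ranking by the continuous score gives a higher average precision than ranking by the hard 0/1 labels. That matches the direction of the abstract's claim but is not evidence for it.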

License:

In copyright

Appears in collections:

Thesis