Katzengruber, M. (2020). Studying class membership scores in machine learning classification for imbalanced binary data [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2020.57167
Machine learning is gaining importance and is strongly driven by the growth of computational power. A paramount application of machine learning is anomaly detection, sometimes understood as one-class classification, i.e., a binary classification problem with a significant imbalance between the minority class (anomalies/outliers) and the majority class (normal instances/inliers). Real-life examples of such scenarios include fraud detection and attack detection in network communications. In this work, we study whether the assumption holds that misclassified instances lie closer to decision boundaries, and whether this information can help refine classification performance. We conducted experiments on network traffic and on other imbalanced datasets and found that, as a general rule, classification algorithms can leverage class membership scores to improve the “average precision” metric, which is suitable for evaluating imbalanced cases. Hence, class membership scores, defined from distances to classification thresholds, help improve classification while keeping models explainable and algorithmic complexity low.
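A minimal sketch of why distance-based scores can raise average precision, under stated assumptions. The abstract does not specify how scores are computed. This example assumes a classifier that exposes a real-valued decision score and a fixed threshold, and it takes the signed distance to that threshold as a hypothetical class membership score. The function names, the toy data, and the step-wise AP definition (precision at each distinct threshold weighted by the recall increment) are illustrative assumptions, not the thesis's implementation.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum over distinct thresholds of (R_n - R_{n-1}) * P_n.

    Instances with tied scores are admitted together, so hard 0/1
    predictions yield a precision-recall curve with only two points.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive instances")
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        selected = [y for y, s in zip(y_true, scores) if s >= t]
        tp = sum(selected)
        precision = tp / len(selected)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def hard_labels(decision: Sequence[float], threshold: float) -> List[int]:
    # Plain classification output: 1 = anomaly, 0 = normal.
    return [1 if s >= threshold else 0 for s in decision]


def membership_scores(decision: Sequence[float], threshold: float) -> List[float]:
    # Hypothetical membership score: signed distance to the threshold.
    # The sign gives the predicted class; the magnitude says how far
    # the instance lies from the decision boundary.
    return [s - threshold for s in decision]


# Toy data (invented): the two errors, a false positive at 0.3 and a
# false negative at -0.2, sit close to the threshold 0.0.
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
decision = [2.0, 1.2, 0.3, -0.2, -0.4, -1.0, -1.5, -2.5]
threshold = 0.0

ap_hard = average_precision(y_true, hard_labels(decision, threshold))
ap_dist = average_precision(y_true, membership_scores(decision, threshold))
print(f"AP with hard labels:       {ap_hard:.4f}")  # → 0.5694
print(f"AP with membership scores: {ap_dist:.4f}")  # → 0.9167
```

Hard labels collapse the ranking into two tiers, so AP can only reward the tiers as wholes. The distances reorder instances within each tier. When errors cluster near the boundary, as the studied assumption states, confidently classified anomalies move to the top of the ranking and AP rises.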