Geiginger, L.-M., & Zseby, T. (2024). Evading Botnet Detection. In SAC ’24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (pp. 1331–1340). https://doi.org/10.1145/3605098.3635921
Botnet detection remains a challenging task due to the many botnet families with different communication strategies, traffic encryption, and hiding techniques. Machine learning-based methods have been applied successfully but have proven vulnerable to evasion attacks. In this paper, we show how an attacker can evade the detection of botnet traffic by manipulating selected features in the attack flows. We first build two well-performing machine learning models, based on Random Forest and Support Vector Machine classifiers, trained using only features that are also available in encrypted traffic. We then show with two different datasets how the detection rate (recall) decreases significantly for both classifiers if the attacker manipulates only a few basic features. We apply two state-of-the-art evasion attacks: Hop Skip Jump and Fast Gradient Sign. For all manipulated attack vectors we perform a plausibility check to ensure consistency with traffic statistics and protocol rules, as well as a bot check to ensure that the manipulated attack vectors are still valid bot samples. We show that for both Hop Skip Jump and Fast Gradient Sign it is possible to craft plausible network traffic samples, but for Fast Gradient Sign the feature values of the manipulated samples lie far outside the normal range for botnet traffic. Our results show that the models can easily be fooled if the attacker is able to query the black-box models multiple times. Since in a real setting attackers may not have access to the model and training data, we implement a local substitute model to generate the attack samples and then check whether they transfer to other machine learning models trained with different training data. Our results show that samples generated with Hop Skip Jump generally do not transfer well, while Fast Gradient Sign samples also evade detection by models other than the substitute model.
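The setup described in the abstract (Random Forest and SVM flow classifiers attacked with Hop Skip Jump and Fast Gradient Sign) can be sketched with off-the-shelf tooling. The snippet below is a minimal illustration using the Adversarial Robustness Toolbox (ART) and scikit-learn; the library choice, the placeholder flow features, and all parameter values are assumptions for illustration, not the authors' actual implementation, which additionally applies plausibility and bot checks to the generated samples.

```python
# Illustrative sketch (not the authors' code): crafting evasion samples against
# flow-feature botnet classifiers with the Adversarial Robustness Toolbox (ART).
# Real experiments would use flow statistics (durations, packet/byte counts, ...)
# extracted from labeled traffic; random placeholder data is used here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import HopSkipJump, FastGradientMethod

# Placeholder data with 8 flow features (assumption; replace with real flows).
rng = np.random.default_rng(0)
X_train = rng.random((1000, 8)).astype(np.float32)
y_train = rng.integers(0, 2, 1000)              # 0 = benign, 1 = bot
X_bot = rng.random((50, 8)).astype(np.float32)  # bot flows to be manipulated

# Random Forest target: non-differentiable, so it is attacked with the
# decision-based (black-box) Hop Skip Jump attack.
rf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
rf_art = SklearnClassifier(model=rf)
hsj = HopSkipJump(classifier=rf_art, targeted=False, max_iter=20, max_eval=1000)
X_bot_hsj = hsj.generate(X_bot)

# SVM target: differentiable surrogate loss, so it can be attacked with the
# gradient-based Fast Gradient (Sign) Method.
svm = SVC(kernel="linear", probability=True).fit(X_train, y_train)
svm_art = SklearnClassifier(model=svm)
fgsm = FastGradientMethod(estimator=svm_art, eps=0.2)
X_bot_fgsm = fgsm.generate(X_bot)

# Evasion success: fraction of manipulated bot flows now classified as benign.
print("HopSkipJump evasion rate:", np.mean(rf.predict(X_bot_hsj) == 0))
print("FGSM evasion rate:       ", np.mean(svm.predict(X_bot_fgsm) == 0))
```

In the transferability experiment described above, the attacks would instead be run against a locally trained substitute model and the resulting samples evaluated on separately trained classifiers.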