LSTM autoencoders for botnet detection

De Bettignies, Jan

doi:10.34726/hss.2021.87961

Record link:

https://doi.org/10.34726/hss.2021.87961
http://hdl.handle.net/20.500.12708/19206

Title:

LSTM autoencoders for botnet detection

Citation:

De Bettignies, J. (2021). LSTM autoencoders for botnet detection [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.87961

reposiTUm DOI:

10.34726/hss.2021.87961

CatalogPlus:

AC16410555

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

De Bettignies, Jan

Advisor:

Zseby, Tanja

Co-advisor:

Hartl, Alexander

Organisational Unit:

E389 - Institute of Telecommunications

Date (published):

2021

Number of Pages:

Keywords:

attack detection; Machine Learning; network traffic

Abstract:

Securing the Internet against malicious attacks is an ever-evolving task. In this thesis, we aim to detect botnet traffic using two different machine learning methods: a random forest and an LSTM autoencoder.

For our experiments, we use the public ISOT data set created by the Information Security and Object Technology (ISOT) research lab of the University of Victoria in Canada. It contains benign and botnet traffic and has been used in related work. We select a random forest (RF) algorithm for its simplicity, ease of use, good classification performance and explainability. We compare it to a long short-term memory (LSTM) autoencoder, which is a variant of a neural network autoencoder. It is a more recent and complex algorithm that aims to extract information that is not readily apparent from the examined data; however, its results are difficult to explain.

We choose to compare a supervised approach (RF) with an unsupervised LSTM autoencoder and apply them to our data set using features that are as similar as possible in both machine learning models, given the different characteristics of the two. The features cannot be exactly the same because of the fundamental difference between a flow-based (random forest) and a packet-based (LSTM autoencoder) algorithm. A flow-based algorithm aggregates all packets of a flow into a set of statistical parameters, which implies that a flow has to be terminated before its characteristics can be calculated. In contrast, a packet-based algorithm considers each individual packet and can, if necessary, try to relate those packets to one another, even while a flow is still active. In the field of botnet detection, this can be an advantage, as even an ongoing attack can be detected. Both algorithms undergo hyperparameter tuning. We also analyze which features influence detection for both the random forest and the LSTM autoencoder.

The RF easily outclasses the LSTM autoencoder, achieving a ROC-AUC of 0.99 compared to 0.64 for the LSTM autoencoder. This shows that the LSTM autoencoder may not be the best choice for this task.
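
The difference between the flow-based and the packet-based view described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the packet tuples, the 5-tuple flow key, the chosen statistics, and the function names `flow_features` and `packet_sequences` are all assumptions made for illustration, and the packets are assumed to be sorted by timestamp.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet records (illustrative values only):
# (timestamp_s, src, dst, sport, dport, proto, length_bytes)
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 1234, 80, "TCP", 60),
    (0.05, "10.0.0.1", "10.0.0.2", 1234, 80, "TCP", 1500),
    (0.10, "10.0.0.3", "10.0.0.4", 5555, 53, "UDP", 80),
    (0.30, "10.0.0.1", "10.0.0.2", 1234, 80, "TCP", 40),
]


def flow_key(pkt):
    """Unidirectional 5-tuple used to group packets (an assumption)."""
    return pkt[1:6]


def flow_features(pkts):
    """Flow-based view: one statistical summary per flow.

    All packets of a flow are needed, so the summary is only
    available once the flow has ended.
    """
    flows = defaultdict(list)
    for p in pkts:
        flows[flow_key(p)].append(p)
    features = {}
    for key, ps in flows.items():
        lengths = [p[6] for p in ps]
        features[key] = {
            "packets": len(ps),
            "bytes": sum(lengths),
            "mean_length": mean(lengths),
            "duration": ps[-1][0] - ps[0][0],
        }
    return features


def packet_sequences(pkts):
    """Packet-based view: per-flow sequence of (length, inter-arrival time).

    Each new packet extends the sequence, so a sequence model could
    score a flow while it is still active.
    """
    seqs = defaultdict(list)
    last_seen = {}
    for p in pkts:
        key = flow_key(p)
        inter_arrival = p[0] - last_seen.get(key, p[0])
        seqs[key].append((p[6], inter_arrival))
        last_seen[key] = p[0]
    return dict(seqs)


flows = flow_features(packets)
sequences = packet_sequences(packets)
```

In this sketch, a classifier such as a random forest would consume one row of `flows` per flow, while a sequence model such as an LSTM autoencoder would consume the growing lists in `sequences`.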

Additional information:

Title differs following translation by the author

License:

In Copyright

Appears in Collections:

Thesis