Ferdigg, J. (2022). Self-supervised pre-training on LSTM and transformer models for network intrusion detection [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.91662
Machine learning techniques and Deep Neural Networks (DNNs) have found their way into various disciplines, and their potential benefits are being explored for a diverse range of applications. The pattern-matching capabilities of modern machine learning models have long surpassed expert systems, and even humans, in narrow applications. Their ability to accurately classify seemingly complex data makes them well suited for use in Network Intrusion Detection (NID). While supervised learning remains the most effective way to train machine learning models, its feasibility is often limited by the scarcity of expensive labeled data. For this reason, among others, researchers at the forefront of machine learning development, especially in the field of Natural Language Processing (NLP), have begun to pre-train their models on large amounts of unlabeled data to overcome the scarcity of labeled data. A commonly used pattern, employed for example to train Google's Bidirectional Encoder Representations from Transformers (BERT) model, is to pre-train large-scale machine learning models in a self-supervised manner. This is done by tasking the model with reconstructing omitted parts of the input data, predicting future input, or answering other questions about the input whose answers can be derived from the unlabeled data itself. Only a small amount of labeled data is then used to fine-tune the model for the target downstream task.

Inspired by the achievements of models like BERT and its successors, we apply the same methods to increase classification accuracy for deep learning based Network Intrusion Detection Systems (NIDS). In our research we try to answer the question of whether pre-training paradigms used in NLP can improve classification accuracy for deep learning based NIDS. We performed pre-training on Long Short-Term Memory (LSTM) and transformer encoder models with a set of devised autoencoding- and autoregression-based self-supervised training methods to improve binary classification of network traffic records. After pre-training, we use supervised fine-tuning with a small amount of labeled data to teach the model to classify the data into attack and benign flows. As training data we used flow representations of the CIC-IDS2017 and UNSW-NB15 NID datasets, grouped by flow key. Our flows consist of a sequence of tensors containing packet- and flow-specific features.

Our results show that classification accuracy can be improved through pre-training, but only in specific instances. Further inquiry is needed to determine whether our results can be generalized.
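To make the described training scheme concrete, the following is a minimal sketch of masked-feature pre-training followed by supervised fine-tuning on an LSTM flow encoder, assuming PyTorch. All dimensions, the masking ratio, and the dummy data are illustrative placeholders and do not reflect the thesis's actual configuration or feature set.

```python
# Minimal sketch: self-supervised pre-training (masked reconstruction)
# followed by supervised fine-tuning for binary flow classification.
# Dimensions and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

FEAT_DIM = 16   # per-packet feature vector size (placeholder)
SEQ_LEN = 32    # packets per flow (placeholder)
HIDDEN = 64

class FlowEncoder(nn.Module):
    """LSTM encoder shared between pre-training and fine-tuning."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, HIDDEN, batch_first=True)

    def forward(self, x):                   # x: (batch, SEQ_LEN, FEAT_DIM)
        out, _ = self.lstm(x)
        return out                          # (batch, SEQ_LEN, HIDDEN)

encoder = FlowEncoder()
recon_head = nn.Linear(HIDDEN, FEAT_DIM)    # reconstructs masked packet features
cls_head = nn.Linear(HIDDEN, 2)             # benign vs. attack

# --- Self-supervised pre-training: reconstruct masked packet features ---
pretrain_opt = torch.optim.Adam(list(encoder.parameters()) + list(recon_head.parameters()))
flows = torch.randn(8, SEQ_LEN, FEAT_DIM)           # unlabeled flows (dummy data)
mask = torch.rand(8, SEQ_LEN, 1) < 0.15             # mask roughly 15% of packets
masked_input = flows.masked_fill(mask, 0.0)         # zero out masked positions
recon = recon_head(encoder(masked_input))
pretrain_loss = ((recon - flows) ** 2)[mask.expand_as(flows)].mean()
pretrain_opt.zero_grad(); pretrain_loss.backward(); pretrain_opt.step()

# --- Supervised fine-tuning on a small labeled subset ---
finetune_opt = torch.optim.Adam(list(encoder.parameters()) + list(cls_head.parameters()))
labeled_flows = torch.randn(4, SEQ_LEN, FEAT_DIM)   # small labeled set (dummy data)
labels = torch.tensor([0, 1, 0, 1])                 # 0 = benign, 1 = attack
logits = cls_head(encoder(labeled_flows)[:, -1])    # classify from the last time step
finetune_loss = nn.functional.cross_entropy(logits, labels)
finetune_opt.zero_grad(); finetune_loss.backward(); finetune_opt.step()
```

The thesis additionally explores autoregressive objectives (predicting future input) and transformer encoder models in place of the LSTM; the same pre-train-then-fine-tune structure applies in those cases as well.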