Intrusion Detection Systems (IDS); Benchmark Datasets; Attack Profile Analysis; Post-analysis and Interpretability
en
Abstract:
Intrusion Detection System datasets are commonly used to build models for detecting network threats and classifying network traffic. These datasets contain captured and generated data for different families of network attack categories. Machine learning algorithms show promising results in classifying and detecting various attack types in these datasets; however, the post-analysis of the key characteristics of the attack classes still needs to be explored in further detail. The question arises whether the attack classes derived from models trained on these datasets are truly representative of attack characteristics in real-world traffic and whether they are discriminatory and transferable, or merely accidental in nature.
To address this, a testbed is constructed that handles flow aggregation, labeling, preprocessing, supervised analysis and post-analysis across selected intrusion detection system datasets, namely Kitsune and TII-SSRC-23. The post-analysis provides a framework combining visualization and interpretation methods with statistical metrics. The framework aims to provide insight into the intrinsic attack characteristics learned by machine learning models trained on the selected datasets. Qualitative profiles for each attack type are defined. These profiles are then assessed using domain expertise and applied to real-world network traces provided by the Measurement and Analysis on the Widely Integrated Distributed Environment Internet group to estimate their recurrence and relevance in real-world traffic.
The results show that, although some of the extracted attack profiles largely align with domain knowledge, they are strongly influenced by specific dataset configurations and artifacts. Furthermore, among the selected datasets, the discriminative features defining the profiles for the same attack type differ entirely, limiting the transferability of these profiles. The real-world comparison also reveals weaknesses in the intrusion detection system datasets. The Kitsune dataset shows some realistic and distinct attack patterns, but under-represents real-world variability. TII-SSRC-23 exhibits a single dominant ray, lacking the complexity of real traffic behavior.
The resulting insights highlight the importance of rigorous post-analysis in the evaluation of Intrusion Detection System datasets when training and deploying machine learning models. Post-analysis helps uncover dataset biases, artifacts and modeling limitations, enabling the development of intrusion detection systems that generalize beyond the specific datasets on which they are trained.