Evaluating intrusion detection benchmark datasets via post-analysis of learned attack profiles

Illes, Isabella

doi:10.34726/hss.2026.139421

DC Field

Value

Language

dc.contributor.advisor

Iglesias Vazquez, Felix

dc.contributor.author

Illes, Isabella

dc.date.accessioned

2026-04-08T09:22:06Z

dc.date.issued

2026

dc.date.submitted

2026-03

dc.identifier.citation

Illes, I. (2026). Evaluating intrusion detection benchmark datasets via post-analysis of learned attack profiles [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.139421

dc.identifier.uri

https://doi.org/10.34726/hss.2026.139421

dc.identifier.uri

http://hdl.handle.net/20.500.12708/227468

dc.description.abstract

Intrusion Detection System (IDS) datasets are commonly used to build models that detect network threats and classify network traffic. These datasets contain captured and generated data covering different families of network attacks. Machine learning algorithms show promising results in detecting and classifying various attack types in these datasets; however, post-analysis of the key characteristics of the attack classes has not yet been explored in sufficient detail. This raises the question of whether the attack classes learned by models trained on these datasets are truly representative of attack characteristics in real-world traffic, and whether they are discriminative and transferable or merely accidental.
To address this, a testbed is constructed that handles flow aggregation, labeling, preprocessing, supervised analysis and post-analysis across selected intrusion detection system datasets, namely Kitsune and TII-SSRC-23. The post-analysis provides a framework combining visualization and interpretation methods with statistical metrics. The framework aims to provide insight into the intrinsic attack characteristics learned by machine learning models trained on the selected datasets. Qualitative profiles are defined for each attack type. These profiles are then assessed using domain expertise and applied to real-world network traces provided by the MAWI (Measurement and Analysis on the WIDE Internet) Working Group to estimate their recurrence and relevance in real-world traffic.
The results show that, although some of the extracted attack profiles largely align with domain knowledge, they are strongly influenced by specific dataset configurations and artifacts. Furthermore, across the selected datasets, the discriminative features that define the profiles for the same attack type differ entirely, which limits the transferability of these profiles. The real-world comparison also reveals weaknesses in the intrusion detection system datasets.
The Kitsune dataset shows some realistic and distinct attack patterns but under-represents real-world variability. TII-SSRC-23 exhibits a single dominant ray, lacking the complexity of real traffic behavior.
The resulting insights highlight the importance of rigorous post-analysis when evaluating Intrusion Detection System datasets for training and deploying machine learning models. Post-analysis helps uncover dataset biases, artifacts and modeling limitations, enabling the development of intrusion detection systems that generalize beyond the specific datasets on which they were trained.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Intrusion Detection Systems (IDS)

dc.subject

Benchmark Datasets

dc.subject

Attack Profile Analysis

dc.subject

Post-analysis and Interpretability

dc.title

Evaluating intrusion detection benchmark datasets via post-analysis of learned attack profiles

dc.type

Thesis

dc.rights.license

In Copyright

dc.identifier.doi

10.34726/hss.2026.139421

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Isabella Illes

dc.publisher.place

Vienna

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Zseby, Tanja

tuw.publication.orgunit

E389 - Institute of Telecommunications

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC17833645

dc.description.numberOfPages

134

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0001-6081-969X

tuw.assistant.orcid

0000-0002-5391-467X

item.fulltext

with Fulltext

item.grantfulltext

open

item.cerifentitytype

Publications

item.openairetype

master thesis

item.openaccessfulltext

Open Access

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.mimetype

application/pdf

item.languageiso639-1

crisitem.author.dept

E384-01 - Research Unit of Software-intensive Systems

crisitem.author.parentorg

E384 - Institute of Computer Technology

Appears in Collections:

Thesis

Fulltext (Version of Record (published version)), Adobe PDF, 3.16 MB, In Copyright

Download(s)

127

checked on Apr 8, 2026
