Extraction of cyber threat intelligence from raw log data

Landauer, Max

doi:10.34726/hss.2022.103764

DC Field

Value

Language

dc.contributor.advisor

Rauber, Andreas

dc.contributor.author

Landauer, Max

dc.date.accessioned

2022-07-14T12:14:52Z

dc.date.issued

2021

dc.date.submitted

2022-07

dc.identifier.citation

Landauer, M. (2021). Extraction of cyber threat intelligence from raw log data [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.103764

dc.identifier.uri

https://doi.org/10.34726/hss.2022.103764

dc.identifier.uri

http://hdl.handle.net/20.500.12708/20672

dc.description.abstract

The omnipresence of digital systems has led to an interconnected economy and society. Unfortunately, the introduction of new technologies in rapidly expanding global networks has also enabled previously unimaginable threats. Cyber attackers use advanced tools and techniques to compromise systems and exploit vulnerabilities for the purpose of data exfiltration and destruction. Frequently targeted victims are corporations or organizations that often have no methods in place to detect such targeted attacks in time, resulting in financial and reputational losses. As a consequence, so-called intrusion detection systems (IDSs) are deployed to monitor system behavior and disclose suspicious activity. While signature-based IDSs that search for predefined patterns in logs are highly effective, they are unable to detect unknown attacks and rely on manually maintained databases of attack signatures. The main problems with such signatures are that they are often easy to evade and too simple to detect complex attack cases, and that their generation is slow and relies on domain knowledge. Anomaly-based IDSs seem to resolve some of these issues by leveraging machine learning to detect unknown attacks; however, they are notorious for high false positive rates and produce anomalies that are difficult to interpret and relate to specific attacks. The idea presented in this dissertation is therefore to combine the advantages of both methods by generating so-called meta-alerts from sequences of anomalies that enable detection of the same or similar attacks on other systems, as achieved by signatures. For this purpose, a new alert aggregation mechanism is proposed that does not rely on any predefined knowledge about the deployed IDSs, observed attacks, or monitored systems. In particular, the method groups anomalies and alerts by their occurrence times and uses similarity metrics to cluster and merge groups into meta-alerts.
For evaluation of the approach, anomalies are generated by a publicly available anomaly-based IDS. As part of this dissertation, this IDS is extended by a concept for analyzing categorical values in log data, in which statistical tests are used to recognize changes in value correlations as anomalies. Evaluating the ability to detect attacks requires labeled log data. The dissertation therefore also proposes a method for automatic testbed deployment. In particular, testbeds are instantiated from abstract templates following principles from model-driven engineering. This makes it possible to generate arbitrary numbers of testbeds with dynamically assigned random values for specific testbed parameters, which introduces variations in the infrastructure, normal system behavior, and attack executions. The resulting log datasets are representative of diverse system environments and thus improve evaluations.
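
The aggregation idea outlined in the abstract (group alerts by occurrence time, then merge similar groups into meta-alerts) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the dissertation's actual algorithm: the `Alert` and `MetaAlert` types, the fixed `max_gap` time threshold, the use of Jaccard similarity over alert types, and the `threshold` value are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # occurrence time in seconds (hypothetical unit)
    kind: str         # hypothetical label for the type of anomaly


@dataclass
class MetaAlert:
    kinds: set        # union of alert types seen in the merged groups
    members: int = 1  # number of groups merged into this meta-alert


def group_by_time(alerts, max_gap):
    """Start a new group whenever two consecutive alerts are more than max_gap apart."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed metric; 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def aggregate(groups, threshold):
    """Merge each group into the most similar meta-alert, or create a new one."""
    meta_alerts = []
    for group in groups:
        kinds = {a.kind for a in group}
        best = max(meta_alerts, key=lambda m: jaccard(m.kinds, kinds), default=None)
        if best is not None and jaccard(best.kinds, kinds) >= threshold:
            best.kinds |= kinds
            best.members += 1
        else:
            meta_alerts.append(MetaAlert(kinds=set(kinds)))
    return meta_alerts


# Made-up alerts: two similar bursts and one unrelated alert.
alerts = [
    Alert(0, "new_path"), Alert(1, "value_change"),
    Alert(100, "new_path"), Alert(101, "value_change"), Alert(102, "frequency"),
    Alert(500, "login_failure"),
]
groups = group_by_time(alerts, max_gap=5)
meta_alerts = aggregate(groups, threshold=0.5)
```

With these made-up inputs, the two bursts share most of their alert types and end up in one meta-alert, while the isolated alert forms its own. A real system would likely compare richer alert attributes than a single type label.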

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

cyber threat intelligence

dc.subject

log data analysis

dc.subject

alert aggregation

dc.subject

security testbeds

dc.subject

intrusion detection

dc.subject

anomaly detection

dc.subject

system log data

dc.subject

log clustering

dc.subject

machine learning

dc.title

Extraction of cyber threat intelligence from raw log data

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2022.103764

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Max Landauer

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E194 - Institut für Information Systems Engineering

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC16573526

dc.description.numberOfPages

211

dc.thesistype

Dissertation

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.advisor.orcid

0000-0002-9272-6225

item.languageiso639-1

item.openairetype

doctoral thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_db06

item.openaccessfulltext

Open Access

crisitem.author.dept

E105 - Institut für Stochastik und Wirtschaftsmathematik

crisitem.author.parentorg

E100 - Fakultät für Mathematik und Geoinformation

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(4.14 MB)

In Copyright
