Extraction of cyber threat intelligence from raw log data

Landauer, Max

doi:10.34726/hss.2022.103764

Dataset citation link:

https://doi.org/10.34726/hss.2022.103764
http://hdl.handle.net/20.500.12708/20672

Title:

Extraction of cyber threat intelligence from raw log data

Citation:

Landauer, M. (2021). Extraction of cyber threat intelligence from raw log data [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.103764

reposiTUm-DOI:

10.34726/hss.2022.103764

CatalogPlus:

AC16573526

Publication type:

Academic thesis - Dissertation

Language:

English

Authors:

Landauer, Max

Supervisor:

Rauber, Andreas

Organizational unit:

E194 - Institut für Information Systems Engineering

Date (published):

2021

Extent:

211 pages

Keywords:

cyber threat intelligence; log data analysis; alert aggregation; security testbeds; intrusion detection; anomaly detection; system log data; log clustering; machine learning

Abstract:

The omnipresence of digital systems has led to an interconnected economy and society. Unfortunately, the introduction of new technologies into rapidly expanding global networks has also enabled previously unimaginable threats. Cyber attackers use advanced tools and techniques to compromise systems and exploit vulnerabilities for the purpose of data exfiltration and destruction. Frequently targeted victims are corporations and organizations that often have no methods in place to detect such targeted attacks in time, resulting in financial and reputational losses. As a consequence, security practitioners deploy so-called intrusion detection systems (IDSs) to monitor system behavior and disclose suspicious activity. While signature-based IDSs that search for predefined patterns in logs are highly effective, they are unable to detect unknown attacks and rely on manually maintained databases of attack signatures. Moreover, such signatures are often easy to evade and too simple to capture complex attacks, and generating them is slow and requires domain knowledge. Anomaly-based IDSs address some of these issues by leveraging machine learning to detect unknown attacks, but they are notorious for high false positive rates and produce anomalies that are difficult to interpret and to relate to specific attacks. This dissertation therefore proposes to combine the advantages of both approaches by generating so-called meta-alerts from sequences of anomalies; like signatures, these meta-alerts enable detection of the same or similar attacks on other systems. For this purpose, a new alert aggregation mechanism is proposed that does not rely on any predefined knowledge about the deployed IDSs, observed attacks, or monitored systems. In particular, the method groups anomalies and alerts by their occurrence times and uses similarity metrics to cluster and merge these groups into meta-alerts.
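The aggregation idea described above (group alerts by occurrence time, then merge similar groups into meta-alerts) can be illustrated with a small Python sketch. It is not the dissertation's algorithm: the `Alert` type, the gap-based grouping controlled by a hypothetical `max_gap` parameter, the Jaccard similarity over attribute tokens, the greedy merge, and the `threshold` value are all simplifying assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    time: float            # occurrence time in seconds
    attributes: frozenset  # hypothetical: tokens describing the alert


def group_by_time(alerts, max_gap):
    """Start a new group whenever two consecutive alerts are more than max_gap apart."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.time):
        if groups and alert.time - groups[-1][-1].time <= max_gap:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


def group_tokens(group):
    """Union of all attribute tokens of the alerts in one group."""
    tokens = set()
    for alert in group:
        tokens |= alert.attributes
    return frozenset(tokens)


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_into_meta_alerts(groups, threshold):
    """Greedily add each group to the first meta-alert that is similar enough."""
    meta_alerts = []  # each entry: {"tokens": frozenset, "groups": list}
    for group in groups:
        tokens = group_tokens(group)
        for meta in meta_alerts:
            if jaccard(meta["tokens"], tokens) >= threshold:
                # Generalize: keep only attributes shared by all merged groups
                meta["tokens"] = meta["tokens"] & tokens
                meta["groups"].append(group)
                break
        else:
            meta_alerts.append({"tokens": tokens, "groups": [group]})
    return meta_alerts


# Usage: two similar attack bursts and one unrelated alert
alerts = [
    Alert(0.0, frozenset({"ssh", "login_fail"})),
    Alert(5.0, frozenset({"ssh", "sudo"})),
    Alert(1000.0, frozenset({"ssh", "login_fail"})),
    Alert(1003.0, frozenset({"ssh", "sudo", "wget"})),
    Alert(5000.0, frozenset({"dns"})),
]
groups = group_by_time(alerts, max_gap=60.0)
meta_alerts = merge_into_meta_alerts(groups, threshold=0.5)
print(len(groups), len(meta_alerts))     # 3 2
print(sorted(meta_alerts[0]["tokens"]))  # ['login_fail', 'ssh', 'sudo']
```

Intersecting the token sets keeps only the attributes shared by every merged group, which is one simple way to make a meta-alert generic enough to match similar attacks elsewhere; the actual approach may rely on different similarity metrics and merge strategies.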
To evaluate the approach, anomalies are generated by a publicly available anomaly-based IDS. As part of this dissertation, this IDS is extended with a method for analyzing categorical values in log data, which applies statistical tests to detect changes in value correlations as anomalies. Evaluating the ability to detect attacks requires labeled log data. The dissertation therefore also proposes a method for automatic testbed deployment. In particular, testbeds are instantiated from abstract templates following principles from model-driven engineering. This makes it possible to generate arbitrary numbers of testbeds in which specific parameters are assigned random values, introducing variations in the infrastructure, normal system behavior, and attack executions. The resulting log datasets are representative of diverse system environments and thus improve evaluations.
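To illustrate how a statistical test can flag changed value correlations, the following sketch learns conditional frequencies P(B = b | A = a) between two categorical log fields from a training window and applies a two-sided z-test (normal approximation to the binomial) to a detection window. The field names, the `z_crit` threshold of 3.0, and the choice of this particular test are assumptions for illustration; the dissertation's IDS extension may use other tests.

```python
import math
from collections import Counter, defaultdict


def count_pairs(events, cond_key, dep_key):
    """Count dependent values per condition value in parsed log events (dicts)."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event[cond_key]][event[dep_key]] += 1
    return counts


def conditional_probs(events, cond_key, dep_key):
    """Estimate P(dep value | cond value) from training events."""
    counts = count_pairs(events, cond_key, dep_key)
    return {c: {d: n / sum(ctr.values()) for d, n in ctr.items()}
            for c, ctr in counts.items()}


def correlation_anomalies(train, test, cond_key, dep_key, z_crit=3.0):
    """Return (cond, dep, z) for value pairs whose test frequency deviates from training."""
    probs = conditional_probs(train, cond_key, dep_key)
    anomalies = []
    for cond, ctr in count_pairs(test, cond_key, dep_key).items():
        if cond not in probs:
            continue  # condition values unseen in training are skipped in this sketch
        n = sum(ctr.values())
        for dep in set(probs[cond]) | set(ctr):
            p = probs[cond].get(dep, 0.0)
            k = ctr.get(dep, 0)
            if p in (0.0, 1.0):
                # Deterministic relation in training: any deviation is anomalous
                if k != round(p * n):
                    anomalies.append((cond, dep, math.inf))
                continue
            z = (k - n * p) / math.sqrt(n * p * (1 - p))
            if abs(z) > z_crit:
                anomalies.append((cond, dep, z))
    return anomalies


# Usage: user "alice" normally logs in to ws1 in 90% of cases
train = [{"user": "alice", "host": "ws1"}] * 90 + [{"user": "alice", "host": "ws2"}] * 10
normal = [{"user": "alice", "host": "ws1"}] * 44 + [{"user": "alice", "host": "ws2"}] * 6
shifted = [{"user": "alice", "host": "ws1"}] * 20 + [{"user": "alice", "host": "ws2"}] * 30
print(correlation_anomalies(train, normal, "user", "host"))  # []
flagged = correlation_anomalies(train, shifted, "user", "host")
print(sorted(dep for _, dep, _ in flagged))  # ['ws1', 'ws2']
```

The normal-approximation z-test keeps the sketch dependency-free; for small counts an exact binomial or chi-square test would be a more robust choice.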

License:

In Copyright

Appears in collections:

Thesis