Machine Learning for Vulnerability Detection in Smart Contracts. A Comparison of Approaches.

Klein, Stephan

doi:10.34726/hss.2025.123001

Record link:

https://doi.org/10.34726/hss.2025.123001
http://hdl.handle.net/20.500.12708/220334

Title:

Machine Learning for Vulnerability Detection in Smart Contracts. A Comparison of Approaches.

Citation:

Klein, S. (2025). Machine Learning for Vulnerability Detection in Smart Contracts. A Comparison of Approaches. [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.123001

reposiTUm DOI:

10.34726/hss.2025.123001

CatalogPlus:

AC17679878

Publication Type:

Thesis - Diploma Thesis

Language:

English

Authors:

Klein, Stephan

Advisor:

Salzer, Gernot

Co-advisor:

di Angelo, Monika

Organisational Unit:

E192 - Institut für Logic and Computation

Date (published):

2025

Number of Pages:

Keywords:

Smart contracts; Reentrancy; Vulnerability Detection; Machine Learning; Graph Neural Networks; Bidirectional LSTM; Ethereum; Benchmark; Empirical Evaluation; Systematic Literature Review

Abstract:

Smart contracts (blockchain programs) now secure considerable assets and are therefore an attractive target for attackers. Reentrancy is an important class of vulnerabilities in this setting. At the same time, growing interest in machine learning (ML) has produced a large number of proposals for automated vulnerability detection as an alternative to classical analysis methods. Existing surveys list various methods, but their classifications often remain imprecise or incomplete, and comparative empirical evaluations are rare. To close these gaps, this thesis proceeds as follows: (1) To assess the state of the art, we conduct a systematic literature review and derive a taxonomy that links learning paradigms to the concrete model families used in smart contract analysis. (2) In a comparative model analysis, we examine two representative ML-based detectors, MANDO-HGT (based on a graph neural network) and VulHunter (based on a Bi-LSTM), by evaluating the respective publications and the associated open-source repositories. (3) In an empirical test series, we integrate the selected tools into the SmartBugs test environment and run three experiments: (I) testing the models on a selection of minimal reentrancy example contracts; (II) tests on a larger real-world dataset; and (III) retraining the models on the same dataset.
Results. The original VulHunter is competitive but falls behind classical analysis tools such as Slither and Mythril in recall. The MANDO base model over-detects heavily (high recall, low precision), and MANDO-HGT falls short of the results originally reported by its authors. After retraining both models on identical data, VulHunter outperforms classical analysis tools in F1 and precision on a holdout set of 120 contracts, but still not in recall. Overall, our results suggest that ML can be a useful complementary approach to traditional vulnerability analysis, especially with careful instance design and reliable labels.
Contributions. (i) a clearly structured taxonomy of ML techniques for detecting smart contract vulnerabilities; (ii) a detailed model comparison of different ML-based approaches; (iii) an extension of the SmartBugs framework with selected ML tools; and (iv) an empirical evaluation of ML-based tools.

Smart contracts (blockchain programs) now safeguard substantial assets and, as recent exploits show, remain attractive targets for attackers. Reentrancy persists as a damaging class of vulnerabilities. In parallel, enthusiasm for machine learning (ML) has driven a surge of proposals for automated vulnerability detection as an alternative to conventional methods. The evidence base is uneven, however: surveys catalog methods but produce inaccurate or incomplete taxonomies, comparable tool-level insights and empirical evaluations are scarce, and labeling practices remain opaque. This thesis addresses that gap in three steps: (1) To assess the state of the art, we conduct a systematic literature review and derive a taxonomy that connects learning paradigms to concrete model families applied to smart contract analysis. (2) In a comparative paper-and-code analysis, we study two representative ML-based detectors, MANDO-HGT (based on a graph neural network) and VulHunter (based on a Bi-LSTM), by extracting data from the respective publications and inspecting their open-source repositories, making explicit how datasets are composed, how instances are constructed, and how information flows through model layers. (3) In an empirical evaluation, we integrate the selected tools into the SmartBugs test environment and perform three experiments: (I) a minimal reentrancy suite (260 contracts) to isolate the signal; (II) a larger, noisier benchmark (987 contracts); and (III) a fair, same-data retraining (591 contracts) to separate data effects from architectural effects.
Findings. Pretrained VulHunter is competitive but trails static analysis tools such as Slither and Mythril in recall; the original MANDO base model over-flags (high recall, low precision); and MANDO-HGT underperforms its reported results on our datasets. After retraining both tools on identical data, VulHunter achieves the best overall balance on a 120-contract holdout, surpassing conventional analyzers in F1 and precision, though not in peak recall. Overall, our results suggest that ML is a viable complement to traditional analyzers for reentrancy, especially when instance design is careful and labels are reliable.
Contributions. (i) a clarified, usage-oriented taxonomy of ML techniques for smart contract vulnerability detection; (ii) an implementation-grounded, tool-level comparison that makes instance construction and model behavior explicit; (iii) an extension of the SmartBugs framework with a set of ML tools and updates of conventional tools; and (iv) an empirical evaluation of ML-based tools.
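
The findings above are stated in terms of precision, recall, and F1. As a reading aid, the following minimal sketch shows how these three metrics follow from a detector's confusion counts, and why an over-flagging detector can reach high recall while its precision stays low. The counts are hypothetical and chosen only for illustration; they are not results from the thesis, and the function and variable names are assumptions.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float


def detection_metrics(tp: int, fp: int, fn: int) -> Metrics:
    """Compute precision, recall and F1 from confusion counts.

    tp: vulnerable contracts correctly flagged
    fp: safe contracts wrongly flagged
    fn: vulnerable contracts that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


# Hypothetical counts (NOT taken from the thesis).
# An over-flagging detector: finds most vulnerabilities, many false alarms.
over_flagging = detection_metrics(tp=45, fp=55, fn=5)

# A more conservative detector: fewer false alarms, more misses.
conservative = detection_metrics(tp=35, fp=5, fn=15)

for name, m in [("over-flagging", over_flagging), ("conservative", conservative)]:
    print(f"{name}: precision={m.precision:.3f} recall={m.recall:.3f} f1={m.f1:.3f}")
```

In this illustrative setting the conservative detector has the higher F1 and precision even though its recall is lower, which is the same kind of trade-off the abstract describes between the retrained VulHunter and the conventional analyzers.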

Additional information:

Thesis not yet received by the library - data not verified
Title differs from the original; translated by the author

License:

In Copyright

Appears in Collections:

Thesis