Characterizing counterfactual explanation search spaces

Lutnik, Christian

doi:10.34726/hss.2024.111855

Record link:

https://doi.org/10.34726/hss.2024.111855
http://hdl.handle.net/20.500.12708/204477

Title:

Characterizing counterfactual explanation search spaces

Citation:

Lutnik, C. (2024). Characterizing counterfactual explanation search spaces [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.111855

reposiTUm DOI:

10.34726/hss.2024.111855

CatalogPlus:

AC17362492

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Lutnik, Christian

Advisor:

Cito, Jürgen

Organisational Unit:

E194 - Institut für Information Systems Engineering

Date (published):

2024

Number of Pages:

Keywords:

Model-driven engineering; SAP Core Data Services; Domain-specific language; CDS; Modeling tool; LSP; Langium; Sprotty

Abstract:

Current static source code analysis software is underused because it suffers from a high rate of false positives and false negatives. The emergence of large language models (LLMs) trained on comprehensive amounts of source code, so-called "models of code", brings new hope for detecting bugs and security vulnerabilities. Detecting bugs before source code is compiled could drastically reduce the time needed to find and fix them. By their nature, LLMs cannot explain their answers, which undermines their perceived trustworthiness and usefulness. One way to work around these drawbacks is counterfactuals. A counterfactual attempts to explain an answer of a black-box model, such as an LLM, by modifying the input until the model arrives at a different result. This change is interpreted as the reason for the original result. If a counterfactual is too far from the original input, it provides no information, since a different input can be expected to produce a different output anyway. Searching for counterfactuals can be time-consuming, because successively modifying the input spans an exponential search space.

This thesis investigates how different search algorithms and configurations affect the search space, using an exponential search as the baseline. The search algorithms examined are a genetic search (GS), a greedy search, and a layer integrated gradients (LIG) search. These are combined with masked language models and different models of code, as well as with different perturbation functions and tokenizers.

The search for counterfactuals suffers from the suboptimal accuracy of currently available models of code, which reach an accuracy of at most 68.78% on the binary classification task of detecting buggy or vulnerable C++ source code. Nevertheless, both GS and the greedy search are faster, measured as time per counterfactual, than a k-exponential exhaustive search (kEES). The LIG search is 14 times slower per counterfactual than kEES, because 90% of its search runs fail to find a counterfactual. In terms of time to the first counterfactual found, the greedy search is the fastest algorithm, followed by the LIG search and GS; kEES is the slowest. The choice of model of code, perturbation function, and tokenizer has a large influence on the search duration per counterfactual.
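As a rough illustration of why this search space grows so quickly, the following minimal Python sketch searches for counterfactuals by masking token positions. All names are hypothetical: `toy_classifier` is a keyword-weighted stand-in for a model of code (the weights are invented), perturbation is simplified to replacing tokens with a mask token, and `exhaustive_counterfactuals` is a generic bounded exhaustive search, not the thesis's kEES implementation. Trying the smallest sets of positions first keeps the returned counterfactuals as close as possible to the original input.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple

MASK = "<mask>"

# Hypothetical stand-in for a model of code: the token weights are invented
# for illustration; label 1 means "vulnerable", label 0 "not vulnerable".
RISKY = {"strcpy": 0.6, "gets": 0.7, "sprintf": 0.4}
THRESHOLD = 0.5


def toy_classifier(tokens: List[str]) -> int:
    score = sum(RISKY.get(tok, 0.0) for tok in tokens)
    return int(score >= THRESHOLD)


def perturb(tokens: List[str], positions: Tuple[int, ...]) -> List[str]:
    """Return a copy of `tokens` with the given positions masked."""
    out = list(tokens)
    for pos in positions:
        out[pos] = MASK
    return out


def exhaustive_counterfactuals(
    tokens: List[str], k: int
) -> Tuple[List[Tuple[int, ...]], int]:
    """Enumerate all sets of at most k positions, smallest sets first.

    Returns every position set of the smallest size that flips the label,
    plus the number of classifier calls (including the original input).
    """
    original = toy_classifier(tokens)
    queries = 1
    for size in range(1, k + 1):
        flips = []
        for positions in combinations(range(len(tokens)), size):
            queries += 1
            if toy_classifier(perturb(tokens, positions)) != original:
                flips.append(positions)
        if flips:  # stop at the closest counterfactuals found
            return flips, queries
    return [], queries


def search_space_size(n: int, k: int) -> int:
    """Number of perturbed inputs with 1..k masked positions out of n."""
    return sum(comb(n, i) for i in range(1, k + 1))


tokens = ["gets", "(", "a", ")", ";", "strcpy", "(", "b", ",", "a", ")", ";"]
flips, queries = exhaustive_counterfactuals(tokens, k=2)
print(flips, queries)  # [(0, 5)] 79
print(search_space_size(len(tokens), len(tokens)))  # 4095, i.e. 2**12 - 1
```

Even with masking as the only perturbation, an input of n tokens admits 2^n - 1 non-empty perturbation sets; bounding the number of changed positions by k reduces this to the sum of C(n, i) for i = 1..k, which still grows quickly with k.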

Current state-of-the-art static code analysis software is underused because it is plagued by a high rate of both false negatives and false positives. The emergence of large language models (LLMs) trained on comprehensive amounts of source code, so-called models of code, provides new hope for detecting bugs and insecurities even before code is compiled, which could greatly reduce the time and cost of finding and fixing bugs. However, LLMs are inherently unable to explain their outputs, which limits both the trust that developers place in them and their usefulness. One way to mitigate this drawback is explanation through counterfactuals. A counterfactual attempts to explain a decision of an LLM by perturbing the input such that the LLM comes to a different result. This change can then be taken as the reason why the LLM arrived at its original conclusion. Not all counterfactuals are equally useful: a counterfactual too far from the original input provides no information, as a completely different input may be expected to lead to a different output anyway. Searching for counterfactuals can be a time-consuming task, since applying changes to the input until a counterfactual is found spans an exponential search space.

This thesis investigates how different search algorithms and configurations affect this search space, using an exponential exhaustive search as the benchmark. The search algorithms studied are the genetic search, the greedy search, and the layer integrated gradients (LIG) search. These are combined with masked language models, different models of code, tokenizers, and perturbation functions.

The search for counterfactuals suffers from the limited accuracy of current models of code, which reach an accuracy of at most 68.78% on the binary task of classifying C++ source code as vulnerable or not vulnerable. Even so, both the greedy search and the genetic search significantly outperform the baseline k-Exponential Exhaustive Search (kEES) in terms of time per counterfactual. The LIG search algorithm is 14 times slower than kEES, since 90% of its search runs end without finding a counterfactual. In terms of time to the first counterfactual, the greedy search is the fastest, followed by the gradient-informed LIG search, the genetic search, and lastly kEES. The choice of model of code, perturbation function, and tokenizer greatly influences the search duration per counterfactual.
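A greedy search trades completeness for speed by committing to one change per step. The minimal Python sketch below assumes a hypothetical keyword-weighted `toy_score` in place of a model of code's vulnerability score and masks one token per step, always choosing the position that moves the score furthest toward the opposite label. It illustrates the general technique only and is not the greedy search implemented in the thesis.

```python
from math import comb
from typing import List, Optional, Tuple

MASK = "<mask>"

# Hypothetical keyword weights standing in for a model of code's
# "vulnerable" score; invented for illustration only.
RISKY = {"strcpy": 0.6, "gets": 0.7, "sprintf": 0.4}
THRESHOLD = 0.5


def toy_score(tokens: List[str]) -> float:
    return sum(RISKY.get(tok, 0.0) for tok in tokens)


def label(score: float) -> int:
    return int(score >= THRESHOLD)


def greedy_counterfactual(
    tokens: List[str], max_steps: int
) -> Tuple[List[int], Optional[List[str]], int]:
    """Mask one position per step, always the one that moves the score
    furthest toward the opposite label, until the label flips.

    Returns (masked positions, counterfactual or None, model calls).
    """
    original = label(toy_score(tokens))
    current = list(tokens)
    chosen: List[int] = []
    queries = 1  # the call on the unmodified input
    for _ in range(max_steps):
        best_pos, best_score = None, None
        for pos, tok in enumerate(current):
            if tok == MASK:
                continue
            score = toy_score(current[:pos] + [MASK] + current[pos + 1:])
            queries += 1
            # Lower the score when the input is labelled 1, raise it otherwise.
            if best_score is None or (
                score < best_score if original == 1 else score > best_score
            ):
                best_pos, best_score = pos, score
        if best_pos is None:  # every position is already masked
            break
        current[best_pos] = MASK
        chosen.append(best_pos)
        if label(best_score) != original:
            return chosen, current, queries
    return chosen, None, queries


tokens = ["gets", "(", "a", ")", ";", "strcpy", "(", "b", ",", "a", ")", ";"]
chosen, counterfactual, queries = greedy_counterfactual(tokens, max_steps=3)
print(chosen, queries)  # [0, 5] 24
# Enumerating every set of at most two masked positions instead:
print(comb(12, 1) + comb(12, 2))  # 78
```

On this toy input, the greedy search evaluates 23 perturbed inputs before the label flips, whereas enumerating every set of at most two masked positions would evaluate 78. The price is that a greedy search gives no guarantee of finding the smallest counterfactual and can miss counterfactuals that only appear when several changes are combined.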

License:

In Copyright

Appears in Collections:

Thesis