<div class="csl-bib-body">
<div class="csl-entry">Süss, M. (2026). <i>Faithfulness of Natural Language Explanations for Vision Language Models: An Automated Test Framework</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.132363</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2026.132363
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/226904
-
dc.description
Thesis not yet received at the library - data not verified
-
dc.description.abstract
Natural-language explanations (NLEs) produced by vision-language models (VLMs) are often treated as indicators of transparency, yet their faithfulness, i.e., whether they truly reflect the factors driving model decisions, remains largely unexamined. This thesis introduces a counterfactual, intervention-based framework for evaluating explanation faithfulness in multimodal settings, extending prior text-only approaches to object-level image interventions. We derive two metrics, the visual counterfactual test (vCT) and the visual correlational counterfactual test (vCCT), and support scalable evaluation through an automated pipeline that generates high-quality counterfactual image pairs via semantic object removal. Applying this framework to seven state-of-the-art VLMs on SNLI-VE, we find that explanations are often, but not always, faithful. While faithfulness scores are generally high, unfaithful cases are consistently observed. Moreover, the choice of prompting strategy matters: predict-then-explain prompting yields more faithful explanations than explain-then-predict. These findings highlight both the promise and the limitations of current VLM explanations and emphasize the need for rigorous faithfulness evaluation before relying on them in high-stakes applications.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Faithfulness
en
dc.subject
Vision-Language-Model
en
dc.subject
VLM
en
dc.subject
Natural-Language-Explanations
en
dc.subject
NLE
en
dc.subject
Counterfactual
en
dc.subject
Tests
en
dc.title
Faithfulness of Natural Language Explanations for Vision Language Models: An Automated Test Framework