<div class="csl-bib-body">
<div class="csl-entry">Süss, M. (2026). <i>Faithfulness of Natural Language Explanations for Vision Language Models: An Automated Test Framework</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.132363</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2026.132363
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/226904
-
dc.description
Thesis not yet received at the library - data not verified
-
dc.description.abstract
Natural-language explanations (NLEs) produced by vision-language models (VLMs) are often treated as indicators of transparency, yet their faithfulness, i.e., whether they truly reflect the factors driving model decisions, remains largely unexamined. This thesis introduces a counterfactual, intervention-based framework for evaluating explanation faithfulness in multimodal settings, extending prior text-only approaches to object-level image interventions. We derive two metrics, the visual counterfactual test (vCT) and the visual correlational counterfactual test (vCCT), and support scalable evaluation through an automated pipeline that generates high-quality counterfactual image pairs via semantic object removal. Applying this framework to seven state-of-the-art VLMs on SNLI-VE, we find that explanations are often, but not always, faithful. While faithfulness scores are generally high, unfaithful cases are consistently observed. Moreover, the choice of prompting strategy matters: predict-then-explain prompting yields more faithful explanations than explain-then-predict. These findings highlight both the promise and the limitations of current VLM explanations and emphasize the need for rigorous faithfulness evaluation before relying on them in high-stakes applications.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Faithfulness
en
dc.subject
Vision-Language-Model
en
dc.subject
VLM
en
dc.subject
Natural-Language-Explanations
en
dc.subject
NLE
en
dc.subject
Counterfactual
en
dc.subject
Tests
en
dc.title
Faithfulness of Natural Language Explanations for Vision Language Models: An Automated Test Framework