Süss, M. (2026). Faithfulness of Natural Language Explanations for Vision Language Models: An Automated Test Framework [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.132363
Natural-language explanations (NLEs) produced by vision-language models (VLMs) are often treated as indicators of transparency, yet their faithfulness, i.e., whether they truly reflect the factors driving model decisions, remains largely unexamined. This thesis introduces a counterfactual, intervention-based framework for evaluating explanation faithfulness in multimodal settings, extending prior text-only approaches to object-level image interventions. We derive two metrics, the visual counterfactual test (vCT) and the visual correlational counterfactual test (vCCT), and support scalable evaluation through an automated pipeline that generates high-quality counterfactual image pairs via semantic object removal. Applying this framework to seven state-of-the-art VLMs on SNLI-VE, we find that explanations are often, but not always, faithful. While faithfulness scores are generally high, unfaithful cases are consistently observed. Moreover, the choice of prompting strategy matters: predict-then-explain prompting yields more faithful explanations than explain-then-predict. These findings highlight both the promise and the limitations of current VLM explanations and emphasize the need for rigorous faithfulness evaluation before relying on them in high-stakes applications.
Additional information:
Thesis not yet received at the library - data not verified