<div class="csl-bib-body">
<div class="csl-entry">Sula, J. (2026). <i>Faithful Attention Attribution in Vision Transformers for Chest X-Ray Interpretation</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.132361</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2026.132361
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/226961
-
dc.description
Thesis not yet received by the library - data not verified
-
dc.description.abstract
Vision Transformers (ViTs) achieve strong performance in natural and medical imaging, yet their decision processes remain opaque. This is especially problematic in high-stakes settings like chest X-ray interpretation. TransMM is among the strongest attribution methods for ViTs, combining attention with class-specific gradients to highlight influential image patches. We ask whether injecting semantic structure from Sparse Autoencoders (SAEs) can further improve the faithfulness of such attributions. We introduce Feature-Gradient Attribution, which extends TransMM’s principle from attention space to feature space. SAEs are trained on residual streams to decompose activations into sparse, interpretable features, providing per-patch feature activations. We project gradients onto the SAE feature basis and compute feature-gradient scores that capture both which learned features are present and how they influence the target logit. These scores yield per-patch gates that modulate TransMM’s attention maps before relevance propagation, forming a lightweight, semantically informed correction. Across three datasets (chest X-rays, endoscopy, natural images), two architectures (finetuned ViT-B/16 and contrastively pre-trained CLIP ViT-B/32), and three complementary faithfulness metrics, our method improves attribution faithfulness consistently. Improvements are statistically significant (p<0.001) on all three metrics for one dataset and on two of three metrics for the remaining datasets. We observe gains of 10.5-34.8% on SaCo and 9.7-43.0% on Faithfulness Correlation, with Pixel Flipping improving by 1.8-10.8%. Notably, we never observe degradation relative to TransMM on any metric–dataset combination.
en
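The abstract above outlines a feature-gradient gating step: SAE feature activations are combined with gradients projected onto the SAE feature basis, and the resulting per-patch gates modulate TransMM's attention maps before relevance propagation. Below is a minimal sketch of that idea, assuming a PyTorch setting in which an SAE encoder returns sparse per-patch feature activations and its decoder rows span the feature basis; the names (feature_gradient_gates, sae_encoder, sae_decoder, gate_attention) and the rectified-sum gate are illustrative assumptions, not the thesis implementation.

import torch

def feature_gradient_gates(residual, grad_residual, sae_encoder, sae_decoder):
    # residual:      (P, d) residual-stream activations, one row per image patch
    # grad_residual: (P, d) gradient of the target logit w.r.t. the residual stream
    # sae_encoder:   callable mapping (P, d) -> (P, F) sparse feature activations
    # sae_decoder:   (F, d) feature directions forming the SAE feature basis
    feats = sae_encoder(residual)                  # which learned features are present
    grad_on_feats = grad_residual @ sae_decoder.T  # gradient projected onto the feature basis
    scores = feats * grad_on_feats                 # presence x influence, per feature and patch
    gates = scores.clamp(min=0).sum(dim=-1)        # collapse to one non-negative gate per patch
    return gates / (gates.max() + 1e-8)            # normalise so gates modulate rather than rescale

def gate_attention(attn_map, gates):
    # attn_map: (P, P) TransMM-style attention relevance map; gates: (P,)
    # Per-patch gates scale the map before relevance propagation continues.
    return attn_map * gates.unsqueeze(0)

In this sketch the per-feature score is a product of feature presence and its projected gradient, mirroring the abstract's description of scores that capture both which features are active and how they influence the target logit.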
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Vision Transformers
en
dc.subject
Attribution Methods
en
dc.subject
Faithfulness
en
dc.subject
Interpretability
en
dc.subject
Attention-Based Explanations
en
dc.subject
Sparse Autoencoders
en
dc.subject
Feature-Level Interpretability
en
dc.subject
Mechanistic Interpretability
en
dc.title
Faithful Attention Attribution in Vision Transformers for Chest X-Ray Interpretation