<div class="csl-bib-body">
<div class="csl-entry">Panadero Palenzuela, R., Schörkhuber, D., & Gelautz, M. (2025). Importance-Guided Interpretability and Pruning for Video Transformers in Driver Action Recognition. In <i>2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</i> (pp. 5295–5304). https://doi.org/10.1109/WACV61041.2025.00517</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/215606
-
dc.description.abstract
Recently, transformers have gained prominence in video action recognition due to their ability to capture spatio-temporal dependencies. Despite their effectiveness, the interpretability of their self-attention mechanisms remains limited, which makes model decisions hard to understand and hampers transparency and bias identification. Additionally, the computational demands of transformer architectures, particularly the self-attention mechanism, present practical difficulties. To tackle both challenges, we adapt existing interpretability techniques and introduce a layer pruning method guided by importance metrics. In the context of driver action recognition, our findings highlight the efficacy of the applied head importance metrics in pinpointing crucial attention heads and identifying key visual cues essential for recognizing driver behavior. Experiments on three mainstream video transformers demonstrate that the proposed pruning technique, which removes low-relevance layers, significantly reduces computational costs with only slight performance degradation. Specifically, on our DriverActionInsight (DAI) dataset, we achieve a 23.5% FLOPs saving when compressing Video Swin, with a decrease of less than 1% in Top-1 accuracy.
en
dc.description.sponsorship
FFG - Österr. Forschungsförderungsgesellschaft mbH
-
dc.description.sponsorship
Wirtschaftsagentur Wien (a fund of the City of Vienna)
-
dc.language.iso
en
-
dc.subject
Measurement
en
dc.subject
Degradation
en
dc.subject
Visualization
en
dc.subject
Computer vision
en
dc.subject
Adaptation models
en
dc.subject
Computational modeling
en
dc.subject
Computer architecture
en
dc.subject
Transformers
en
dc.subject
Computational efficiency
en
dc.subject
Driver behavior
en
dc.title
Importance-Guided Interpretability and Pruning for Video Transformers in Driver Action Recognition
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
TU Wien, Austria
-
dc.relation.isbn
979-8-3315-1083-1
-
dc.relation.doi
10.1109/WACV61041.2025
-
dc.description.startpage
5295
-
dc.description.endpage
5304
-
dc.relation.grantno
884336
-
dc.relation.grantno
4998519
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
-
tuw.peerreviewed
true
-
tuw.project.title
Simulation of Vehicle Interiors for the Efficient Development of Driver/Occupant Monitoring Systems
-
tuw.project.title
Empathic Vehicle
-
tuw.researchTopic.id
I5
-
tuw.researchTopic.name
Visual Computing and Human-Centered Technology
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E193-01 - Research Unit of Computer Vision
-
tuw.publication.orgunit
E056-19 - Department of Precision Livestock Farming
-
tuw.publisher.doi
10.1109/WACV61041.2025.00517
-
dc.description.numberOfPages
10
-
tuw.author.orcid
0000-0003-2015-6507
-
tuw.author.orcid
0000-0002-9476-0865
-
tuw.event.name
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025)
en
tuw.event.startdate
26-02-2025
-
tuw.event.enddate
06-03-2025
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Tucson, Arizona
-
tuw.event.country
US
-
tuw.event.presenter
Panadero Palenzuela, Raquel
-
wb.sciencebranch
Computer Science
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.value
100
-
item.grantfulltext
restricted
-
item.openairetype
conference paper
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.cerifentitytype
Publications
-
item.languageiso639-1
en
-
item.fulltext
no Fulltext
-
crisitem.author.dept
TU Wien, Austria
-
crisitem.author.dept
E193-01 - Research Unit of Computer Vision
-
crisitem.author.dept
E193-01 - Research Unit of Computer Vision
-
crisitem.author.orcid
0000-0003-2015-6507
-
crisitem.author.orcid
0000-0002-9476-0865
-
crisitem.author.parentorg
E193 - Institute of Visual Computing and Human-Centered Technology
-
crisitem.author.parentorg
E193 - Institute of Visual Computing and Human-Centered Technology
-
crisitem.project.funder
FFG - Österr. Forschungsförderungsgesellschaft mbH