Importance-Guided Interpretability and Pruning for Video Transformers in Driver Action Recognition

Panadero Palenzuela, Raquel; Schörkhuber, Dominik; Gelautz, Margrit

doi:10.1109/WACV61041.2025.00517

Permanent link to this record:

http://hdl.handle.net/20.500.12708/215606

Citation:

Panadero Palenzuela, R., Schörkhuber, D., & Gelautz, M. (2025). Importance-Guided Interpretability and Pruning for Video Transformers in Driver Action Recognition. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (pp. 5295–5304). https://doi.org/10.1109/WACV61041.2025.00517

Publisher DOI:

10.1109/WACV61041.2025.00517

Publication type:

Conference Paper - Full-Paper Contribution

Language:

English

Authors:

Panadero Palenzuela, Raquel
Schörkhuber, Dominik
Gelautz, Margrit

Organizational units:

E193-01 - Research Unit of Computer Vision
E056-19 - Department of Precision Livestock Farming

Published in:

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

ISBN:

979-8-3315-1083-1

Book DOI:

10.1109/WACV61041.2025

Date (published):

8-Apr-2025

Event name:

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025)

Event dates:

26-Feb-2025 - 6-Mar-2025

Event location:

Tucson, Arizona, United States of America

Extent:

Peer Reviewed:

Keywords:

Measurement; Degradation; Visualization; Computer vision; Adaptation models; Computational modeling; Computer architecture; Transformers; Computational efficiency; Driver behavior

Abstract:

Recently, transformers have gained prominence in video action recognition due to their ability to capture spatio-temporal dependencies. Despite their effectiveness, the interpretability of their self-attention mechanisms remains limited, which makes model decisions hard to understand and hampers transparency and bias identification. Additionally, the computational demands of transformer architectures, particularly of the self-attention mechanism, present practical difficulties. To tackle both challenges, we adapt existing interpretability techniques and introduce a layer pruning method guided by importance metrics. In the context of driver action recognition, our findings highlight the efficacy of the applied head importance metrics in pinpointing crucial attention heads and identifying key visual cues essential for recognizing driver behavior. Experimental results on three mainstream video transformers demonstrate that the proposed pruning technique, by removing low-relevance layers, significantly reduces computational costs with only slight performance degradation. Specifically, on our DriverActionInsight (DAI) dataset, we achieve a 23.5% FLOPs saving in compressing Video Swin with less than a 1% decrease in Top-1 accuracy.
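The abstract describes removing whole transformer layers ranked by importance. As a rough, non-authoritative illustration of that general idea, the Python sketch below aggregates hypothetical per-head importance scores into one score per layer and drops the lowest-ranked layers. The mean-over-heads aggregation, the function names (`layer_scores_from_heads`, `prune_lowest_layers`), and all numbers are assumptions made for illustration; they are not the authors' metrics, method, or results.

```python
from typing import List, Sequence, Tuple


def layer_scores_from_heads(head_importance: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate per-head importance into one score per layer (mean over heads).

    Averaging is an illustrative assumption, not the paper's aggregation rule.
    """
    return [sum(heads) / len(heads) for heads in head_importance]


def prune_lowest_layers(
    head_importance: Sequence[Sequence[float]],
    layer_flops: Sequence[float],
    num_to_remove: int,
) -> Tuple[List[int], float]:
    """Drop the `num_to_remove` layers with the lowest aggregated importance.

    Returns the indices of the kept layers (in original order) and the
    fraction of total FLOPs saved, assuming per-layer FLOPs simply add up.
    """
    if len(head_importance) != len(layer_flops):
        raise ValueError("need one FLOPs entry per layer")
    if not 0 <= num_to_remove < len(layer_flops):
        raise ValueError("must keep at least one layer")
    scores = layer_scores_from_heads(head_importance)
    # Rank layers from least to most important; ties are broken by index.
    ranked = sorted(range(len(scores)), key=lambda i: (scores[i], i))
    removed = set(ranked[:num_to_remove])
    kept = [i for i in range(len(scores)) if i not in removed]
    saved = sum(layer_flops[i] for i in removed) / sum(layer_flops)
    return kept, saved


if __name__ == "__main__":
    # Hypothetical toy model: 4 layers with 2 heads each.
    heads = [[0.9, 0.8], [0.1, 0.2], [0.5, 0.6], [0.05, 0.3]]
    flops = [4.0, 4.0, 2.0, 2.0]  # hypothetical per-layer costs
    kept, saved = prune_lowest_layers(heads, flops, num_to_remove=2)
    print(kept, saved)  # prints: [0, 2] 0.5
```

Because per-layer costs need not be equal across a model, the sketch reports the saving from per-layer FLOPs rather than from the number of removed layers.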

Project titles:

Simulation of Vehicle Interiors for the Efficient Development of Driver/Occupant Monitoring Systems: 884336 (FFG - Österr. Forschungsförderungsgesellschaft mbH, Austrian Research Promotion Agency)
Empathic Vehicle: 4998519 (Wirtschaftsagentur Wien, a fund of the City of Vienna)

Research focus:

Visual Computing and Human-Centered Technology: 100%

Field of science:

1020 - Computer Sciences: 100%

Appears in collections:

Conference Paper
