Panadero Palenzuela, R., Schörkhuber, D., & Gelautz, M. (2025). Importance-Guided Interpretability and Pruning for Video Transformers in Driver Action Recognition. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (pp. 5295–5304). https://doi.org/10.1109/WACV61041.2025.00517
Recently, transformers have gained prominence in video action recognition due to their ability to capture spatio-temporal dependencies. Despite their effectiveness, the interpretability of their self-attention mechanisms remains limited, which makes model decisions harder to understand and hampers transparency and bias identification. Additionally, the computational demands of transformer architectures, particularly of the self-attention mechanism, present practical difficulties. To tackle both challenges, we adapt existing interpretability techniques and introduce a layer pruning method guided by importance metrics. In the context of driver action recognition, our findings highlight the efficacy of the applied head importance metrics in pinpointing crucial attention heads and identifying the key visual cues for recognizing driver behavior. Experiments on three mainstream video transformers demonstrate the effectiveness of the proposed pruning technique: removing low-relevance layers significantly reduces computational cost with only slight performance degradation. Specifically, on our DriverActionInsight (DAI) dataset, we achieve a 23.5% FLOPs saving in compressing Video Swin with less than a 1% decrease in Top-1 accuracy.
Project title:
Simulation of Vehicle Interiors for the Efficient Development of Driver/Occupant Monitoring Systems: 884336 (FFG - Österr. Forschungsförderungsgesellschaft mbH)
Empathic Vehicle: 4998519 (Wirtschaftsagentur Wien. Ein Fonds der Stadt Wien)
Research Areas:
Visual Computing and Human-Centered Technology: 100%