DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification

Lin, Tingyu; Dadras, Armin; Kleber, Florian; Sablatnig, Robert

doi:10.1145/3746273.3760209

DC Field

Value

Language

dc.contributor.author

Lin, Tingyu

dc.contributor.author

Dadras, Armin

dc.contributor.author

Kleber, Florian

dc.contributor.author

Sablatnig, Robert

dc.date.accessioned

2025-12-23T09:20:21Z

dc.date.available

2025-12-23T09:20:21Z

dc.date.issued

2025-10-26

dc.identifier.citation

Lin, T., Dadras, A., Kleber, F., & Sablatnig, R. (2025). DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification. In SUMAC ’25: Proceedings of the 7th International Workshop on analySis, Understanding and proMotion of heritAge Contents (pp. 13–21). The Association for Computing Machinery. https://doi.org/10.1145/3746273.3760209

dc.identifier.uri

http://hdl.handle.net/20.500.12708/223142

dc.description.abstract

Camera movement classification (CMC) models trained on contemporary, high-quality footage often degrade when applied to archival film, where noise, missing frames, and low contrast obscure motion cues. We bridge this gap by assembling a unified benchmark that consolidates two modern corpora into four canonical classes and restructures the HISTORIAN collection into five balanced categories. Building on this benchmark, we introduce DGME-T, a lightweight extension to the Video Swin Transformer that injects directional grid motion encoding, derived from optical flow, via a learnable and normalized late-fusion layer. DGME-T raises the backbone's top-1 accuracy from 81.78% to 86.14% and its macro F1 from 82.08% to 87.81% on modern clips, while still improving the demanding World-War-II footage from 83.43% to 84.62% accuracy and from 81.72% to 82.63% macro F1. A cross-domain study further shows that an intermediate fine-tuning stage on modern data increases historical performance by more than five percentage points. These results demonstrate that structured motion priors and transformer representations are complementary and that even a small, carefully calibrated motion head can substantially enhance robustness in degraded film analysis.

dc.description.sponsorship

FWF - Österr. Wissenschaftsfonds

dc.language.iso

dc.subject

Historical Camera Movement Classification

dc.subject

Historical Video / Archival Film

dc.subject

Optical Flow

dc.subject

Domain Adaptation

dc.subject

Late-Fusion Layer

dc.subject

Robustness

dc.subject

Digital Heritage & Preservation

dc.subject

Video Transformer

dc.subject

Directional Grid Motion Encoding

dc.title

DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification

dc.type

Inproceedings

dc.type

Conference contribution

dc.contributor.affiliation

St. Pölten University of Applied Sciences, Austria

dc.relation.isbn

979-8-4007-2055-0

dc.description.startpage

dc.description.endpage

dc.relation.grantno

DFH 37-N

dc.type.category

Full-Paper Contribution

tuw.booktitle

SUMAC '25: Proceedings of the 7th International Workshop on analySis, Understanding and proMotion of heritAge Contents

tuw.peerreviewed

true

tuw.relation.publisher

The Association for Computing Machinery

tuw.project.title

Visuelle Analytik und Computer Vision treffen auf kulturelles Erbe (Visual Analytics and Computer Vision Meet Cultural Heritage)

tuw.researchTopic.id

tuw.researchTopic.name

Visual Computing and Human-Centered Technology

tuw.researchTopic.value

100

tuw.publication.orgunit

E193-01 - Forschungsbereich Computer Vision

tuw.publication.orgunit

E056-12 - Fachbereich ENROL DP

tuw.publication.orgunit

E056-18 - Fachbereich Visual Analytics and Computer Vision Meet Cultural Heritage

tuw.publisher.doi

10.1145/3746273.3760209

dc.description.numberOfPages

tuw.author.orcid

0009-0008-9825-686X

tuw.author.orcid

0000-0001-6474-7208

tuw.author.orcid

0000-0001-8351-5066

tuw.author.orcid

0000-0003-4195-1593

tuw.event.name

SUMAC '25: The 7th International Workshop on analySis, Understanding and proMotion of heritAge Contents

tuw.event.startdate

2025-10-27

tuw.event.enddate

2025-10-31

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Dublin

tuw.event.country

tuw.event.presenter

Lin, Tingyu

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.openairetype

conference paper

item.openairecristype

http://purl.org/coar/resource_type/c_5794

item.cerifentitytype

Publications

item.languageiso639-1

item.grantfulltext

restricted

item.fulltext

no Fulltext

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

TU Wien, Austria

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.orcid

0009-0008-9825-686X

crisitem.author.orcid

0000-0001-6474-7208

crisitem.author.orcid

0000-0001-8351-5066

crisitem.author.orcid

0000-0003-4195-1593

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

crisitem.author.parentorg

E180 - Fakultät für Informatik

crisitem.project.funder

FWF - Österr. Wissenschaftsfonds

crisitem.project.grantno

DFH 37-N

Appears in Collections:

Conference Paper
