<div class="csl-bib-body">
<div class="csl-entry">Sick, L., Engel, D., Hermosilla, P., & Ropinski, T. (2025). Attention-Guided Masked Autoencoders for Learning Image Representations. In <i>2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</i> (pp. 836–846). IEEE. https://doi.org/10.1109/WACV61041.2025.00091</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/223689
-
dc.description.abstract
Masked autoencoders (MAEs) have established themselves as a powerful pretraining method for computer vision tasks. While vanilla MAEs put equal emphasis on reconstructing the individual parts of the image, we propose to inform the reconstruction process through an attention-guided loss function. By leveraging advances in unsupervised object discovery, we obtain an attention map of the scene which we employ in the loss function to put increased emphasis on reconstructing relevant objects. Thus, we incentivize the model to learn improved representations of the scene for a variety of tasks. Our evaluations show that our pretrained models produce off-the-shelf representations more effective than the vanilla MAE for such tasks, demonstrated by improved linear probing and k-NN classification results on several benchmarks while at the same time making ViTs more robust against varying backgrounds and changes in texture.
en
dc.language.iso
en
-
dc.relation.ispartofseries
IEEE Workshop on Applications of Computer Vision (WACV)
-
dc.subject
attention
en
dc.subject
multi-modal learning
en
dc.subject
self-supervised pre-training
en
dc.subject
masked autoencoders
en
dc.title
Attention-Guided Masked Autoencoders for Learning Image Representations
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
Universität Ulm, Germany
-
dc.contributor.affiliation
Universität Ulm, Germany
-
dc.contributor.affiliation
Universität Ulm, Germany
-
dc.relation.isbn
979-8-3315-1083-1
-
dc.relation.doi
10.1109/WACV61041.2025
-
dc.relation.issn
2472-6737
-
dc.description.startpage
836
-
dc.description.endpage
846
-
dc.type.category
Full-Paper Contribution
-
dc.relation.eissn
2642-9381
-
tuw.booktitle
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
-
tuw.peerreviewed
true
-
tuw.relation.publisher
IEEE
-
tuw.researchTopic.id
I5
-
tuw.researchTopic.name
Visual Computing and Human-Centered Technology
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E193-01 - Forschungsbereich Computer Vision
-
tuw.publisher.doi
10.1109/WACV61041.2025.00091
-
dc.description.numberOfPages
11
-
tuw.author.orcid
0009-0004-6524-0715
-
tuw.author.orcid
0000-0002-7857-5512
-
tuw.event.name
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025)
en
tuw.event.startdate
26-02-2025
-
tuw.event.enddate
06-03-2025
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Tucson, Arizona
-
tuw.event.country
US
-
tuw.event.presenter
Sick, Leon
-
wb.sciencebranch
Informatik
-
wb.sciencebranch
Mathematik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.openairetype
conference paper
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.cerifentitytype
Publications
-
item.languageiso639-1
en
-
item.grantfulltext
none
-
item.fulltext
no Fulltext
-
crisitem.author.dept
Universität Ulm, Germany
-
crisitem.author.dept
Universität Ulm, Germany
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.dept
Universität Ulm
-
crisitem.author.orcid
0009-0004-6524-0715
-
crisitem.author.orcid
0000-0002-7857-5512
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology