<div class="csl-bib-body">
<div class="csl-entry">Thalhammer, S., Weibel, J.-B., Vincze, M., & Garcia-Rodriguez, J. (2023). Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects. <i>Image and Vision Computing</i>, <i>139</i>, Article 104816. https://doi.org/10.1016/j.imavis.2023.104816</div>
</div>
-
dc.identifier.issn
0262-8856
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/190773
-
dc.description.abstract
Object pose estimation is important for object manipulation and scene understanding. To improve the general applicability of pose estimators, recent research focuses on providing estimates for novel objects, that is, objects unseen during training. Such works use deep template matching strategies to retrieve the closest template to a query image, which implicitly provides the object class and pose. Despite the recent success and improvements of Vision Transformers over CNNs for many vision tasks, the state of the art still uses CNN-based approaches for novel object pose estimation. This work evaluates and demonstrates the differences between self-supervised CNNs and Vision Transformers for deep template matching. Specifically, both types of approaches are trained using contrastive learning to match training images against rendered templates of isolated objects. At test time, these templates are matched against query images of known and novel objects under challenging conditions, such as clutter, occlusion, and object symmetries, using masked cosine similarity. The presented results demonstrate not only that Vision Transformers improve matching accuracy over CNNs, but also that in some cases pre-trained Vision Transformers achieve this improvement without fine-tuning. Furthermore, we highlight the differences in optimization and network architecture when comparing these two types of networks for deep template matching.
en
dc.language.iso
en
-
dc.publisher
Elsevier
-
dc.relation.ispartof
Image and Vision Computing
-
dc.subject
Object recognition
en
dc.title
Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects