PrimitivePose: Generic Model and Representation for 3D Bounding Box Prediction of Unseen Objects

Kriegler, Andreas; Beleznai, Csaba; Gelautz, Margrit; Murschitz, Markus; Göbel, Kai

doi:10.1142/S1793351X23620027

DC Field

Value

Language

dc.contributor.author

Kriegler, Andreas

dc.contributor.author

Beleznai, Csaba

dc.contributor.author

Gelautz, Margrit

dc.contributor.author

Murschitz, Markus

dc.contributor.author

Göbel, Kai

dc.date.accessioned

2023-11-09T14:10:18Z

dc.date.available

2023-11-09T14:10:18Z

dc.date.issued

2023-09

dc.identifier.citation

Kriegler, A., Beleznai, C., Gelautz, M., Murschitz, M., & Göbel, K. (2023). PrimitivePose: Generic Model and Representation for 3D Bounding Box Prediction of Unseen Objects. International Journal of Semantic Computing, 17(03), 387–410. https://doi.org/10.1142/S1793351X23620027

dc.identifier.issn

1793-351X

dc.identifier.uri

http://hdl.handle.net/20.500.12708/189539

dc.description

Special Issue on IEEE IRC 2022

dc.description.abstract

A considerable amount of research is concerned with the challenging task of estimating three-dimensional (3D) pose and size for multi-object indoor scene configurations. Many existing models rely on a priori known object models, such as 3D CAD models, and are therefore limited to a predefined set of object categories. This closed-set constraint limits the range of applications for robots interacting in dynamic environments where previously unseen objects may appear. This paper addresses this problem with a highly generic 3D bounding box detection method that relies entirely on geometric cues obtained from depth data. While synthetic data, e.g. synthetic depth maps, is commonly generated for this task, the well-known synth-to-real gap often emerges, which prevents models trained solely on synthetic data from transferring to the real world. To ameliorate this problem, we use stereo depth computation on synthetic data to obtain pseudo-realistic disparity maps. We then propose an intermediate representation, namely disparity-scaled surface normal (SN) images, which encodes geometry and at the same time preserves depth/scale information, unlike the commonly used standard SNs. In a series of experiments, we demonstrate the usefulness of our approach by detecting everyday objects on a captured data set of tabletop scenes, and compare it to the popular PoseCNN model. We quantitatively show that standard SNs are less adequate for challenging 3D detection tasks by comparing predictions from the model trained on disparity alone, on SNs, and on disparity-scaled SNs. Additionally, in an ablation study, we investigate the minimal number of training samples required for such a learning task. Lastly, we make the tool used for 3D object annotation publicly available at: https://preview.tinyurl.com/3ycn8v5k. A video showcasing our results can be found at: https://preview.tinyurl.com/dzdzabek.

dc.language.iso

dc.publisher

World Scientific Publishing Co Pte Ltd

dc.relation.ispartof

International Journal of Semantic Computing

dc.subject

3D bounding box prediction

dc.subject

Unseen objects

dc.subject

Surface normals

dc.subject

Synthetic data

dc.subject

Geometric primitives

dc.subject

Object pose annotation

dc.title

PrimitivePose: Generic Model and Representation for 3D Bounding Box Prediction of Unseen Objects

dc.type

Article

dc.type

Artikel

dc.contributor.affiliation

Austrian Institute of Technology, Austria

dc.contributor.affiliation

Austrian Institute of Technology, Austria

dc.contributor.affiliation

Austrian Institute of Technology, Austria

dc.description.startpage

387

dc.description.endpage

410

dcterms.dateSubmitted

2023-10-29

dc.type.category

Original Research Article

tuw.container.volume

17

tuw.container.issue

03

tuw.journal.peerreviewed

true

tuw.peerreviewed

true

tuw.publication.invited

invited

tuw.researchTopic.id

tuw.researchTopic.name

Visual Computing and Human-Centered Technology

tuw.researchTopic.value

100

dcterms.isPartOf.title

International Journal of Semantic Computing

tuw.publication.orgunit

E193-01 - Forschungsbereich Computer Vision

tuw.publisher.doi

10.1142/S1793351X23620027

dc.date.onlinefirst

2023-08-09

dc.identifier.eissn

1793-7108

dc.description.numberOfPages

tuw.author.orcid

0000-0002-9476-0865

wb.sciencebranch

Computer Science

wb.sciencebranch.oefos

1020

wb.sciencebranch.value

100

item.languageiso639-1

item.openairetype

research article

item.openairecristype

http://purl.org/coar/resource_type/c_2df8fbb1

item.grantfulltext

restricted

item.cerifentitytype

Publications

item.fulltext

no Fulltext

crisitem.author.dept

Austrian Institute of Technology

crisitem.author.dept

E193-01 - Forschungsbereich Computer Vision

crisitem.author.dept

Austrian Institute of Technology

crisitem.author.dept

Austrian Institute of Technology

crisitem.author.orcid

0000-0002-9476-0865

crisitem.author.parentorg

E193 - Institut für Visual Computing and Human-Centered Technology

Appears in Collections:

Article
