<div class="csl-bib-body">
<div class="csl-entry">Kriegler, A., Beleznai, C., Gelautz, M., Murschitz, M., & Göbel, K. (2023). PrimitivePose: Generic Model and Representation for 3D Bounding Box Prediction of Unseen Objects. <i>International Journal of Semantic Computing</i>, <i>17</i>(03), 387–410. https://doi.org/10.1142/S1793351X23620027</div>
</div>
-
dc.identifier.issn
1793-351X
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/189539
-
dc.description
Special Issue on IEEE IRC 2022
-
dc.description.abstract
A considerable amount of research is concerned with the challenging task of estimating three-dimensional (3D) pose and size for multi-object indoor scene configurations. Many existing models rely on a priori known object models, such as 3D CAD models, and are therefore limited to a predefined set of object categories. This closed-set constraint limits the range of applications for robots interacting in dynamic environments where previously unseen objects may appear. This paper addresses this problem with a highly generic 3D bounding box detection method that relies entirely on geometric cues obtained from depth data percepts. While the generation of synthetic data, e.g. synthetic depth maps, is commonly used for this task, the well-known synth-to-real gap often emerges, which hinders the transfer of models trained solely on synthetic data to the real world. To ameliorate this problem, we use stereo depth computation on synthetic data to obtain pseudo-realistic disparity maps. We then propose an intermediate representation, namely disparity-scaled surface normal (SN) images, which encodes geometry and at the same time preserves depth/scale information, unlike the commonly used standard SNs. In a series of experiments, we demonstrate the usefulness of our approach, detecting everyday objects on a captured data set of tabletop scenes, and compare it to the popular PoseCNN model. We quantitatively show that standard SNs are less adequate for challenging 3D detection tasks by comparing predictions from models trained on disparity alone, on standard SNs, and on disparity-scaled SNs. Additionally, in an ablation study we investigate the minimal number of training samples required for such a learning task. Lastly, we make the tool used for 3D object annotation publicly available at: https://preview.tinyurl.com/3ycn8v5k. A video showcasing our results can be found at: https://preview.tinyurl.com/dzdzabek.
en
dc.language.iso
en
-
dc.publisher
World Scientific Publishing Co Pte Ltd
-
dc.relation.ispartof
International Journal of Semantic Computing
-
dc.subject
3D bounding box prediction
en
dc.subject
Unseen objects
en
dc.subject
Surface normals
en
dc.subject
Synthetic data
en
dc.subject
Geometric primitives
en
dc.subject
Object pose annotation
en
dc.title
PrimitivePose: Generic Model and Representation for 3D Bounding Box Prediction of Unseen Objects
en
dc.type
Article
en
dc.type
Artikel
de
dc.contributor.affiliation
Austrian Institute of Technology, Austria
-
dc.contributor.affiliation
Austrian Institute of Technology, Austria
-
dc.contributor.affiliation
Austrian Institute of Technology, Austria
-
dc.description.startpage
387
-
dc.description.endpage
410
-
dcterms.dateSubmitted
2023-10-29
-
dc.type.category
Original Research Article
-
tuw.container.volume
17
-
tuw.container.issue
03
-
tuw.journal.peerreviewed
true
-
tuw.peerreviewed
true
-
tuw.publication.invited
invited
-
tuw.researchTopic.id
I5
-
tuw.researchTopic.name
Visual Computing and Human-Centered Technology
-
tuw.researchTopic.value
100
-
dcterms.isPartOf.title
International Journal of Semantic Computing
-
tuw.publication.orgunit
E193-01 - Forschungsbereich Computer Vision
-
tuw.publisher.doi
10.1142/S1793351X23620027
-
dc.date.onlinefirst
2023-08-09
-
dc.identifier.eissn
1793-7108
-
dc.description.numberOfPages
24
-
tuw.author.orcid
0000-0002-9476-0865
-
wb.sciencebranch
Informatik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.value
100
-
item.openairecristype
http://purl.org/coar/resource_type/c_2df8fbb1
-
item.languageiso639-1
en
-
item.fulltext
no Fulltext
-
item.grantfulltext
restricted
-
item.openairetype
research article
-
item.cerifentitytype
Publications
-
crisitem.author.dept
Austrian Institute of Technology
-
crisitem.author.dept
E193-01 - Forschungsbereich Computer Vision
-
crisitem.author.dept
Austrian Institute of Technology
-
crisitem.author.dept
Austrian Institute of Technology
-
crisitem.author.orcid
0000-0002-9476-0865
-
crisitem.author.parentorg
E193 - Institut für Visual Computing and Human-Centered Technology