<div class="csl-bib-body">
<div class="csl-entry">Bauer, D. (2021). <i>Visually and physically plausible object pose estimation for robot vision</i> [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.100360</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2022.100360
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/19726
-
dc.description
Translated title differs, as rendered by the author
-
dc.description.abstract
Autonomous robots are expected to reliably interact with their environment, following user commands and manipulating objects. This requires a robot to understand its environment, to determine the objects of which it is composed and how they relate to each other. Using object pose estimation, the robot may determine the 3D translation and 3D rotation of known object models with respect to its observation of the environment. Given the pose of all observed objects, the robot may create a 3D representation of the scene, consisting of the objects’ models and the spatial relations between them. Such an understanding allows the robot to, for example, reason about interactions with individual objects, synthesize novel views of the scene or interpret users’ commands. However, the alignment of object models to the robot’s visual observation may suffer from sensor noise, partial observability and object symmetry that lead to ambiguous situations and inaccurate poses. The resulting representation of the scene may thus contain implausibilities such as intersecting, floating or statically unstable objects. Resorting to physical relations alone also suffers from ambiguity as there are, for example, numerous possibilities for two objects to plausibly interact. Accounting for such scene-level consistency is further complicated by multiple, potentially inaccurate hypotheses per object that create a complex search space for resolving conflicting pose hypotheses. To overcome these ambiguities and to resolve scene-level inconsistencies, we hypothesize that visual and physical plausibility complement each other and allow for more accurate and robust object pose estimation. We conjecture that the complexity of dealing with scenes of multiple objects with multiple hypotheses each may be tamed by considering the plausibility of the resulting configurations.
While we argue that such reasoning may be generally beneficial in robot vision, we focus on the task of object pose estimation and its sub-steps of refinement and verification. In this thesis, we provide definitions for visual and physical plausibility of object poses in static scenes. Visual plausibility is considered as rendering- or point-cloud-based alignment. Physical plausibility is determined by simulation or by evaluation of static equilibrium. We propose analytical and learning-based approaches to the object pose estimation task that leverage these definitions. We explore concepts from reinforcement learning to incorporate plausibility at different stages of the pose estimation pipeline and to efficiently consider vast numbers of scene-level combinations. Moreover, based on the plausibility information gathered by our proposed methods, we derive explanation strategies for human-robot interaction in case of robotic failure. By evaluating on common datasets and by applying our methods to robotic grasping, we highlight the accuracy, robustness and efficiency of our proposed object pose estimation approaches and demonstrate the benefit of considering visual and physical plausibility for this task.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
3D Vision
en
dc.subject
Pose estimation
en
dc.subject
Object recognition
en
dc.subject
robotics
en
dc.subject
Maschinelles Lernen
de
dc.subject
3D Sehen
de
dc.subject
Objekterkennung
de
dc.subject
Posebestimmung
de
dc.subject
Roboter
de
dc.title
Visually and physically plausible object pose estimation for robot vision
en
dc.title.alternative
Visuelle und physikalische Plausibilität von Objektposen für Robotersehen
de
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2022.100360
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Dominik Bauer
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Patten, Timothy Michael
-
tuw.publication.orgunit
E376 - Institut für Automatisierungs- und Regelungstechnik