<div class="csl-bib-body">
<div class="csl-entry">Bauer, D. (2021). <i>Visually and physically plausible object pose estimation for robot vision</i> [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.100360</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2022.100360
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/19726
-
dc.description
Translated title differs, as rendered by the author
-
dc.description.abstract
Autonomous robots are expected to reliably interact with their environment, following user commands and manipulating objects. This requires a robot to understand its environment, to determine the objects of which it is composed and how they relate to each other. Using object pose estimation, the robot may determine the 3D translation and 3D rotation of known object models with respect to its observation of the environment. Given the pose of all observed objects, the robot may create a 3D representation of the scene, consisting of the objects’ models and the spatial relations between them. Such an understanding allows the robot to, for example, reason about interactions with individual objects, synthesize novel views of the scene or interpret users’ commands. However, the alignment of object models to the robot’s visual observation may suffer from sensor noise, partial observability and object symmetry that lead to ambiguous situations and inaccurate poses. The resulting representation of the scene may thus contain implausibilities such as intersecting, floating or statically unstable objects. Resorting to physical relations alone also suffers from ambiguity as there are, for example, numerous possibilities for two objects to plausibly interact. Accounting for such scene-level consistency is further complicated by multiple, potentially inaccurate hypotheses per object that create a complex search space for resolving conflicting pose hypotheses. To overcome these ambiguities and to resolve scene-level inconsistencies, we hypothesize that visual and physical plausibility complement each other and allow for more accurate and robust object pose estimation. We conjecture that the complexity of dealing with scenes of multiple objects with multiple hypotheses each may be tamed by considering the plausibility of the resulting configurations.
While we argue that such reasoning may be generally beneficial in robot vision, we focus on the task of object pose estimation and its sub-steps of refinement and verification. In this thesis, we provide definitions for visual and physical plausibility of object poses in static scenes. Visual plausibility is considered as rendering- or point-cloud-based alignment. Physical plausibility is determined by simulation or by evaluation of static equilibrium. We propose analytical and learning-based approaches to the object pose estimation task that leverage these definitions. We explore concepts from reinforcement learning to incorporate plausibility at different stages of the pose estimation pipeline and to efficiently consider vast numbers of scene-level combinations. Moreover, based on the plausibility information gathered by our proposed methods, we derive explanation strategies for human-robot interaction in case of robotic failure. By evaluating on common datasets and by applying our methods to robotic grasping, we highlight the accuracy, robustness and efficiency of our proposed object pose estimation approaches and demonstrate the benefit of considering visual and physical plausibility for this task.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
3D Vision
en
dc.subject
Pose estimation
en
dc.subject
Object recognition
en
dc.subject
robotics
en
dc.subject
Maschinelles Lernen
de
dc.subject
3D Sehen
de
dc.subject
Objekterkennung
de
dc.subject
Posebestimmung
de
dc.subject
Roboter
de
dc.title
Visually and physically plausible object pose estimation for robot vision
en
dc.title.alternative
Visuelle und physikalische Plausibilität von Objektposen für Robotersehen
de
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2022.100360
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Dominik Bauer
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Patten, Timothy Michael
-
tuw.publication.orgunit
E376 - Institut für Automatisierungs- und Regelungstechnik