Bauer, D., Patten, T. M., & Vincze, M. (2022). Visual and Physical Plausibility of Object Poses for Robotic Scene Understanding. In S. T. Köszegi & M. Vincze (Eds.), Trust in Robots (pp. 81–103). TU Wien Academic Press. https://doi.org/10.34727/2022/isbn.978-3-85448-052-5_4
Humans use the relations between objects in a scene to determine how they may interact with, grasp and manipulate
them. For robots, such an object-based scene understanding not only allows interaction with objects but also
allows humans to interpret the robot’s perception and actions. To gain a higher-level understanding of an observed
scene, knowledge of the objects’ poses is crucial. The poses, when combined with 3D models of the objects, allow
for easy derivation of the interactions between objects, enabling reasoning about occlusion, collisions, support and,
finally, manipulation by the robot. However, most related work does not consider scene-level object interactions
but rather focuses on finding the pose of a single object in a given frame. Object interactions are considered only
to augment training data or in post hoc verification steps. In contrast, we show that such scene-level information
should be exploited during the estimation of the object poses themselves. Our main assumption is that all object
hypotheses need to be plausible in terms of their visual observation and the physical scene in which they exist. In
this chapter, we present our work on investigating the exploitation of this visual and physical plausibility for robust,
accurate estimation and understandable explanation of object poses.
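To make concrete how known poses plus 3D models let a robot reason cheaply about interactions such as collision and support, here is a minimal sketch (not the chapter's actual pipeline; all function names, the bounding-box simplification, and the tolerances are illustrative assumptions) that checks two pose hypotheses for interpenetration and stacking using axis-aligned bounding boxes:

```python
import numpy as np

def box_corners(pose, extents):
    """8 corners of an object's bounding box, transformed by its 4x4 pose."""
    half = np.asarray(extents) / 2.0
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                      for sy in (-1, 1) for sz in (-1, 1)])
    corners = signs * half                       # (8, 3) in the object frame
    return (pose[:3, :3] @ corners.T).T + pose[:3, 3]

def aabb(corners):
    """Axis-aligned bounding box as (min_corner, max_corner)."""
    return corners.min(axis=0), corners.max(axis=0)

def interpenetrates(a, b, tol=1e-3):
    """Physically implausible: boxes overlap by more than tol on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return bool(np.all(np.minimum(amax, bmax) - np.maximum(amin, bmin) > tol))

def supported_by(a, b, tol=5e-3):
    """True if box a rests on box b: bottoms/tops align and footprints overlap."""
    (amin, amax), (bmin, bmax) = a, b
    xy_overlap = np.all(np.minimum(amax[:2], bmax[:2])
                        > np.maximum(amin[:2], bmin[:2]))
    return bool(xy_overlap and abs(amin[2] - bmax[2]) < tol)

# Two 10 cm cubes: one on the table (z = 0), one stacked on top of it.
cube = (0.1, 0.1, 0.1)
pose_lower = np.eye(4); pose_lower[2, 3] = 0.05   # centre 5 cm above the table
pose_upper = np.eye(4); pose_upper[2, 3] = 0.15   # centre 15 cm above the table

a = aabb(box_corners(pose_upper, cube))
b = aabb(box_corners(pose_lower, cube))
print(interpenetrates(a, b))   # → False (the cubes touch but do not overlap)
print(supported_by(a, b))      # → True  (the upper cube rests on the lower one)
```

A real system would use the full object meshes and a physics engine rather than bounding boxes, but even this crude test shows the principle argued above: a pose hypothesis that places one object inside another, or floating without support, can be flagged as physically implausible directly from the scene-level configuration.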