Thalhammer, S. (2022). Simultaneous object detection and pose estimation under domain shift [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.107061
Robotic systems are meant to support humans, be it in their homes or workplaces. Manipulating objects is one of the most fundamental tasks for such systems, since it enables robots to autonomously take over tasks from humans. Relevant applications are diverse, ranging from mobile manipulation and bin picking to augmented reality. Using real-world data for training is undesirable since estimators become biased towards the data characteristics, capturing and annotating data is cumbersome, and data from the target domain is not always available. Thus, using synthetic data is preferable. However, this requires algorithms for object localisation to generalise to novel domains. Another challenge is that relevant applications require object pose estimators to handle multiple distinct objects. Efficiently handling the involved objects keeps cycle times short and the computational system load low. The challenge is to effectively encode the feature space for object sets with differing shape complexities and symmetries.

This thesis presents methods to adapt, respectively generalise, to novel domains, as well as formulations for object pose estimators that handle multiple distinct objects and scale well with respect to the number of object instances in the image. Generalising to novel domains requires unbiased estimators. Rendering training data allows randomising the relevant data characteristics, creating unbiased data well suited for training models meant to generalise. Solutions for adapting synthetic depth data to the real-world domain and for generalising in the RGB domain are presented. We formulate object pose estimation as a multi-task problem, performing detection, classification and pose correspondence estimation simultaneously for multiple objects. This approach is extended with direct pose regression, resulting in scalable object pose estimation with constant runtime with respect to the number of object instances in the image.

Evaluations are provided on five datasets and in two different grasping scenarios. The presented experiments indicate that synthetic training data is well suited for learning-based object localisation. Training the presented object pose estimators using the domain adaptation for the depth domain and our domain generalisation strategies for the RGB domain results in competitive performance compared to the state of the art. The direct pose regression extension for scalable object pose estimation improves over other single-stage approaches and results in a negligible runtime increase for up to 90 object instances in an image. We present grasping experiments showing the suitability of the presented methods for real-world deployment.
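As an illustrative sketch only, the following PyTorch-style module shows how the single-stage, multi-task formulation described in the abstract can be organised: detection, classification, 2D-3D correspondence estimation and direct pose regression share one dense forward pass over a backbone feature map, which is why the runtime stays roughly constant in the number of object instances. All module names, output parameterisations and hyperparameters below are assumptions for illustration and do not reproduce the thesis's actual architecture.

```python
import torch
import torch.nn as nn


def _branch(in_channels: int, out_channels: int) -> nn.Sequential:
    """Small convolutional branch; one per prediction task (illustrative only)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=1),
    )


class MultiTaskPoseHead(nn.Module):
    """Single-stage multi-task head over a shared backbone feature map.

    All branches run densely in one forward pass, so inference cost does not
    depend on how many object instances appear in the image.
    """

    def __init__(self, in_channels: int = 256, num_classes: int = 30, num_keypoints: int = 8):
        super().__init__()
        self.objectness = _branch(in_channels, 1)                       # detection score per location
        self.classification = _branch(in_channels, num_classes)        # object class logits
        self.correspondences = _branch(in_channels, 2 * num_keypoints)  # 2D offsets to projected 3D keypoints
        self.direct_pose = _branch(in_channels, 4 + 3)                  # direct regression: quaternion + translation

    def forward(self, features: torch.Tensor) -> dict:
        return {
            "objectness": self.objectness(features),
            "class_logits": self.classification(features),
            "keypoint_offsets": self.correspondences(features),
            "pose": self.direct_pose(features),
        }


if __name__ == "__main__":
    head = MultiTaskPoseHead()
    feats = torch.randn(1, 256, 80, 80)  # e.g. a backbone/FPN feature map for a 640x640 input
    outputs = head(feats)
    print({name: tuple(t.shape) for name, t in outputs.items()})
```

In such a design, per-instance post-processing (e.g. reading out poses at high-scoring locations) is the only step that depends on the instance count, which keeps the added cost negligible even for densely cluttered images.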
Additional information:
Differing title according to the translation by the author