E376 - Institut für Automatisierungs- und Regelungstechnik
-
Date (published):
2025
-
Extent:
68
-
Keywords:
Roboter; Computer Vision; Objekterkennung; Deep Learning; Pose Bestimmung
de
Robots; Computer Vision; Object Detection; Deep Learning; Pose Estimation
en
Abstract:
Precise estimation of an object’s pose is a fundamental component of many computer vision applications, including industrial robotics. Metallic objects are particularly challenging because their surfaces are textureless and highly reflective. The objective of this master’s thesis is to develop a method for robust and accurate 6DoF pose estimation of metallic and reflective objects in an industrial context. The thesis builds on recent findings in this domain and uses a contour-based object representation. The method comprises three principal components: a network for object detection and segmentation, a diffusion model for edge detection, and a newly developed network that estimates object poses from edge images. The work also includes the creation of datasets for training these networks. For this purpose, a novel rendering pipeline is developed that generates photorealistic training images together with the corresponding ground-truth edge images. The pipeline renders realistic textures and illumination conditions, so that the training data can be adapted to reflect the actual challenges. The proposed method, called Edge2Pose, first detects the target object with a YOLOv8 segmentation model. The DiffusionEdge network then detects edges within the resulting region of interest. The edge images are passed to the pose estimation network, which predicts 3D coordinates from the edges depicted in the images, analogous to the CDPN, Pix2Pose, and DPOD methods: the 3D coordinates of the model are first transformed into RGB values, which the network then predicts. The final pose is obtained by establishing 2D-3D correspondences and processing them with the PnP/RANSAC algorithm. Experiments on several datasets (RT-Less, T-Less, and MP-6D) show that the method is a practical approach for estimating the poses of metallic and reflective objects. It also offers considerable advantages in scenarios where the camera consistently focuses on the scene, such as pick-and-place operations.
en
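The abstract mentions that the 3D model coordinates are transformed into RGB values before the network predicts them. The thesis's exact encoding is not given in this record, so the following is only a minimal sketch of one common scheme: each axis is normalised linearly over the model's bounding box and quantised to 8 bits. The names `bounding_box`, `encode_rgb`, and `decode_rgb` are hypothetical and chosen purely for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
RGB = Tuple[int, int, int]


def bounding_box(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return the per-axis minimum and maximum of a set of 3D model points."""
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs


def encode_rgb(point: Point, mins: Point, maxs: Point) -> RGB:
    """Map a 3D model coordinate to an 8-bit RGB colour (assumed linear scheme)."""
    rgb = []
    for value, lo, hi in zip(point, mins, maxs):
        span = hi - lo
        # A flat axis (span == 0) carries no information; map it to 0.
        t = 0.0 if span == 0 else (value - lo) / span
        rgb.append(int(round(255 * t)))
    return tuple(rgb)


def decode_rgb(rgb: RGB, mins: Point, maxs: Point) -> Point:
    """Invert encode_rgb, up to the 8-bit quantisation error."""
    return tuple(lo + (c / 255.0) * (hi - lo) for c, lo, hi in zip(rgb, mins, maxs))


# Toy model: three vertices of a 20 x 40 x 40 box (units are arbitrary).
model_points = [(-10.0, -20.0, 0.0), (10.0, 20.0, 40.0), (5.0, -10.0, 10.0)]
mins, maxs = bounding_box(model_points)
colour = encode_rgb(model_points[2], mins, maxs)
recovered = decode_rgb(colour, mins, maxs)
```

Under this assumed encoding, each pixel of a predicted coordinate image decodes to a 3D model point. Paired with its 2D pixel location, that point forms one 2D-3D correspondence of the kind the abstract passes to PnP/RANSAC, for example via OpenCV's `cv2.solvePnPRansac`.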
Additional information:
Title differs from the original; translated by the author