Jing, X., Qian, K., & Vincze, M. (2025). CAGT: Sim-to-Real Depth Completion with Interactive Embedding Aggregation and Geometry Awareness for Transparent Objects. IEEE Transactions on Circuits and Systems for Video Technology. https://doi.org/10.34726/9499
Robust depth completion of transparent objects would benefit industrial automation tasks such as vision-based robotic grasping and manipulation. Some existing methods learn a compact intra-layer feature representation with the help of the attention mechanism or the vision Transformer. However, they overlook corner regions and sparse geometric information, both of which are important for accurate depth completion. To tackle these issues, we propose CAGT, a novel sim-to-real transferable model with interactive embedding aggregation and geometry awareness that reconstructs severely sparse depth maps of transparent objects. We design a Depth-clue Interaction Aggregation Module (DIAM) to enhance the Transformer's ability to extract boundary corner features and thus supplement depth clues. We then propose a Geometric Information Augmentation Module (GIAM) to fuse geometry-aware features containing shape and surface details. Moreover, we introduce a contrastive learning mechanism to facilitate the sim-to-real generalization of the completion model. Extensive experimental results on two challenging datasets, ClearGrasp and TransCG, demonstrate that CAGT outperforms state-of-the-art methods. A robotic grasping generalization experiment further demonstrates that CAGT improves grasp accuracy on transparent objects.
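The abstract does not say which contrastive objective CAGT uses. As an illustration only, here is a minimal sketch of a generic InfoNCE loss, a common choice for contrastive learning. It assumes paired embeddings: for example, features of a synthetic sample and of a matching real or augmented view. Each pair counts as a positive, and the other items in the batch serve as negatives. The function names, the pairing scheme, and the temperature of 0.1 are assumptions, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch (generic sketch, not CAGT's published loss).

    anchors[i] and positives[i] form a positive pair (assumed here to be, e.g.,
    a synthetic/real embedding pair of the same scene); every positives[j] with
    j != i acts as a negative for anchors[i].
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equally sized, non-empty batches")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching index i treated as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Usage: matched pairs give a near-zero loss, swapped pairs a large one.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls each positive pair together in embedding space while pushing apart non-matching items. This is the general intuition behind using contrastive learning to align features across a domain gap. How CAGT forms its pairs and negatives is not stated in the abstract.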