Loghmani, M. R. (2020). Object classification for robot vision through RGB-D recognition and domain adaptation [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2020.80401
E376 - Institut für Automatisierungs- und Regelungstechnik
Date (published):
2020
Number of Pages:
95
Keywords:
Robot; Computer vision; Recognition; Real-world
Abstract:
Object recognition, or object classification, is an essential skill for robot visual perception systems, since it constitutes the foundation for higher-level tasks like object detection, pose estimation, and manipulation. Nonetheless, recognizing objects in unconstrained environments remains arduous, with robots facing challenges such as intra-class variation, occlusion, clutter, viewpoint variation, and changes in illumination and scale. Deep convolutional neural networks (CNNs) have revolutionized object classification and computer vision as a whole. However, standard computer vision benchmarks often fail to address all the challenges of robot vision, which results in classification models that perform poorly when deployed on a robot in the wild. In this thesis, we perform a systematic study of object recognition for robot vision and propose algorithmic innovations that tackle different aspects of this multifaceted problem.

We first collect a robot-centric dataset, the Autonomous Robot Indoor Dataset (ARID), and test the performance of well-known CNN architectures on it. This evaluation indicates two main lines of research toward more reliable and robust object recognition: (i) the integration of geometric information, in the form of depth data, with the standard RGB data, and (ii) the use of domain adaptation to bridge the gap between the training (source) data and the real (target) data the robot encounters. To combine RGB and depth data, we propose recurrent convolutional fusion: a novel architecture that extracts features from different layers of a two-stream CNN and combines them using a recurrent neural network. To perform domain adaptation on RGB-D data, we propose a multi-task learning method that, in addition to the standard recognition task, learns to predict the relative rotation between the RGB and depth image of a sample.

We go one step further and consider the more realistic problem of open set domain adaptation (OSDA), which requires adapting two domains when the target contains not only the known classes of the source but also unknown classes. We propose positive-unlabeled reconstruction encoding, an algorithm that uses the theoretical framework of positive-unlabeled learning and a novel loss based on sample reconstruction to recognize the unknown classes of the target. We further improve upon this algorithm with rotation-based open set, which performs both the adaptation and the known/unknown recognition using the self-supervised task of relative rotation. Extensive quantitative and qualitative experiments on standard benchmarks and newly collected datasets empirically validate our algorithmic contributions. These methods push the state of the art in RGB-D object recognition and domain adaptation and bring us closer to building robotic systems with human-like recognition performance.
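To make the fusion scheme concrete, below is a minimal sketch of the recurrent convolutional fusion idea in PyTorch. The layer sizes, the GRU fusion head, the 3-channel (colorized) depth input, and all identifiers are illustrative assumptions, not the implementation from the thesis; only the overall pattern follows the abstract: pool features from several layers of a two-stream CNN and feed them as a sequence to a recurrent unit.

import torch
import torch.nn as nn

class RecurrentConvFusion(nn.Module):
    """Two-stream CNN whose per-layer features are fused by an RNN (sketch)."""

    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        # One small convolutional stream per modality (RGB and depth).
        def stream():
            return nn.ModuleList([
                nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU()),
                nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
                nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU()),
            ])
        self.rgb_stream, self.depth_stream = stream(), stream()
        # Project each layer's pooled RGB+depth features to a common size.
        self.proj = nn.ModuleList([nn.Linear(2 * c, feat_dim) for c in (32, 64, 128)])
        # The recurrent unit consumes the per-layer features as a sequence.
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        steps = []
        x_rgb, x_d = rgb, depth
        for rgb_layer, d_layer, proj in zip(self.rgb_stream, self.depth_stream, self.proj):
            x_rgb, x_d = rgb_layer(x_rgb), d_layer(x_d)
            # Global-average-pool both modalities and concatenate them.
            f = torch.cat([x_rgb.mean(dim=(2, 3)), x_d.mean(dim=(2, 3))], dim=1)
            steps.append(proj(f))
        seq = torch.stack(steps, dim=1)   # (batch, num_layers, feat_dim)
        _, h = self.rnn(seq)              # final hidden state summarizes all levels
        return self.classifier(h.squeeze(0))

# Usage with random stand-in tensors (depth assumed colorized to 3 channels):
logits = RecurrentConvFusion(num_classes=51)(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))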
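The relative-rotation pretext task used for RGB-D domain adaptation can likewise be sketched in a few lines. Assuming PyTorch, the snippet below builds a self-supervised batch: each modality is rotated independently by a multiple of 90 degrees, and the label is the relative rotation between the two. How this auxiliary head is weighted against the main recognition loss is not specified here, and all names are placeholders.

import random
import torch

def make_relative_rotation_batch(rgb: torch.Tensor, depth: torch.Tensor):
    """Rotate RGB and depth independently by multiples of 90 degrees.

    The self-supervised label is the *relative* rotation between the two
    modalities, an integer in {0, 1, 2, 3} (i.e. 0/90/180/270 degrees).
    Inputs are batched (batch, channels, height, width) tensors.
    """
    rot_rgb = random.randint(0, 3)
    rot_depth = random.randint(0, 3)
    rgb_rot = torch.rot90(rgb, k=rot_rgb, dims=(2, 3))
    depth_rot = torch.rot90(depth, k=rot_depth, dims=(2, 3))
    label = torch.tensor((rot_depth - rot_rgb) % 4)
    return rgb_rot, depth_rot, label

In a multi-task setup, a 4-way rotation classifier trained with cross-entropy on these labels runs alongside the main object classifier; because the pretext labels are free, the task can be trained on unlabeled target data as well.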
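For the open set setting, one common way to turn a self-supervised head into a known/unknown detector is to score how confidently it predicts on a target sample. The sketch below uses the entropy of the rotation head's output as a normality score; this particular score definition and the fixed threshold are illustrative assumptions, not the exact criterion of the thesis' rotation-based open set method.

import torch
import torch.nn.functional as F

def normality_score(rot_logits: torch.Tensor) -> torch.Tensor:
    """Low entropy over the 4 relative rotations -> likely a known class."""
    p = F.softmax(rot_logits, dim=1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)
    # Normalize by the maximum entropy log(4) so the score lies in [0, 1].
    return 1.0 - entropy / torch.log(torch.tensor(4.0))

def split_known_unknown(rot_logits: torch.Tensor, threshold: float = 0.5):
    # True -> treat the sample as one of the known source classes;
    # False -> reject it as unknown. The threshold is a placeholder.
    return normality_score(rot_logits) >= threshold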
Additional information:
Deviating title according to the author's own translation