Fäulhammer, T. (2017). From the lab to the wild: learning and recognizing objects in cluttered environments on a mobile robot [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2017.45560
E376 - Institut für Automatisierungs- und Regelungstechnik
Date (published):
2017
Number of Pages:
125
Keywords:
Object recognition; Multiple views; Robotics
Abstract:
Objects are an essential part of the environment, and their detection is a key element for robotics and related fields. This thesis presents a system that allows a mobile robot to autonomously detect, model, and re-recognize objects in everyday environments. While existing work has demonstrated individual components, we present the first complete system that operates without human intervention in regular indoor environments. Complementary to offline-learned object models, the robot detects objects to be learned by modeling the static parts of the environment and extracting dynamic elements. A view plan around a dynamic element is then created and executed to gather additional data from different views for learning. Finally, the observations are fused into a 3D object model, which is used as training data for recognition. The recognition framework presented in this thesis provides several solutions to overcome common challenges in real-world environments such as clutter, occlusion, non-textured objects, and noisy data. The recognizer uses local as well as global features that characterize both visual appearance and shape, generating many possibly competing hypotheses, which are then verified such that the observed scene can be optimally explained in terms of recognized object models. By taking advantage of a mobile camera platform, which allows beneficial vantage points in the environment to be explored, we present an efficient online multi-view recognizer that improves recognition performance by fusing information from multiple individual views of the environment. Compared to existing multi-view approaches, the proposed method improves computational efficiency, makes fewer assumptions, can operate in partially dynamic and cluttered environments, and allows a wider range of different object types to be detected. The performance of the system is evaluated on publicly available RGB-D datasets as well as on data collected by a robot in controlled and uncontrolled scenarios. To foster future research in this direction, both data and code for the whole framework are made publicly available through the Vision for Robotics (V4R) library.
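To make the hypothesis-verification idea in the abstract concrete, the following is a minimal, self-contained C++ sketch, not the thesis's actual algorithm (which is formulated as a global optimization over hypothesis subsets, implemented in V4R). Here each hypothesis is reduced to the scene points it explains plus an outlier count, and a greedy pass accepts a hypothesis only while it improves the overall explanation of the scene; all type names, weights, and the scoring rule are illustrative assumptions.

```cpp
// Sketch: select a mutually consistent subset of competing object hypotheses
// so that the observed scene is well explained. Assumptions (not from the
// thesis): a hypothesis is summarized by the scene-point indices it explains
// and a count of unsupported model points; conflicts are double explanations.
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

struct Hypothesis {
    std::string model_id;                 // which object model this pose belongs to
    std::vector<std::size_t> explained;   // scene-point indices explained by the hypothesis
    std::size_t outliers = 0;             // model points with no scene support
};

// Score gain of accepting `h` given the points already covered: newly
// explained points count positively; conflicts and outliers count negatively.
double gain(const Hypothesis& h,
            const std::unordered_set<std::size_t>& covered,
            double conflict_weight, double outlier_weight) {
    double g = 0.0;
    for (std::size_t idx : h.explained)
        g += covered.count(idx) ? -conflict_weight : 1.0;
    return g - outlier_weight * static_cast<double>(h.outliers);
}

std::vector<Hypothesis> verify(std::vector<Hypothesis> hyps,
                               double conflict_weight = 1.0,
                               double outlier_weight = 0.5) {
    std::vector<Hypothesis> accepted;
    std::unordered_set<std::size_t> covered;
    // Greedy: repeatedly accept the hypothesis with the largest positive gain.
    while (true) {
        double best_gain = 0.0;
        std::size_t best = hyps.size();
        for (std::size_t i = 0; i < hyps.size(); ++i) {
            double g = gain(hyps[i], covered, conflict_weight, outlier_weight);
            if (g > best_gain) { best_gain = g; best = i; }
        }
        if (best == hyps.size()) break;  // no remaining hypothesis helps
        for (std::size_t idx : hyps[best].explained) covered.insert(idx);
        accepted.push_back(std::move(hyps[best]));
        hyps.erase(hyps.begin() + static_cast<std::ptrdiff_t>(best));
    }
    return accepted;
}

int main() {
    // Two hypotheses competing for overlapping scene points, plus one clean one.
    std::vector<Hypothesis> hyps = {
        {"mug",    {0, 1, 2, 3}, 1},
        {"bottle", {2, 3, 4},    4},   // conflicts with "mug" on points 2 and 3
        {"box",    {7, 8, 9},    0},
    };
    for (const auto& h : verify(hyps))
        std::cout << "accepted: " << h.model_id << "\n";  // mug, box
}
```

The greedy rule stands in for the thesis's global optimization, but it shows the same trade-off the abstract describes: hypotheses are kept only insofar as they explain otherwise unexplained scene data, while conflicting or poorly supported hypotheses are rejected.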