Wohlkinger, W. (2012). Grasping categories: 3D shape matching with synthetic 3D data for robotic manipulation of categories [Dissertation, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/161118
Robots are on the verge of entering users' home environments, creating an immediate need for a flexible and scalable approach to semantic perception for cognitive robotics. The domestic setting, with its plethora of new objects and inherent intra-class variety, poses great challenges to object class recognition systems. Robotic manipulation of classes of objects, such as "a mug", demands great generalization skills from a service robot's object classification system. Such systems need to be trained on large data collections captured from multiple viewpoints, which makes training a bottleneck. One way to approach these challenges is to give robots access to the Internet. This would allow information to be shared, knowledge repositories to be accessed, and freely available 3D data to be used for training. As a result, a common knowledge base for all robots would be established, independent of their physical embodiments.

This thesis proposes Shape-Based 3D Object Classification from Synthetic Training Data as a novel, scalable and flexible approach that enables robotic systems to easily learn to recognize categories of objects, link these categories to semantic meaning, and utilize this for fast and intelligent manipulation of objects and object categories. The thesis contributes in three distinct ways to the research area of semantic perception and manipulation for robotics.

The first contribution is Training on Synthetic 3D CAD Models, a new paradigm in which object classification and object recognition algorithms are trained purely on synthetic 3D models. This approach provides an efficient way of collecting, organizing and preparing training data for hundreds of categories. The 3D models, numbering in the thousands, allow the challenge of large intra-class variability to be tackled. Training on synthetic 3D CAD models also enables efficient training and retraining regardless of the sensor and shape descriptor at hand, and it creates a common knowledge pool that is easy to share, maintain and extend. The 3D model collection, hierarchically organized according to WordNet and semantically linked, together with the proposed approach for generating training data for 3D shape-based recognition systems, offers an efficient, flexible and scalable way of providing robots with semantic perception and is open to the robotics community at 3D-Net.org.

The second contribution is 3D Shape Matching with Shape Distributions on Voxel Surfaces, a set of global 3D shape descriptors together with matching schemes for efficient and scalable 3D object classification on point cloud data. The presented descriptors differ from alternative state-of-the-art descriptors in that they are not based on surface normals; they outperform state-of-the-art descriptors, are fast to compute, cope well with sensor noise and missing data, and scale well to larger numbers of shape classes. Combining these descriptors with the presented matching schemes of nearest neighbor search, hashing and random forests, in cooperation with the 3D model database, demonstrates their flexibility across various applications.
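As an illustration of the descriptor-and-matching idea described above, the following Python sketch computes a simple shape-distribution descriptor over a voxelized point cloud and classifies it by nearest neighbor search against descriptors from synthetic training views. This is a minimal sketch of the general shape-distribution technique, not the thesis's exact descriptor; the function names, grid resolution, histogram parameters and distance metric are illustrative assumptions.

```python
import numpy as np

def voxel_surface_points(points, grid=32):
    """Resample a point cloud onto a voxel grid and return the centers of
    occupied voxels, a crude stand-in for the 'voxel surface' of the object."""
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max() + 1e-9   # isotropic normalization
    idx = np.floor((points - mins) / extent * (grid - 1)).astype(int)
    occupied = np.unique(idx, axis=0)
    return (occupied + 0.5) / grid                      # voxel centers in the unit cube

def shape_distribution(points, n_pairs=20000, bins=64, seed=0):
    """D2-style shape distribution: a normalized histogram of distances
    between random point pairs sampled on the voxel surface."""
    rng = np.random.default_rng(seed)
    surf = voxel_surface_points(points)
    i = rng.integers(0, len(surf), size=n_pairs)
    j = rng.integers(0, len(surf), size=n_pairs)
    dists = np.linalg.norm(surf[i] - surf[j], axis=1)
    hist, _ = np.histogram(dists, bins=bins, range=(0.0, np.sqrt(3.0)))
    return hist / (hist.sum() + 1e-12)

def classify_nearest_neighbor(query, train_descriptors, train_labels):
    """Assign the label of the closest descriptor computed from a synthetic view."""
    distances = np.linalg.norm(np.asarray(train_descriptors) - query, axis=1)
    return train_labels[int(np.argmin(distances))]
```

In the spirit of the abstract, such descriptors would be computed for many rendered viewpoints of each CAD model, so a sensed partial view can be matched against the whole synthetic training pool; hashing or random forests can stand in for the brute-force nearest neighbor search as the number of classes grows.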
The third and final contribution of this thesis is the Scale Oblivious Grasp Planner for robotic grasping and manipulation of categories. The proposed approach calculates grasps for a variety of robotic grippers on 3D models of unit scale. A robotic tool chain is presented that uses these offline-calculated grasp points together with the 3D-Net database to categorize objects, align the 3D model and transfer the grasp points into the real world for successful, fast and semantically correct grasping of categories.

The contributions presented in this thesis have been implemented, tested and empirically evaluated on different robotic platforms, sensors and environments. Together, the three contributions led to an intelligent robotic system that can semantically perceive objects and grasp similar objects as object categories, which had not been done before. The synthetic-model-based object classification system, with the shape distribution descriptor, outperforms comparable systems, works with different sensors and provides real-time classification. The 3D-Net database and framework appears flexible and modular enough to additionally include semantics and contextual information. Together with the proposed grasping approach, it is able to successfully perform robotic grasping of categories on four different robotic systems.
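To make the grasp-transfer step concrete, the sketch below shows, under simplifying assumptions, how a grasp pose stored for a unit-scale category model could be mapped into the world frame once the model has been aligned to the sensed object. The function name, the 4x4 homogeneous pose representation and the handling of scale are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def transfer_grasp(grasp_in_model, model_to_world, scale):
    """Map a gripper pose defined on a unit-scale category model into the
    world frame of a sensed object.

    grasp_in_model : 4x4 homogeneous gripper pose relative to the model.
    model_to_world : 4x4 rigid transform aligning the model with the sensed object.
    scale          : isotropic scale recovered during alignment.
    """
    grasp = np.asarray(grasp_in_model, dtype=float).copy()
    # Only the grasp position scales with the object; the gripper orientation
    # (and the gripper itself) does not change size.
    grasp[:3, 3] *= scale
    return np.asarray(model_to_world, dtype=float) @ grasp

# Example: a grasp 10 cm above the model origin, an object twice the unit size,
# and a model aligned to the world by a pure translation of 1 m along x.
grasp_model = np.eye(4); grasp_model[2, 3] = 0.10
model_to_world = np.eye(4); model_to_world[0, 3] = 1.0
print(transfer_grasp(grasp_model, model_to_world, scale=2.0))
```

In such a scheme the expensive grasp computation happens offline on the category models, while at run time only classification, alignment and this cheap transform are needed, which is consistent with the fast category-level grasping described above.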