Unraveled Skeletons DIPLOMARBEIT zur Erlangung des akademischen Grades Diplom-Ingenieur im Rahmen des Studiums Visual Computing eingereicht von Maximilian Langer, BSc Matrikelnummer 01226991 an der Fakultät für Informatik der Technischen Universität Wien Betreuung: O.Univ.Prof. Dipl.-Ing. Dr.techn. Walter G. Kropatsch Wien, 29. April 2019 Maximilian Langer Walter G. Kropatsch Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at Die approbierte Originalversion dieser Diplom-/ Masterarbeit ist in der Hauptbibliothek der Tech- nischen Universität Wien aufgestellt und zugänglich. http://www.ub.tuwien.ac.at The approved original version of this diploma or master thesis is available at the main library of the Vienna University of Technology. http://www.ub.tuwien.ac.at/eng Unraveled Skeletons DIPLOMA THESIS submitted in partial fulfillment of the requirements for the degree of Diplom-Ingenieur in Visual Computing by Maximilian Langer, BSc Registration Number 01226991 to the Faculty of Informatics at the TU Wien Advisor: O.Univ.Prof. Dipl.-Ing. Dr.techn. Walter G. Kropatsch Vienna, 29th April, 2019 Maximilian Langer Walter G. Kropatsch Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at Erklärung zur Verfassung der Arbeit Maximilian Langer, BSc Alois Pummer-Gasse 7, 2345 Brunn am Gebirge Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwen- deten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe. Wien, 29. April 2019 Maximilian Langer v Acknowledgements My great thanks goes to Professor Walter Kropatsch for his excellent supervision and to all people of PRIP I had inspiring talks with, namely Daniel Pucher and Ines Janusch. I also want to thank my family for supporting me throughout my studies. vii Kurzfassung Diese Diplomarbeit erarbeitet eine neue Repräsentation für zweidimensionale Objekte, genannt Unraveled Skeletons. Sie ist dafür konstruiert robust in Bezug zu Transformatio- nen zu sein, im speziellen gegenüber artikulierten Bewegungen. Die Eigenschaften der Repräsentation wurden studiert und mögliche Applikationen analysiert. Die Repräsentation basiert auf dem Skelett oder Medial Axis Transform des Objekts. Sie benutzt die sehr robuste Voronoi Methode, um das Skelett zu generieren und „läuft“ dann entlang des Skeletts um das Skelett herum. Bei jedem Punkt wird die minimale Distanz zum Umriss des Objekts in einen Vektor gespeichert. Das Resultat ist eine Liste (Shape signature). Diese Liste (Unraveled Skeleton Vector) kann anschließend für weitere Bearbeitung, zum Beispiel Normalisierung, herangezogen werden. Diese Arbeit untersucht zwei mögliche Einsatzgebiete für Unraveled Skeletons: Shape Analysis und Posen-unabhängige Koordinatensysteme. Da das Unraveled Skeleton unab- hängig gegenüber artikulierten Bewegungen ist, ist es geeignet für Shape Analysis oder Objekterkennung, wenn sich diese Objekte artikuliert bewegen (zum Beispiel Tiere). Ver- schiedene Normalisierungs- und Optimierungs-Strategien wurden erarbeitet. Außerdem kann das Unraveled Skeleton dazu benutzt werden, Teile von Objekten zu erkennen. Das Posen-unabhängige Koordinatensystem weist jedem Punkt im Objekt und auf seinem Umriss eine eindeutige Koordinate zu, unabhängig von der Pose oder artikulierten Bewegung. Gemeinsam mit dem nächsten Punkt am Skelett und der Distanz wie sie vom Unraveled Skeleton bereitgestellt werden, können konvexe Regionen adressiert werden. Mit einer dritten Zahl kann jeder Punkt dieser Regionen eindeutig bestimmt werden, die Unraveled Skeleton Koordinaten bestehen daher aus 3 Komponenten. Diese Arbeit beschäftigt sich mit drei möglichen Anwendungen für diese Koordinaten: Shape Blending, Super-Resolution und Verbesserung von Segmentationen. Non-centered Skelette werden ebenfalls behandelt. Alle Resultate wurden auf mehreren Datensätzen evaluiert. ix Abstract This thesis presents a new shape representation called Unraveled Skeleton. It is designed to be robust to a large number of transformations, especially articulated movement. Based on this new representation, its general properties are studied and its application domain established. The representation is based on the skeleton or medial axis of a shape. It uses the very robust Voronoi skeletonization and "walks" around the resulting skeleton tree. At every point the minimum distance to the boundary is saved to a vector, resulting in a list of distance measures, a shape signature. This Unraveled Skeleton vector can then be used for further processing like normalization. This thesis explores two possible directions to use the Unraveled Skeletons: Shape analysis and pose independent coordinate systems. In shape analysis the Unraveled Skeleton vector can be used for articulation independent recognition. For this purpose a number of different normalization and optimization techniques are introduced. It is also possible to use parts of the vector to match parts of objects. The pose independent coordinate system assigns every point inside the shape and on its boundary a unique coordinate independent of pose or articulated movement. Using the closest points on the skeleton and its distance as provided by the Unraveled Skeleton one can address convex patches. Given a third component to chose a point from this patch a coordinate with three components is introduced. This paper states three applications to use the Unraveled Skeleton coordinates: Shape Blending, Super-resolution and Segmentation refinement. It also explores possibilities of non-centered skeletons in regard to Unraveled Skeleton coordinates. All results are evaluated on multiple datasets. xi Contents Kurzfassung ix Abstract xi Contents xiii 1 Introduction 1 2 Unraveled Skeletons 3 2.1 Medial Axis Transform and Skeletons . . . . . . . . . . . . . . . . . . 4 2.2 Voronoi Skeletons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 Distance Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.4 Unraveled Skeletons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.5 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.6 Holes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.7 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3 Shape Analysis 19 3.1 Shape Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2 Shape Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.3 Graph and Vector Matching . . . . . . . . . . . . . . . . . . . . . . . . 22 3.4 Normalization and Optimization . . . . . . . . . . . . . . . . . . . . . 22 3.5 Part Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4 Pose Independent Coordinates 29 4.1 Shape Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.2 Unraveled Skeleton (US) Coordinates . . . . . . . . . . . . . . . . . . . . 31 4.3 Non-centered Skeletons . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 5 Evaluation 39 5.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.2 Shape Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.3 Pose-independent Coordinates . . . . . . . . . . . . . . . . . . . . . . . 47 xiii 6 Conclusion 55 Acronyms 57 Bibliography 59 CHAPTER 1 Introduction This thesis introduces a new shape representation based on the skeleton and investigates its possible use for shape analysis and as pose-independent coordinate system. A skeleton, first introduced by Blum [3], is a reduction of a shape’s dimensionality that retains information about the structure of the shape. The skeleton has been popular since its introduction for various tasks in computer vision and computer graphics: Object descrip- tion, retrieval, manipulation, matching, registration, tracking, recognition, compression and part segmentation [27] to mention just a few. The advantage of the skeleton representation of an object is the reduction in dimension and the stability under different transformations like scaling and rotation. Another reason for the widespread use of skeletons is that they do not change the topology of the shape and additionally preserve the structure and spatial relationship of object components. The structural preservation capabilities of the skeleton make it an excellent choice to represent shape sequences, where the shape performs an articulated movement. The articulated movement is a deformation of the shape that retains multiple rigid parts connected by joints, for example a walking animal. Related to the articulated movement is the pose problem, i.e. the change of a shape extracted from different poses. In recognition, a good descriptor can relate same objects in different poses. Representation and recognition of articulated shapes has been studied in research [22, 14], but it remains a challenging problem. The main challenge is to find a representation that can handle the pose problem and still distinguishes between different objects. The focus of the research lies in the recognition of articulated shapes and not much work has been done for the registration, i.e. uniquely addressing points on the shape across poses. The aim of this work is to study a new shape representation (US) with its properties and applications. To enable comparison with state-of-the-art approaches, the shape representation is analyzed for recognition purposes in greater detail. Apart from the representation for recognition, various comparison methods are discussed. Additionally to 1 1. Introduction recognition, the registration that is sparsely addressed in the literature (pose independent coordinates) is the second focus of this work. Pose independent coordinates unlock new ways to handle well-known problems in computer vision, like segmentation and super-resolution. In combination with registration the notion of non-centered skeletons is studied to be able to handle articulated movement even better. The following are the contributions in this thesis: • Proposal of a new shape representation (US) based on the skeleton and distances to the boundary. • Description of the properties and applications of proposed shape representation. • Normalization of proposed shape representation for different use cases. • Matching and recognition techniques for proposed shape representation including matching parts of a shape. • Proposal of a new pose independent coordinate system based on the US. • Evaluation of proposed shape representation in recognition and as pose independent coordinate system. The remainder of the thesis is structured as follows: In Chap. 2 first an introduction to shape representations with a focus on various invariances is given and skeletonization is introduced. Then the basic idea and properties of the US are presented. Chapter 3 looks at the US from the point of view of a shape analysis or shape recognition. First a look at related work is given, then matching of US representations is described. The chapter takes a closer look at normalization and optimization for a good matching and introduces part matching with the US. Pose independent coordinates are introduced in Chap. 4. The first sections give an overview of state-of-the-art techniques, then pose independent coordinates using the US are described. This chapter also takes a look at non-centered skeletons. The chapter finishes with a description of applications for US pose independent coordinates. Chapter 5 evaluates different aspects of the US, first in a general sense, then regarding shape analysis and finally regarding pose-independent coordinates. The thesis is concluded in Chap. 6. 2 CHAPTER 2 Unraveled Skeletons Many computer vision tasks - such as classification or segmentation - require structural and geometric information of shapes. It is therefore an ongoing topic for research to extract features from images and shapes and represent them in a way that is detailed enough to distinguish different shapes, but general enough to group similar shapes together. For classification and clustering purposes the difference (or distance) of features should be minimized in the same class and maximized between different classes. For in-class minimization invariances play an important role. In natural images and shapes various factors exist, which should not influence the calculated feature. For example, in most cases it should be irrelevant at which resolution an image is taken, so a feature should not rely on the absolute size of an object. In the ideal way, a feature derived with a procedure R that is invariant to a change N yields the same feature output: R(S) = R(N(S)), with S being the Scene [17]. According to Yang et al. the following invariances are desirable for a shape descriptor [35]: Translation, Scale and Rotation These transformations do not change the shape at all. The resulting feature output should therefore be the same regardless of the position, size or rotation of the shape. Affine transformation A transformation is considered affine if parallel lines are still parallel after the transformation. It is a combination of translation, scaling, rotation, flips (negative scaling) and shears. Noise Noise on the boundary and inside the object should not change the feature output. This is important to avoid errors based on bad segmentation. Occlusion Partly occluded shapes should give similar feature outputs for visible parts. 3 2. Unraveled Skeletons Additional to those invariances, pose invariance is desirable for shape descriptors that deal with shapes that may employ articulated movement, like humans or animals. Definition 1. A shape D is constructed of multiple rigid parts Rj where points inside a rigid part do not move relative to each other from timestep ti to timestep ti+1 :∀p1, p2 ∈ Rj : d(p1(ti), p2(ti)) = d(p1(ti+1), p2(ti+1)), where d(p1, p2) is the Euclidean distance between points p1 and p2. An articulated movement happens if two or more rigid parts share a common point c and local angles between parts change. Rigid parts sharing a common point "rotate" around this point. One can think of articulated movement as two parts connected by a joint. For pose invariance it is not important what pose a shape is in, e.g. a horse running or jumping. In most real situations an articulated movement will introduce small deformations around the joint: compression on the inside and expansion on the outside. In this chapter a new shape representation and descriptor is introduced that is invariant to translation, scale, rotation, noise and articulated movement. 2.1 Medial Axis Transform and Skeletons The representation presented in this chapter relies on skeletons. In this section we review the Medial Axis Transform (MAT) and skeletons. The Medial Axis Transform was first introduced by Blum [3]. He describes the MAT as the locus of all points where grassfire fronts meet. A grassfire considers the shape as a field of grass that is simultaneously lit from the boundary and the fire travels at constant speed. Skeleton and MAT is mostly used interchangeably in literature [30]. The definition given here is based on the one given by Bai et al. [1]. Definition 2. Given a shape D with boundary ∂D, the set Tan(s) = {b ∈ ∂D|d(b, s) ≤ d(b′, s),∀b′ ∈ ∂D} contains for point s ∈ D the closest points on the boundary. The degree deg(s) = |Tan(s)| gives the number of closest points1 for a point s. The skeleton S(D) = {s ∈ D|deg(s) ≥ 2} is a set containing all points with degree greater than one. Points in Tan(s), s ∈ S are called the generating or tangent points of a skeleton point s. These skeleton points also correlate with centers of maximal inscribable disks in the shape. A maximal inscribable disk is a disk confined inside the shape, that is not completely inside any other inscribable disk. The skeleton can be described - and was in fact by Blum - by the locus of the centers of all maximal inscribable disks. This furthermore gives each skeleton point a distance to the boundary based on the radius of the inscribed disk (see Sect. 2.3). 1|X| denotes the cardinality of set X. 4 2.1. Medial Axis Transform and Skeletons Definition 3. A point s ∈ S can fall in exactly one of the following categories: s is an endpoint if it has only one skeleton point neighbor, s is a branch or joint point if deg(s) > 2 and s is not an endpoint and s is a common point if deg(s) = 2. The union of common points connecting two branch points, two endpoints or a branch and an endpoint is called a skeleton edge or branch. The definitions given above describe the skeleton in a continuous space, but in most cases computer vision is concerned with discrete representations (e.g. images). When confined to the discrete grid, there exist different solutions for a skeleton. Therefore the term skeleton sometimes gives more liberty regarding the centeredness to improve other desired properties such as simplicity. 2.1.1 Skeletonization Approaches Literature provides hundreds of skeletonization algorithms that solve the problem for different domains and applications. This section only gives a short overview, the interested reader is referred to the survey by Saha et al. [27] for more information. A skeleton produced by a skeletonization algorithm should have the following properties [15]: 1. The skeleton must be thin, i.e. in images it must be one pixel wide. 2. The skeleton must approximate the medial axis. 3. Thin curves and endpoints must be preserved. 4. Connectivity of foreground and background must be preserved, i.e. the algorithm does not break the topology of the shape. Skeletonization algorithms can be grouped into three categories [27]: Continuous geometry approaches, Continuous boundary evolution and discrete approaches using morphology or distance transform. An approach that is widely used by different algorithms is to simulate Blums grassfire on the discrete grid with different rules when to delete pixels. As an example that uses that technique, the parallel thinning algorithm by Zhang and Suen [39] is given as it is still used an modern computer vision libraries (e.g. scikit2). It considers the foreground to be in an 8-connected neighborhood and the background in a 4-connected neighborhood (See Fig. 2.1). Since each pixel is considered in parallel the algorithm uses two subiterations to avoid removing a two pixel wide line at once (and thus breaking topology). In a first subiteration contour points are deleted which fulfill conditions in the 2https://scikit-image.org/docs/dev/api/skimage.morphology.html#skeletonize (Accessed: 13/03/19) 5 2. Unraveled Skeletons local 3x3 neighborhood. In a second subiteration the conditions are changed slightly. The algorithm erases pixels first from the lower right, then from the upper left. The algorithm trades centeredness for simplicity and robustness to boundary noise. Figure 2.2a shows example skeletons produced by the algorithm and illustrates a problem: The algorithm is not scale and rotation invariant. Unfortunately, this is a problem not easy to solve due to the grid based nature of the approach. (a) 4-connected neighborhood (b) 8-connected neighborhood (c) If the foreground is 8-connected, the back- ground must be 4-connected. Figure 2.1: 4- and 8-connected neighborhoods. A different problem is illustrated in Fig. 2.2b: The algorithm is not invariant to noise on the boundary. This problem can however be addressed by pruning. Pruning is a technique where branches that are generated by noise - so called spurious branches - are removed by using a global shape measure [27]. Pruning however is a challenging task and can lead to topology breakage if done incorrectly. A promising pruning approach was proposed by Bai et al. [1]. It first partitions the contour into multiple parts and then removes skeleton points if its generating points belong to the same part. The quality of the pruned skeleton is highly dependent on the chosen parts. Bai et al. show that Discrete Curve Evolution (DCD) yields a promising partition and that any partition preserves topology. Unfortunately, their approach needs a fixed number of parts for DCD. 2.2 Voronoi Skeletons A different skeletonization algorithm uses a Voronoi Diagram of the contour points [25]. It is very robust to noise and invariant to Translation, Scale and Rotation. Since these are the properties that are needed for the proposed representation, this is the skeletonization approach of choice and it will be explained in more detail. 2.2.1 Voronoi Diagram A definition based on the one given by Costa and Cesar [7]: Definition 4. A Voronoi Diagram (VD) or Voronoi Tessellation is a partition of R2 into N regions Ri, i ∈ {1, . . . , N}, called cells, deduced from N points pi, called sites. Each point p ∈ Ri is closer to pi then to any other point pj , j 6= i, Ri = {p ∈ R2 : d(pi, p) ≤ d(pj , p), j ∈ {1, . . . , N}}. Points that are closest to two sites define the Voronoi edges or 6 2.2. Voronoi Skeletons (a) Influence of Translation, rotation and scale. (b) Shape with noise Figure 2.2: Skeletons created by algorithm of Zhang and Suen [39]. Skeleton thickness increased for illustration purposes. Voronoi boundaries. A Voronoi edge has two generating sites. Voronoi vertices are points connecting Voronoi edges and have three or more closest sites. Voronoi edges contain only points that have the same distance to two or more points, which in turn is also the property of the skeleton. Figure 2.3a shows the VD of 10 random points. Figure 2.3b shows the VD of hand picked points on the boundary of a shape. Bold lines are all lines where generating sites do not lie next to each other on the contour. These bold lines already show that the skeleton is closely related to parts of the VD. By constructing the Line Voronoi Diagram (LVD), Lee [21] shows that all Voronoi edges that do not lead to concave boundary points are in fact part of the medial axis. 2.2.2 Residuals and Voronoi Pruning Figure 2.4 shows what happens to the approach above if all contour points are considered sites. Even though the contour is smooth, discretization introduced enough noise to create many spurious branches. The pruning method introduced by Ogniewicz and Ilg [25] takes their observation into account that the shortest distance on the boundary between the generating points of a Voronoi edge is a good indicate for its importance to the skeleton. This property was also used in the last approach, where Voronoi edges with neighboring generating points were omitted. Based on their observations they assigned each Voronoi boundary point a Residual value. Ogniewicz and Ilg distinguish four different Residual values that can be used to prune the skeleton (Fig. 2.5): 7 2. Unraveled Skeletons (a) VD for 10 random points. (b) VD for hand picked contour points. Figure 2.3: Example of Voronoi Diagrams. Figure 2.4: VD of camel head with all contour points as seed points. Potential Residual The potential residual ∆RP of a Voronoi edge e is the shortest distance on the boundary between the generating points pA, pB of e. This can be efficiently calculated by pre-computing a boundary potential function W (p). This function assigns for every point p the distance along the boundary to a common, arbitrary origin point on the boundary. ∆RP can than be computed: ∆RP (e) = min{|W (pA)−W (pB)|, LB − |W (pA)−W (pB)|} (2.1) where LB is the total length of the boundary. If pA and pB are on different boundaries (e.g. on a hole and on the outer boundary), ∆R(e) is infinity. Figure 2.5a shows the potential residual. Circular Residual To show how good a skeleton point m on an edge e approximates the boundary the circular residual ∆RC subtracts the circular arc b with center m between the generating points from the potential residual: ∆RC(e,m) = RP (e)− b. See Fig. 2.5b for an illustration. 8 2.2. Voronoi Skeletons Bicircular Residual The bicircular residual ∆RB considers boundary parts as semi- circles. It is aimed to overcome shortcomings of the circular residual (See Ogniewicz and Ilg [25] for more details). ∆RB is given as: ∆RB(e,m) = 2 π (π2 ∆RP (e)−b). The factor 2 π ensures that for edges in the center (b << RP (e)) the residual functions behave similar. Figure 2.5c illustrates the bicircular residual. Chord Residual The chord residual ∆RH replaces the circular arc by the chord length s = d(pA, pB): ∆RH(e) = RP (e)− s. See Fig. 2.5d for an illustration. pA pB m e (a) Potential residual pA pB m b r e (b) Circular residual pA pB m b r e (c) Bicircular residual pA pB m s e (d) Chord residual Figure 2.5: Illustration of the four residual functions (in red). After residual computation for every point on the Voronoi boundaries the pruning is a simple matter of thresholding on the residual value. Ogniewicz and Ilg give a formula to compute threshold values for different residuals based on the size of noise on the boundary. If a is the length of a raster crack (1 or √ 2 for pixel grids) and n is the number of raster cracks that noise can protrude, an optimal threshold for the circular residual is na(2− √ 2). They also show that all their residual prunings preserve topology. See their work for proofs on thresholds and topology preservation [25]. 9 2. Unraveled Skeletons Figure 2.6 shows Voronoi Skeletons. One can see that in difference to Fig. 2.2 the skeleton is invariant to translation, rotation and scale. Additionally the noise can be handled more elegantly by increasing the threshold. (a) Influence of Translation, rotation and scale. (b) Shape with noise, threshold is 10 Figure 2.6: Skeletons created by Voronoi Skeletonization using circular residual. Skeleton thickness increased for illustration purposes. Figure 2.7 shows the same shape skeletonized with different residual functions and the suggested thresholds by Ogniewicz and Ilg. It can be seen, that the circular residual gives the cleanest skeleton, but misses parts of the wings. For the rest of the work if not otherwise mentioned the circular residual is used. (a) Circular Residual (b) Bicircular Residual (c) Chord Residual Figure 2.7: Skeletons created by Voronoi Skeletonization using different residual functions. Skeleton thickness increased for illustration purposes. 2.3 Distance Transform It is already clear from Blum’s definition of the MAT that there is a relation between the distance of points in a shape to the boundary and the skeleton. The following definition 10 2.4. Unraveled Skeletons is based on the one given by Klette [17]. Definition 5. Given a shape D the Distance Transform (DT) assigns every object point the closest distance to a non-object point: DT (p) = min q∈R2\D {d(p, q)} (2.2) A disk centered at an object point p with radius DT (p) is totally contained in the object, therefore the point with the highest DT in a shape lies on the skeleton. Figure 2.8 shows the DT on an elephant shape. The right plot shows a carpet plot of the DT. One can see that the ridges of the plot correlate with the skeleton. Figure 2.8: Distance transform of an elephant shape. 2.3.1 Shape Reconstruction from Skeleton Given an unpruned skeleton S and the DT at skeleton points, the whole shape can be perfectly reconstructed by the union of circles with center s ∈ S and radius DT (s). For pruned skeletons the reconstruction looses high frequencies on the boundary. This can be used to get a more general shape of an object. 2.4 Unraveled Skeletons The skeleton preserves topology of a shape and the DT provides geometrical information. Using a combination of these two properties by building a connectivity graph was proposed by Zaboli et al.[36]. They use graph matching to cluster shapes. The Unraveled Skeleton (US) is a feature vector that represents the shapes thickness at various points of the skeleton. Together with the skeleton it can reconstruct the shape. The US-Vector is constructed by walking along the skeleton and writing its DT into the vector. At every branching point one turns right following a new branch and at every endpoint one turns back along the branch one came from. In the end one will come back to the starting point. This is only possible for a skeleton tree (the object has no holes). 11 2. Unraveled Skeletons The DT at a skeleton point s ∈ S is also the radius function R(s) at s. R(s) is defined as the radius of the maximal inscribed disk at s. Figure 2.9 shows the unraveling process. Starting from node A the unraveling process continuous to node B, then C and so forth till it reaches node A again. The DT values are stored at the specific position. If plotting against a circle one can see that it is possible to map the shape boundary to polar coordinates. We call Figure 2.9b the Unraveled Skeleton Circle representation. The red lines and arcs in Figures 2.9b and 2.9c connect the same branching points. US(D,S) gives the US vector on the shape D and skeleton S. (a) Example shape and unraveling process (b) Circle representation of US (c) Unraveled Skeleton Vector Figure 2.9: Illustration of unraveling of the skeleton and creation of feature vector. 2.4.1 Reconstruction A reconstruction is possible if the skeleton and the US vector is given. The skeleton delivers topological and the US geometrical properties of the shape. Additionally, the skeleton provides the embedding in the plane. 2.5 Normalization For many approaches using the US a normalized vector is desirable. Since the US is cyclic the starting point s0 plays a miner role, but linear representation needs a robust 12 2.5. Normalization point s0. Here an overview is given. See Sect. 3.4 for more details on how to enhance the normalization and optimization to define distances between US vectors. 2.5.1 Eccentricity Transform The Eccentricity Transform (ET) comes from the field of path-finding in graphs and was adopted for binary images by Kropatsch et al. [19]: Definition 6. The eccentricity transform gives for every point p ∈ D in a binary shape the longest minimum-distance to any other point q ∈ D: ET (p) = max q∈D {λ(πα(p, q))} (2.3) where πα(p, q) gives the shortest α-connected path in the shape between p and q and λ(πα(p, q)) is the length of the path πα(p, q). The ET is lowest in the middle of the shape and robust to deformation, as well as translation and rotation. 2.5.2 Starting Point and Orientation A good starting point s0 is the branching point with the lowest ET. It is the robustest point of the skeleton and does only change if there are multiple branching points with similar ET. For the shape D in (2.3) only all endpoints have to be considered with π8(p, q) being the distance along the skeleton S. ET can than be checked for branching points only. After choosing s0, a direction to start moving from the branching point must be found. Since the ET of all branching points is already known, the next node to visit is the adjacent one with the next smaller eccentricity. If there is only one branching point, the endpoint with biggest eccentricity is chosen. Starting point and orientation with graph based centrality measures is also possible [6]. 2.5.3 Length Normalization There exist two different ways one can normalize the length of the US vector. In the first way the whole vector is simply scaled to a fixed length vector with interpolation of values. For perfect skeletons and highly articulated movement this normalization is perfect, because the skeletons inner proportion should be the same (e.g. a scaled shape). See Sect. 3.4 for advanced normalization techniques. 2.5.4 Amplitude Normalization Amplitude normalization is used for scale independence. The feature vector is normalized between 0 and 1, where the maximal radius is mapped to 1 and 0 to 0 linearly. This 13 2. Unraveled Skeletons ensures proper scaling of different objects. If using amplitude normalization the maximum value is needed for reconstruction. 2.6 Holes The shape to be represented by US must not have holes, as this would introduce cycles in the skeleton graph. Depending on the application, holes play a differently important role and the need might arise to represent holes too. For completeness some solutions to the problem are suggested here. The simplest solution to allow holes to be represented is to process them individually resulting in an US vector for the shape and every hole. The disadvantage of this approach is that the number of feature vectors for a single shape varies. A different approach is to break the cycles in the graph in a deterministic manner. Because the ET has given robust results for determination of the starting point the following approach for breaking cycles is suggested: Calculating the ET for every branching point contained in a cycle. The branching point with the smallest eccentricity is then split into two points. This will introduce one new endpoint. The other point might downgrade to a common point. If not all cycles have been resolved proceed to the branching point with the next smallest ET in a cycle. The splitting of a branching point can be done in multiple ways. The following rules guarantee a deterministic split: • The branch becoming an endpoint (endpoint branch) has to be included in a cycle. • The endpoint branch is the one where the split introduces the smallest increase in the skeletons diameter, i.e. the maximum eccentricity of any point in the skeleton. This can be achieved by taking the endpoint branch where the endpoint has the smallest eccentricity in the splitted graph. Following these splitting rules the skeleton diameter is always the smallest possible. 2.7 Properties Here some properties of the US representation are given. 2.7.1 Invariances of the US Representation It is obvious that the US is highly dependent on the skeletonization as spurious branches would change the feature vector drastically. Therefore, for the representation to be invariant to translation, rotation, scale, noise and articulation, the skeleton must be invariant to those transformation too. The following presumes that the skeleton is 14 2.7. Properties invariant to translation, rotation, scale, flip and boundary noise. This is true for the Voronoi skeleton with an appropriate threshold. Translation invariance The US is translation invariant, as the radius function is invariant to translation too. Rotation invariance The US vector that is normalized in regard to starting point and orientation is rotation invariant. If considered as a cycle any US is rotation invariant. The DT using Euclidean distance is rotation invariant. Scale invariance The length- and amplitude-normalized US vector is uniform scale invariant, non-uniform scale can lead to skeleton change and normalization change. Flip invariance The US is not flip/mirror invariant, as the resulting US vector is also mirrored. The mirror shape is like walking left instead of right at branching points. In the case this invariance is important for a use case, two vectors have to be created. Noise invariance The US is not invariant to noise, due to the different radius functions. The error introduced by noise on the boundary is of the same magnitude as the noise. Figure 2.10 illustrates a noisy boundary and the reconstruction using the given skeleton (black) and US (green). The US orientates itself on the noise that is closer to the skeleton, as the circles cannot penetrate to outlying boundary points. The closer the skeleton to the boundary, the better the noise is approximated. Invariance to articulated movement If the skeleton is invariant to articulated move- ment, the US is too, as the radius functions of the skeleton and the skeleton do not change. Figure 2.10: Affection of boundary noise on the US. The skeleton is black, the correct boundary green, the noise boundary red and the reconstruction by US blue. The plot on top shows the error. 15 2. Unraveled Skeletons 2.7.2 Branching, Endpoints and Common points Branching points PB appear at least three times in the US vector, common points PC twice and endpoints PE only once. The final vector will contain about twice as many values as the skeleton has points. The exact size of the final vector is dependent on the number of branching points and endpoints: |US(D,S)| = ∑ p∈S(D) deg(p), where deg(p) is the degree of point p, as defined in Def. 3. Theorem 2.7.1. In an Unraveled Skeleton Circle representation lines connecting multiple occurrences of the same branching or endpoint will never cross each other. Furthermore is every circular arc of the US Circle between two occurrences of the same branching or endpoint a connected subtree of the original shape skeleton. If this subtree is removed from the original skeleton, the remaining skeleton is still a connected tree. Proof. The same occurrence of a skeleton point p on the US Circle (p1, p2) indicates that a removal of all points on one side of p will leave all points on the other, as the skeleton S is a tree structure. When we split a tree into two parts, we get two trees. The first statement follows directly from the other two. A crossing of lines means that the two generated trees would overlap. That cannot happen unless p1, p2 do not belong to the same skeleton point. 2.7.3 Uniqueness The US vector is unique for every shape. Only articulated movement or rotation of a shape can produce the same US vector. This is because the symmetry in the vector indicates the length of the skeleton branches. The only exception are sections of the US vector that have the same value, as then the symmetry property cannot be used to reconstruct the skeleton branch length. There exist only one exception where two different shapes produce the same US vector. This is a constant radius function on the skeleton over multiple, successive endpoints. 2.7.4 Symmetry The US vector skeleton reflects mirror and rotational symmetry of a shape. Two parts of the shape are mirror symmetrical if two connected parts of the cyclic US vector are mirror symmetrical. The shape is rotation symmetrical if the same pattern repeats multiple times in the cyclic US vector. Note that symmetry in this case does allow for articulated movement and the detected symmetry might only be obvious in one pose of the shape. This allows for detection of symmetry in articulated shapes (e.g. a human). 2.7.5 Structure Encoding The local symmetry in the US vector reflects the structure of the skeleton. as soon as this symmetry around a point is broken, the symmetric section corresponds to a branch 16 2.7. Properties and its end to branching points. By not considering this section in the new symmetry consideration internal branches can be detected. The correct reconstruction of the structure by the US vector is not possible in all cases. The best example is a shape that has a skeleton with same radius function everywhere. The resulting US vector has then the same value everywhere and is not distinctive. 2.7.6 Examples In Figure 2.11 two example US vectors from shapes from the MPEG7 dataset are given. Note the symmetry of the flower paddles and the similarity of the horse legs (green circles) compared to the tail (magenta circle). The figures show the skeleton and reconstruction (white) as well as the reconstruction error in green. (a) Example 1 with skeleton and reconstruc- tion error (green) (b) Example 2 with skeleton and reconstruc- tion error (green) (c) Non-normalized US vector of Example 1 (d) Non-normalized US vector of example 2 Figure 2.11: Two example shapes with their unraveled skeleton (bottom), skeletons, reconstruction (white) and reconstruction error (green). The original shape is the combination of green and white areas. 17 CHAPTER 3 Shape Analysis Shape is an important feature for human’s visual perception to recognize objects [33]. Costa and Cesar [7] organize shape analysis into three steps: shape pre-processing, shape transformation and shape classification. This chapter is most concerned about the second and third step and assumes a correct pre-processing (segmentation). Based on Costa and Cesar’s terminology the Unraveled Skeleton (US) is a shape descriptor. Additionally, this chapter deals with shape classification and its subcategories that need some kind of distance or similarity measure between shapes. This chapter reviews State-of-the-art shape analysis and matching based on contours and considers different distance calculation approaches for the US vector. 3.1 Shape Recognition Shape recognition is an area of pattern recognition that concerns itself with the clas- sification of shapes. Supervised shape classification, hence called classification, and unsupervised shape classification, hence called clustering, are similar shape recognition techniques that rely on the (dis-)similarity of features between (dis-)similar shapes. In both tasks labels are assigned to shapes. The difference between classification and clus- tering is how labels are assigned. The following definitions are adapted from Theodoridis and Koutroumbas [32]: Definition 7. In classification or supervised learning a classifier is constructed by known shape-label matchings of example shapes (ground-truth). The classifier divides the space of all shapes into multiple regions, each region is associated with one label. In most cases the classifier is first trained on these example shapes (training set) and then tested against unknown shapes. This process is called learning or training a classifier. 19 3. Shape Analysis Definition 8. Clustering or unsupervised learning assigns a shape to a class or cluster without known ground-truth. It therefore learns a grouping of similar shapes and assigns a common label. The goal of shape retrieval is to find similar shapes to a provided one. This is closely related to clustering. For both clustering and shape retrieval a similarity measure is needed. As such a measure a metric is desirable [18]: Definition 9. A metric is a function d(x, y) with x, y ∈ X, X is some set, if the following is satisfied: 1. d(x, y) is positive definite: d(x, y) ≥ 0, ∀x, y ∈ X and d(x, y) = 0⇔ x = y. 2. d(x, y) is symmetric: d(x, y) = d(y, x), ∀x, y ∈ X 3. d(x, y) fulfills the triangle inequality: d(x, y) ≤ d(x, z) + d(z, y), ∀x, y, z ∈ X 3.2 Shape Description Shape description is the extraction of shape properties. Simple properties of shapes are for example its area, perimeter or diameter. Zhang and Lu [38] categorize shape descriptors into two groups: region-based and contour-based features. Each of these groups is then divided into structural and global features. For example the Medial Axis Transform (MAT) is a structural region-based feature. 3.2.1 Contour-based features Structural contour-based features break the boundary into smaller parts with properties depending on the method. An example is the approximation of the contour by a polygon or a chaincode. As a chaincode is also a good possibility to store the skeleton, a short explanation is given: Chaincodes were first proposed by Freeman [12] and encode a curve by walking along it and storing the required steps. In an 8-connected neighborhood, there are eight possible directions to go at any time. The Freeman chaincode stores the directions in a list. Giving the starting point position the contour can be reconstructed. Global contour-based features are constructed from the boundary as a whole. Simple descriptors are for example the perimeter or the ratio between perimeter and area. An important global contour-based descriptor for this work is the shape signature, as the US falls into this category. A shape signature represents the shape by a single feature vector based on the boundary. An example is the centroid distance, that maps the boundary points on a feature vector based on the points distance to the shapes centroid, which is similar to the US. 20 3.2. Shape Description Polar Representation Bernier and Landry [2] introduced a shape descriptor based on the distance of contour points to the centroid. Instead of taking values along the contour, they send out rays from the center and map the boundary to the range 0 to 2π. This process produces a function that might have multiple values for non-convex shapes. Euclidean and Geodesic Distance Profiles Janusch et al. [16] proposed a method similar to the US vector. They select a point on the medial axis as reference point (e.g. the centroid) and assign every boundary point the distance to this point. The difference to the centroid distance is that their distance is the sum of the Euclidean distance to the nearest point on the skeleton and the geodesic distance on the skeleton from this nearest point to the reference point. The geodesic distance measures the shortest path inside a shape. In contrast to the US vector their vector is computed along the boundary instead of along the skeleton. An articulation will therefore shift the part of the vector that is behind the articulation, resulting in non-trivial comparison. Comparing only parts of objects is not effected by this behavior and the representation is therefore considered invariant to articulation. 3.2.2 Region-based features Region-based descriptors take into account all points on the boundary as well as all points inside the shape. Structural region-based methods split the shape into parts. The skeleton and MAT is such a structural region-based feature. One can see the skeleton as a structural reduction of the region to a single curve representation with branches being parts. Global region-based features (e.g. geometric moments) will not be discussed here. 3.2.3 Shock Graphs The skeleton represents the topology of the shape and can be converted into a graph (Skeleton Graph (SG)) where branching and endpoints are nodes and branches are edges. A different approach is used by shock graphs, where skeleton branches are converted to graph nodes that are connected if the branches share a common branching point [31]. Each node is associated with one of 4 shocks that relate to the radius function on the skeleton. A 1-shock is a monotonic in-/decrease, a 2-shock a local minimum, a 3-shock is constant and a 4-shock a local maximum. A different shock graph was introduced by Sebastian et al. [28] and defines 1- and 3-shocks as links and 2- and 4-shocks as nodes. Additionally, they consider branching points as nodes between 1-shocks and keep the graph embedded in the plane. 21 3. Shape Analysis 3.3 Graph and Vector Matching Compared to shock graphs, the graph associated with the US is the SG. The goal is to create a metric on two US-vectors that is small for similar shapes and big for dissimilar shapes. Many related works using skeletons employ graph matching, i.e. they calculate the difference between graphs. Most of them rely on sub-graph isomorphism [31, 8] or on the more advanced Graph Edit Distance (GED) [28] that gives the minimum required edit operations to transform one graph to another. For unrestricted graphs both are NP-hard [26]. There exist polynomial algorithms for sub-graph isomorphism on trees, that are used by shock graphs. Zaboli et al.[36] use an edit distance based approach on a tree traversal similar to the US. The advantage of graph based matching is that structural properties are considered, at the cost of more expensive computation. Additionally, many pattern recognition methods were developed with feature vectors in mind and might not be readily adaptable for graph distances. Other shape recognition approaches not based on skeletons extract feature vectors from the shape (boundary) [2, 9]. The feature vector enables the use of various statistical pattern recognition methods. The US vector tries to combine the best of both methods by encoding the structural properties into the feature vector (see Sect. 2.7). 3.4 Normalization and Optimization In practice, it is common that similar shapes’ skeletons are slightly different and this can drastically alter the US vector. As the US vector is cyclic there is no real starting point, so comparison is complicated. There are three different ways to do the comparison (distance calculation): Normalization of the vector, Optimization of the comparison or a combination of both [33]. Normalization was already introduced in Sect. 2.5 and this section goes into more details important for shape analysis. 3.4.1 Normalization Normalization is most likely the faster approach, where the US vector is modified to a unified representation. As normalization considers the underlying graph (Fig. 3.2b), we call branching and endpoints special points of the skeleton. The sequence of visited special points based on the unraveling process is called the visiting sequence (Fig. 3.2c). Starting Point and Orientation A first approach is to find a unique starting point and orientation, which can be done using the eccentricity of the skeleton, as small eccentricity values indicate stable points in the center. Additionally the starting point s0 should fall on a branching point, as 22 3.4. Normalization and Optimization branching points of small eccentricity are likely to be found across similar, scaled shapes. The orientation is towards the branching point with the second smallest eccentricity the - based on the same thinking - next most stable skeleton point. The algorithm is given in Alg. 3.1. Algorithm 3.1: Starting Point and orientation normalization algotihm. Input: us US vector, vs visiting sequence of special points, ecc list of special points’ eccentricities Result: Rotated US vector to starting point and orientation 1 s0 ← getPointWithSmallestEccentricity(vs, ecc); 2 S1 ← successors(s0); 3 o0 ← getPointWithSmallestEccentricity(S1, ecc(S1)); 4 return rotateVector(us, s0, o0); There exist two cases where this method will not yield good results. The first is for shapes where two special points have the same eccentricity, the most simple shape only has two endpoints (Fig. 3.1a). The second case has a valid starting point, but two possible directions. This is possible if two special points have the second smallest eccentricity and are both directly connected to the starting point. A simple case is a symmetric shape with the symmetry around a single branch point (Fig. 3.1b). A B A B A A BB (a) Same eccentricity on both endpoints. B A C DE A AB C A D A E A A AC D A E A B A (b) Orientation not clear, since there are two endpoints B and C with same eccentricity to go to A. Figure 3.1: Shapes with not well defined starting points. A possible solution is to also take geometric information into account, and choose the point with bigger radius function if same eccentricity points exist. Unfortunately, this is not a stable approach, as the radius function is mostly very similar for points with minimal eccentricity (i.e., in the center of the shape). Another solution considers not only one special point but more points of the visiting sequence for the possible starting points and orientations. The orientation is then chosen by the eccentricity of the next not already visited point. To stabilize the starting point and orientation picking algorithm, a threshold can be used to decide when the eccentricity is considered equal. This improved algorithm is given in Alg. 3.2. It exploits the cyclic nature of the list of eccentricity values on special points and compares all possible starting point and orientations from the start till a minimum can be determined. 23 3. Shape Analysis Algorithm 3.2: Improved Starting Point and orientation normalization algo- tihm. Input: us US vector, vs visiting sequence of special points, ecc list of special points’ eccentricities, thres threshold for equality Result: Rotated US vector to starting point and orientation 1 S0 ← getPointsWithSmallestEccentricity(vs, ecc, thres); 2 SOcheck ← {S0 × {successors(s0) : s∈S0}}; 3 eccSOcheck ← ∅; 4 foreach so0 ∈ SOcheck do 5 eccSOcheck ← eccSOcheck ∪ rotateVector(ecc, s0(so0), o0(so0)); 6 end 7 for i← 1 to size(vs)− 1 do 8 ECCmin ← getArgumentOfSmallestValues({eccso0 [i] : eccso0 ∈ eccSOcheck }, thres); 9 if |ECCmin| = 1 then 10 SO0 = SOcheck[ECCmin[0]]; 11 s0, o0 = s0(SO0), o0(SO0); 12 break 13 end 14 eccSOcheck ← eccSOcheck [ECCmin]; 15 end 16 if notset(s0, o0) then 17 SO0 = SOcheck[ECCmin[0]]; 18 s0, o0 = s0(SO0), o0(SO0); 19 end 20 rotateVector(us, s0, o0); Simple Length Normalization The simplest form of length normalization, to obtain an unified length for a US vector, is to map the whole vector to the desired length with interpolation of values. In the case of true articulated movement and simple scale this works perfectly. The problem is that in most practical cases the skeleton branches change slightly in length resulting in change of length in the US vector too. Every element of the simple normalized US vector behind a branch that changed in length will be changed, resulting in poor comparison. Structure Aware Length Normalization To solve this problem one has to consider the structure of the skeleton and normalize branches individually. Structure Aware Length Normalization (SALN) is the normaliza- tion where each section between two special points in the visiting sequence (US sections) is normalized to a fixed number ns of samples. The final vector is then normalized with simple length normalization. 24 3.4. Normalization and Optimization Figure 3.2 shows a shape and its simple normalization and SALN as well as the interme- diate representation. The US tree (Fig. 3.2b) is constructed by starting at the starting point and adding a new node when coming across it on the visiting sequence (Fig. 3.2c). Its parent will be the predecessor of the added element in the visiting sequence. This process is similar to the one by Zaboli et al. [36]. Note that the tree reflects the structure of the skeleton/shape and its planar embedding is only depending on the starting point and orientation. (a) Example shape with skeleton. (b) Unique tree generated with US (c) The visiting sequence (lower) and its cyclic nature (upper) visualized (d) Resulting vector from simple normalization (middle), SALN (uper) and grid-based SALN (lower) Figure 3.2: Overview of simple normalization and SALN. SALN removes the problem of unaligned branches in the back of the US vector. On the other hand, it also has two major drawbacks: Firstly, branches that are very long/short will lose this distinctive feature, if they are forced to a unified length. Secondly, spurious branches can distort the vector very strongly. Short Branch Removal Another undesired behavior of skeletonization is the introduction of very short branches. With the US vector and the visiting sequence an additional pruning step can be easily achieved. All US sections shorter than a given threshold are removed. If a branch with an endpoint is removed, the branching point on the other side must be checked. It might degenerate to a common point. This can be easily detected in the visiting sequence, 25 3. Shape Analysis since the removed sections will border each other. The number of occurrences of this branching point in the visiting sequence is reduced by one (because two sequences were surrounding the endpoint and the fuse). If a branching point only has two neighbors (equal to two occurrences in the visiting sequence) it is a common point and must be removed from the special points and visiting sequence. This process merges the two US sections that are left and right of the former branching point (at both occurrences). After branch removal the starting position and orientation needs recalculation. It also changes SALN and should therefore be conducted before the SALN process. Grid-based SALN Another problem are lost uniqueness of elongated shape features. This can be addressed by introducing a binning feature on the lengths of branches. The general idea is the same as with regular SALN but instead of evenly spaced special points, special points are moved to closest points on a regular grid. This process can merge special points together and might transform branching to common points like short branch removal. This process depends highly on the chosen grid size. Figure 3.2d illustrates grid-based normalization. Note that at the highlighted positions branches have collapsed. 3.4.2 Optimization Optimization in regard to distances is the process of finding the optimal (minimal) distance between two (normalized) US vectors of length n. Instead of finding starting point and orientation based on a normalization approach, multiple comparisons are conducted to find the best match. A naive algorithm would do n comparisons and taking the smallest distance as the correct one. Special Point Optimization As the starting point should always be on a special point an optimization can take this property to reduce the number of needed comparisons. The US vector is rotated by the distance between special points on the US vector. This can be optimized even further by only taking branching points into account, as in any case with at least one branch point, the starting point is a branch point. Reference Point Optimization Special point optimization only works if indeed two US vectors share the same structure and have the same special points. There are two reasons for this to be the case. If the shapes are different, their special points will be too, and an optimization should not yield a close distance anyway. The more problematic case is if the skeletonization introduces an unwanted branch (e.g. due to projection). In this case the special points might not line up anymore. If the influence of the additional branch is weak, then reference point optimization [2] can give a better solution. 26 3.5. Part Matching In reference point optimization the local minima and maxima are considered as points for rotation. A parameter defines the number of extrema wanted and the vector is smoothed as many times as needed to obtain the desired number of extrema. 3.4.3 Distance between US Vectors Following the observation from the last two sections the simplest distance can be calculated on two normalized US vectors by taking the n-dimensional p-distance or Minkowski distance: dp(x, y) = ||x− y||p = ( n∑ i=1 |xi − yi|p )1/p , x, y ∈ Rn, p ≥ 1 (3.1) For p = 2 the Minkowski distance is the same as the Euclidean distance, p = 1 gives the sum of absolute differences between the points of the two vectors. If p → ∞ the Minkoski distance is the Maximum distance, i.e. it gives the maximum error between two components of x, y. For sufficiently normalized US vectors the Minkowski distance can be used, but if normalization fails, a distance based on optimization is preferred. This is basically the same distance but the minimum, maximum or average of multiple distance calculations with rotated y-US vector is used. 3.5 Part Matching As stated in the beginning of the last chapter, it is important for a shape descriptor to be invariant to occlusion. The US vector does not have this property, but together with the visiting sequence it is easy to compare parts of shapes with each other. There are two tasks in part matching. The first task is to find shapes that have similar parts, the second is to mark the parts of the shapes that are similar. 3.5.1 Part Matching on the US Vector As was already shown (Sect. 2.7) a connected part of the US vector that has the same skeleton point at both ends is a connected subtree of the original skeleton tree. It is therefore possible to compare parts of the US vector with another one and find the biggest overlapping. The process is similar to special point optimization, but instead of a minimum distance the length of connected US sections that are identical in both US vectors is measured. This identity can be relaxed by a threshold. The matching process is illustrated in Fig. 3.3 for a successful match. 27 3. Shape Analysis (a) First shape (b) Second shape (c) US vectors Figure 3.3: Two shapes and their part matching with US vectors. The highlighted boundary part is matched between both shapes. Along the matched boundary part and skeleton branches the US is the same (light green highlight). As B and C are very close, they are overlapping in the US vector illustration. The seconds shape’s US vector is offsetted for illustration purposes. 3.5.2 Marking Parts When a sufficiently large overlap is found, the corresponding contour parts of the shape have to be marked. This can be achieved by combining all generating points of Voronoi edges that contribute to the matched skeleton and if necessary filling gaps of points that do not contribute to the skeleton at all. The matched part corresponds to a tree in the skeleton and can therefore be split from the main skeleton. Another approach to mark the part is therefore to find the generating points Ps of the splitted point and an arbitrary generating point pe of an endpoint (leaf) of the matched subtree. The matched contour is then the contour between pe and its closest neighbors along the boundary in Ps. 28 CHAPTER 4 Pose Independent Coordinates The Unraveled Skeleton (US) has many invariances that are also interesting for a coordinate system that is independent of pose: Definition 10. A Pose Independent Coordinate System (PICS) assigns every point on a shape a unique coordinate (Shape Coordinates), usually a tuple of real values. The point retains this coordinate independent of location, rotation and articulated movement of the shape. Therefore, a PICS is a natural representation of the boundary and interior of a shape that allows articulated movement, translation and rotation without effecting the relation of internal points to each other. This can be useful for many applications (see Sect. 4.4) and related work has been done mainly in the field of texture mapping and shape blending in computer graphics. 4.1 Shape Coordinates In this section related work in the field of shape-based coordinates is reviewed. The research area is small and most work has been done in the area of computer graphics in texture mapping and object deformation. Texture mapping refers to the process of projecting an image to a shape, in most cases a rectangular image (texture) to an arbitrary 3D boundary model. In most cases coordinates of the rectangular image are known for key points on the object’s boundary and coordinates of the texture are interpolated in between those key points. In this case shape based coordinates improve the interpolation. Object deformation on the other hand improves position of boundary points on deformed parts, based on coordinates of the shape (or undeformed parts). 29 4. Pose Independent Coordinates 4.1.1 Polar Coordinates The most basic shape coordinates are polar coordinates, where every point is described by a distance and relative rotation to a starting point and orientation. In most cases the starting point is the centroid of the shape, the orientation along its greatest elongation. Polar coordinates are not articulation invariant as relative distance and orientation change on parts of the shape during movement. 4.1.2 Barycentric Coordinates Original designed for triangles to give every point in a triangle a coordinate dependent on the triangle vertices, barycentric coordinates are extensively used in computer graphics for various tasks where interpolation is needed. Generalizations to arbitrary polygons and connected triangles (triangle mesh) have been researched for a long time [11]. In this section two recent methods are described that interpolate based on deformation of the polygon. For more barycentric extensions see Floater’s survey paper [11]. Mean Value Coordinates Mean Value Coordinates [10] were introduced to represent a vertex v0 as a combination of its neighboring vertices vi, i ∈ {1 . . . k} in a triangle mesh. Each neighboring vertex is given a weight λi ≥ 0 such that the following two properties are fulfilled: k∑ i=1 λivi = v0 and k∑ i=1 λi = 1. If k = 3 this corresponds to the classical barycentric coordinates. Since there are infinitely many solutions for weights that satisfy these properties (e.g. just using three neighbors that include v0 and use classical barycentric coordinates), they have provided a formula to calculate the weights, such that all neighboring vertices contribute smoothly to the coordinates of v0. Green Coordinates and Complex Barycentric Coordinates Green Coordinates [23] and later Complex Barycentric Coordinates [34] are designed to preserve the polygons texture after deformation. Weber et al.[34] show that Green Coordinates are a special case of Complex Barycentric Coordinates. Instead of real numbers for boundary points and weights they use complex numbers with the Cartesian x-coordinate being the real and the y-coordinate being the imaginary part. Similar to the Mean Value Coordinates two properties have to be fulfilled: The constant precision property that states that the sum of all weights must equal one (i.e. the imaginary part is zero) and the linear precision property that states that the linear combination of boundary points with coordinates for an interior point p must produce the Cartesian coordinates of p in its real and imaginary parts. 30 4.2. US Coordinates To obtain the texture of a deformed shape, they first acquire the complex barycentric coordinates for the starting shape and then use a complex function to map those coordinates to the deformed shape. Both are not rooted on articulation joints and therefore do not fulfill the definition for PICSs. 4.1.3 Skeleton-based Coordinates There have been skeleton-based approaches to give coordinates on the boundary of 2D [29] and the surface of 3D [24] objects. These coordinates do not extend into the interior of the shape and have been considered for specific use cases. Star Skeletons An approach by Shapora and Rappoport uses Star-Skeletons [29] to blend two polygons into each other, given correspondences on the boundaries. Star-Skeletons are a partition of the shape into star polygons that are connected with a skeleton-like structure. A star polygon is a polygon for which there exists a point (star origin) that has line-of-sight to all points of the boundary’s polygon. Every two neighboring star polygons in a shape share an edge (in the inside of the shape). The skeleton connect the midpoints of those edges and the star origins into a tree structure. To blend two shapes the algorithm partitions the shapes into compatible star polygons. The star polygons are then represented as polar coordinates as seen from the star origins. The blending is done by interpolating the different star skeleton points (midpoints, star origins and vertices). After that the polar coordinates are recalculated to the Cartesian grid. Finding compatible star skeletons is a costly process (O(n4)) and the representation is not suitable for internal points interpolation. Skeleton Based Texture Mapping A different use-case is covered by Madaras and Ďurikovič [24]. They use the skeleton of a boundary mesh to texture the mesh independently from its tessellation. For their method they first extract the skeleton and then associate each skeleton edge with a rectangular part in the texture image, by solving a bin-packing problem. The surface points associated with a skeleton edge are addressed with capsular parameterization. Capsular parameterization projects a capsular (i.e. a cylinder with half-spheres on the ends) to the surface. The method only considers the boundary/surface of the shape, although an extension to the interior should be possible. The greater challenge is handling intersections of the projected capsulars at skeleton edge connections. 4.2 US Coordinates As already seen in previous chapters, the (normalized) US vector is invariant to translation, rotation and articulated movement. These properties are per definition interesting to 31 4. Pose Independent Coordinates obtain a PICS. Therefore, in the following sections the possibility of an US-based coordinate system is explored. By considering every point of the skeleton as a point of a Voronoi diagram, each point inside the shape is associated with at least one skeleton point. Furthermore, each point can be seen as lying to the left or right in the direction of the unraveling, so they can also be associated with a position on the US. Figure 4.1 shows the closest point in the shape with respect to the directional Voronoi diagram with skeleton point sites. Figure 4.1: Voronoi Diagram with skeleton points as side, including consideration of side of the skeleton. The first coordinate ϕ is therefore the closest position on the US. Since the US is cyclic and can be mapped to a circle (see Sect. 2.4) ϕ lies between 0 and 2π. It is obvious that the US must be normalized regarding its starting point and orientation (see Sect. 2.5). The second coordinate r is the distance to the skeleton. Unfortunately, with only ϕ and r not all points are uniquely defined as on convex parts of the skeleton or at endpoints one skeleton point has multiple, equidistant points inside the shape associated. Figure 4.2a illustrates the problem for the case of an endpoint. Additionally, articulated movement can introduce/remove problematic areas (Fig. 4.2b). Points that share the same (ϕ, r)-coordinates always lie on a circular arc with the center being the skeleton point. A solution to the problem is the introduction of a third coordinate δ that can take values between zero and one and gives the position on the circular arc. This coordinate is therefore dependent on the two extrema of the circular arc, i.e. it is also dependent on the curvature at a given skeleton point. A motivating factor to use δ is that it keeps its relative position during articulated movement. The coordinate δ can be calculated by interpolation using polar coordinates centered at the skeleton point of the two arc endpoints and the point in question. 32 4.2. US Coordinates (a) Points on the red semicircles have the same distance to the endpoint (blue cross) and same ϕ and r coordinates. (b) Articulation can create new ambiguous points on skeleton’s (red) and shape’s (orange) reflex angles. Figure 4.2: Ambiguous coordinates on endpoints and reflex angles. 4.2.1 Coordinate Conversion To convert the Cartesian point pC = (x, y) inside a shape D to US-coordinates pUS = (ϕ, r, δ), with unraveled skeleton US the following formulas can be used: ϕ(pC) = 2π I(uC) L(US) , r(pC) = d(pC , uC) (4.1) with uC = argmin{do(pC , x) : x ∈ US} and do gives the Euclidean distance under consideration of the orientation of the US (i.e. points that are not to the right of the unraveled skeleton have infinite distance). I(u) gives the index of the point in the US and L(US) is the total length of the US. The offset coordinate δ is more challenging, as it depends on the local geometry around uC . Two approaches to calculate δ are given: the Voronoi-dependent approach and for calculation on the grid the skeleton-dependent approach. For both approaches the following three angles have to be calculated: begin of the circular arc α0, the point on the circular arc β and the end of the circular arc α1 (see Fig. 4.3). Only the relative difference is needed, so α0 = 0. This leads to the normalization δ = β α1 . The Voronoi-dependent approach uses all points associated with uC with distance r and sets α1 to be the difference between the highest and smallest angle. The angle is computed as angular part φ′ of the polar coordinate of all points on the arc, where the polar coordinate system is centered at uC and rotated to have φ′ = 0 in the inverse tangent of point uC . The maximum and minimum φ′ can then be used to normalize all angles between 0 and 1. The Voronoi-dependent approach can lead to lower errors when Voronoi regions change due to noise on the skeleton compared to the skeleton-dependent approach. The skeleton-dependent approach uses the normal vector of the line segments leading to and from uC (n0, n1). The end of the arc angle α1 is based on the dot product: α1 = arccosn0 · n1. The angle β can be calculated similarly by building and normalizing the vector from uC to pC . Because small changes on the discretized skeleton can have 33 4. Pose Independent Coordinates a huge impact on the normal, a stabilization parameter τ is introduced. It states the distance to the next skeleton point used for normal vector calculations. An empirical comparison between skeleton- and Voronoi-dependent δ calculation is given in Sect. 5.3.2. Figure 4.3: Angles needed to calculate δ of blue point. To convert from US-coordinates to Cartesian coordinates, first the corresponding skeleton point uC and orientation can be found using ϕ. The distance r and offset δ′ can then be interpreted as polar coordinates. The polar coordinate interpretation of δ, δ′, can be calculated using the skeleton-dependent or Voronoi-dependent approach. After α1 is obtained with one of these approaches (relative to α0), α0 is set to the angle part of the polar coordinate of the arc start point and finally δ′ = α0 + δ ∗ α1. 4.2.2 Special Point Alignment US coordinates with perfect articulation and articulation-invariant skeleton give a PICS. For real world application the articulation and skeleton will not be ideal and to avoid slight shifts in the US due to slightly changed skeletons a normalization similar to Structure Aware Length Normalization (SALN) (see Sect. 3.4.1) can be used. This only influences coordinate ϕ and gives every skeleton branch two equal ranges of length 1 2nB , where nB is the number of skeleton branches. Note that it is not possible to use a grid-based SALN, as small branches cannot be removed without altering the other coordinates r and δ. It is better to prune the skeleton beforehand. 4.2.3 Deformation and Scale Independence The US coordinates can be made deformation and scale independent by scaling the r parameter. Using scaled radius function rs(pC) = 1− dH(pC , ∂D) dH(pC , ∂D), dH(pC , S(D)) (4.2) 34 4.3. Non-centered Skeletons with dH(p,X) being the Hausdorff distance from point p to set X (i.e. the minimum distance from p to any point of X), the radius coordinate is 0 along the skeleton and 1 along the shape’s boundary. This radius function can easily be calculated using the Distance Transform (DT) on the skeleton’s compliment and the shape. The scaled radius function has influence on the parameter δ. Figure 4.4 shows a contour plot of the coordinate r with and without scaling. Points sharing rs and ϕ do not necessarily lie on a circular arc, but are connected inside the shape and in most cases are convex, so δ can be calculated similarly. Note that the scaled radius function cannot address points outside the shape. (a) Distance coordinate r on a horse shape. Pur- ple contour at 0, yellow at 70. (b) Scaled coordinate rs on a horse shape. Pur- ple contour at 0, yellow at 1 Figure 4.4: The two approaches for non-centered skeletons [20]. 4.3 Non-centered Skeletons As already brought up before, a articulation-invariant skeleton is important for a reliable coordinate system. Furthermore, it is not relevant if the skeleton lies in the center of the object as coordinates are build independent of the centeredness-property. It is therefore necessary to have articulation points (joints) that lie on the skeleton and those points do not lie in the center in many cases (e.g. the spine of a horse is on its back). Langer et al. [20] introduced non-centered Voronoi skeletons, a way to shift the Voronoi skeleton towards the boundary of an object. They use two different ways to achieve this: elliptical distances and weighted DT. The use of non-centered skeletons for the US is possible, but the reconstruction property is lost for the general case. 4.3.1 Elliptical Line Voronoi Diagram and Weighting The first way of off-centering the skeleton is using the special property of the Confocal Ellipse-based Distance (CED) defined between a line segment L with endpoints f1, f2 35 4. Pose Independent Coordinates and a point p: ced(p, L) = d(p, f1) + d(p, f2)− d(f1, f2) (4.3) If considered for a Line Voronoi Diagram (LVD) the area of influence to the side for a given line segment increases with its length. This motivates the introduction of Elliptical Line Voronoi Diagram (ELVD) [13] using the CED instead of Hausdorff Distance. The ELVD of the polygonal approximation of a shape’s contour can therefore modify the position of the skeleton with differently dense sampling of the boundary. The second approach uses multiplicative weighting on the DT of line segments on the polygonal approximation of the boundary. The weight attracts the skeleton towards a line segment, i.e. it decreases the area of influence for a given line segment. Figure 4.5 shows the two approaches. (a) Hand-picked sampling on a horse shape to move skeleton (red) towards anatomical skeleton. The green skeleton is evenly sampled. Interme- diate skeletons for higher sampling. (b) Hand-picked weights on a horse shape to move skeleton (red) towards anatomical skeleton. The green skeleton is unweighted. Intermediate skeletons for increasing weights. Figure 4.5: The two approaches for non-centered skeletons [20]. Langer et al. [20] showed that the skeleton can be moved to any position using these techniques. To obtain a non-centered skeleton of a shape prior knowledge of the object is needed. The position of articulation points could be used for an optimization process to move the skeleton, or the weights could be learned from known examples. 4.4 Applications This section gives some possible applications for a PICS (in particular the US coordinates) and motivates the use of PICSs. This thesis does not explore those applications, but some examples can be found in Chapt. 5. 36 4.4. Applications 4.4.1 Morphing and Animation In research shape based coordinates are used most often for shape morphing and pose independent texturing. In case of morphing interesting coordinates lie on the boundary and approaches optimize intermediate frames for optically convincing transformation. The most critical part for morphing with US coordinates is the morphing of the underlying skeleton. For a simple interpolation of shapes, the skeleton must be structural identical and if not normalized must have the same length on each branch. An example use case is the calculation of an intermediate time frame of a walking animal with appropriate skeleton. Topological changing skeletons need a more sophisticated interpretation of the coordinates and have not been investigated. Pose independent texturing is useful for animation of articulately moving shapes. Together with the US vector as a representation of the shape the US coordinates can be used to model and texture a shape based on a given skeleton. The most difficult part of texturing is handling occlusions. 4.4.2 Super-Resolution A video sequence of a moving (articulated) shape has more information regarding the interior of the shape than a single image. This can be exploited by saving color information into a PICS-space, i.e. a representation of the object expressed in pose independent coordinates. For US coordinates this space is three dimensional (ϕ, r, δ). The projection of a shape to US coordinate space is called US coordinate map. By combining (and interpolating) multiple US coordinate maps the resulting map can be reprojected on to the Cartesian grid. Using this method one can vary the resolution of the reprojection by changing the sampling in the US coordinate space, where more samples will give a higher resolution of the reconstructed image. For this technique to work properly it is important that the US skeleton is invariant to articulated movement. For small changes in the skeleton special point alignment can be sufficient. 4.4.3 Segmentation Improvement Segmentation is the partition of an image into multiple semantically different regions. The default use case is the separation of foreground (object) from background. As stated in Chap. 3 the US assumes a segmentation to obtain the shape and skeleton. Most segmentation approaches are prone to errors along the boundary of regions, as in photographs foreground and background tend to blend into each other at those boundaries (e.g. motion blur). Using the US coordinates along the boundary can help to improve the segmentation in multiple ways, two examples are given. For the first approach, it is assumed that the boundary is smooth. A smooth boundary has similar r coordinates for neighboring bounding points: |r(p1)− r(p2)| < c|ϕ(p1)− 37 4. Pose Independent Coordinates ϕ(p2)|, p1, p2 ∈ ∂D, a small c indicates a smooth boundary. The boundary can then be optimized with regard to c and other information (e.g. color). The benefit of the US is its invariance to articulated movement (among other invariances). It can be exploited if there already exists a (perfect) segmentation, for example from a different frame of a video or segmented by hand. One can use the coordinates along the boundary to compare the known segmentation with the actual one. In the case of multiple weak segmentations the same thinking can motivate the creation of a mean boundary representation by averaging the r coordinate for same ϕ and δ coordinates for all boundary points. 38 CHAPTER 5 Evaluation This chapter presents the evaluation results produced with the Unraveled Skeleton (US). The chapter is divided into three parts: tests on the general representation, evaluation on shape analysis tasks and qualitative results using the Pose Independent Coordinate System (PICS). 5.1 General This section provides evaluation results for Chap. 2. It mainly looks at properties introduced in Sect. 2.7. 5.1.1 Invariances In this section the six invariances theoretically motivated in Sect. 2.7 are empirically evaluated and examples are given. The example shows a running horse with two to four transformed shapes and the US plots are shown as well as a combined plot. All single plots include vertical black line corresponding to special points (branching and endpoints). All computed US vectors use starting point and orientation normalization (Alg. 3.2). For each invariance experiment the mean error between the first and all other US vectors is given. It is relative to the maximum US value, similar to amplitude normalization. Translation Figure 5.1 shows the translation result on the horse shape. It can be observed that for pixel-correct translation the US vector does not change. The mean error to the first shape (red) is zero. 39 5. Evaluation (a) Shape (red) and four different translations. (b) The US vectors of the shapes above. Figure 5.1: Example of translation invariance. Rotation Figure 5.2 shows the rotation result on the horse shape. Minor errors occur due to rasterization. The mean error to the first shape (red) is 0.04. Scale Figure 5.3 shows the scale result on the horse shape. It was simple length normalized and amplitude normalized. Note that non-uniform scaled shapes produce more different US vectors. The magenta shape has a different starting point due to change of the skeleton and its eccentricity. The mean error to the first shape (red) is 0.1, for only the uniformly scaled ones it is 0.02. Flip Figure 5.3 shows the result on the flipped horse shapes. The shape was flipped in the two possible ways. Note that the two US vectors of the flipped shapes are similar, because they are rotations of each other. Figure 5.4c shows the flipped US vector for the flipped shapes. This corresponds to the original one (in a cyclic fashion) and must be re-normalized in regard to the starting point and orientation. 40 5.1. General (a) Shape (red) and four different rotations. (b) The US vectors of the shapes above. Figure 5.2: Example of rotation invariance. (a) Shape (red) and four different scales. Non-uniform scales in yellow and magenta. (b) The US vectors of the shapes above. Figure 5.3: Example of scale invariance. 41 5. Evaluation (a) Shape (red) the two possible flips. (b) The US vectors of the shapes above. (c) Flipped US vectors for flipped shapes. Figure 5.4: Example of flip invariance. Noise Figure 5.5 shows the noise result on the horse shape. The noise was generated by changing the boundary slightly. For this invarince the threshold for the Voronoi skeletonization was increased to avoid spurious branches due to the noise. The mean error to the first shape (red) is 0.07. Articulated Movement Figure 5.6 shows the result on the horse shape under articulated movement. The mean error to the first shape (red) is 0.05. 5.2 Shape Analysis This section presents the result of shape analysis using the US vector. The main focus is on shape retrieval. 42 5.2. Shape Analysis (a) Shape (red) and shape with little and a lot of noise. (b) The US vectors of the shapes above. Figure 5.5: Example of noise invariance. (a) Shape (red) and two articulated movements. Combined image translated for illustration purposes. (b) The US vectors of the shapes above. Figure 5.6: Example of invariance to articulated movement. 43 5. Evaluation 5.2.1 Datasets and Experiments The US vector was tested on three different datasets: Leaves 1, Tools [5, 4] and MPEG72. The Leaves dataset consists of 100 different classes of leaves with 16 shapes per class and has a high inter-class similarity. The Tools dataset consists of 7 classes with 5 shapes each and consists of pliers and scissors that have articulated movement. The MPEG7 dataset consists of 70 classes with 20 shapes each and has been used extensively in research. On each dataset two different experiments were conducted with different parameter settings. Precision-Recall Plot The first experiment uses shape retrieval to generate a precision-recall plot. This test is performed on shape retrieval and gives the precision for every possible recall value. The recall is the number of shapes found of the correct class (TP ) divided by the total number of samples of this class (TP + FP ). The precision is TP divided by the total number of found shapes (TP + FN) [37]. TP relates to the true positive (i.e. correctly in the result), FP to the false positive (i.e. in the result but wrongly so) and FN to the false negative (i.e. not in the result, but should be there). To calculate the plot first all distances between all different pairs of US vectors are calculated. For every shape the minimum number of closest elements is calculated, so this set includes x TP -shapes. This is repeated for every x smaller than the number of shapes in a class. In the end the mean value of the precision for each x over all samples is taken. The Accuracy is calculated by the ratio of in-class results to all results for the closest n− 1 elements, where n is the number of shapes per class. Classification The second experiment uses a k-Nearest-Neighbor (kNN) classifier to obtain a confusion matrix. The kNN predicts a label for a given feature by looking at the closest k samples in the training set (neighbors) and assigns a label by taking a majority vote on the labels of those k neighbors. For every class c the confusion matrix assigns a value to every class depending on the number of predictions for a test sample from c. The results are than presented in a square matrix with a row/column for every class. In the perfect case only the diagonal of the matrix is different from zero. The training set includes 2/3 of each class. Parameters The following parameters were altered: the normalization function norm, the length of the US vector n, the length of a section in Structure Aware Length Normalization (SALN) ns, the grid resolution for grid based SALN ng and the distance function dist. 1https://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+ data+set 2http://www.dabi.temple.edu/~shape/MPEG7/dataset.html 44 5.2. Shape Analysis Run norm n ns ng 1 SALN 100 20 - 2 SALN 100 50 - 3 SALN 500 50 - 4 SALN 1000 100 - 5 simple 100 - - 6 simple 500 - - 7 simple 1000 - - 8 grid based SALN 100 - 5 9 grid based SALN 500 - 10 10 grid based SALN 500 - 20 11 grid based SALN 1000 - 10 12 grid based SALN 1000 - 20 13 grid based SALN 1000 - 50 Table 5.1: The different runs for experiments in shape analysis evaluation. Table 5.1 shows the different parameters used for all experiments. Every run in the table uses the following distances: Manhattan, Euclidean, Chebyshev. The minimum length of branches ε was set to 5 for all runs. For all tests a Voronoi skeletonization threshold of 20 was used. 5.2.2 Results Table 5.2 shows the accuracy for different runs and datasets. The accuracies have all been achieved with the Manhattan distance, because it produced the highest accuracies of tested simple distances. Figure 5.7 shows the precision-recall plot of all runs with the Manhattan distance. (a) Leaves dataset. (b) Tools dataset. (c) MPEG7 dataset. Figure 5.7: Precision-Recall plots for the three datasets and the Manhattan distance. The right columns of Table 5.2 use the optimization distance from Sect. 3.4.2. Figure 5.8 shows the precision-recall plot of all runs with the optimization distance. 45 5. Evaluation Manhattan Distance Optimization Run Leaves Tools MPEG7 Leaves Tools MPEG7 1 0.111 0.407 0.237 0.125 0.514 0.266 2 0.114 0.400 0.236 0.128 0.535 0.265 3 0.114 0.414 0.238 0.129 0.529 0.267 4 0.115 0.414 0.239 0.129 0.529 0.268 5 0.150 0.730 0.323 0.185 0.800 0.381 6 0.151 0.721 0.326 0.186 0.829 0.384 7 0.151 0.714 0.325 0.185 0.807 0.384 8 0.130 0.671 0.271 0.139 0.636 0.285 9 0.131 0.586 0.262 0.167 0.743 0.346 10 0.127 0.550 0.254 0.161 0.779 0.320 11 0.135 0.571 0.277 0.184 0.743 0.375 12 0.132 0.593 0.265 0.181 0.850 0.364 13 0.126 0.550 0.255 0.155 0.736 0.337 Table 5.2: Accuracy on different runs and datasets. Best run per dataset is bold. Comparison of Manhattan distance and optimization. (a) Leaves dataset. (b) Tools dataset. (c) MPEG7 dataset. Figure 5.8: Precision-Recall plots for the three datasets and the optimization distance. The kNN did classify all test data correctly on the tools dataset for run 6, 7 and 12. With an accuracy of 32.8% run 12 gave the best result on leaves classification (see Fig. 5.9a) and run 6 (with 65.7%) on MPEG7 classification (see Fig. 5.9b). All results were achieved with the optimization distance. 5.2.3 Discussion It can be observed that the results are highly dependent on the chosen normalization and that the length of branches is an important feature, since the SALN has low scores for every dataset. Additionally, it can be seen that the length of the final vector plays a major role, especially for grid-based SALN. It shows, that the US vector gives a good approximation of the shape for only a few values. 46 5.3. Pose-independent Coordinates (a) Leaves dataset. (b) MPEG7 dataset. Figure 5.9: Confusion matrix of classifying the test set. Based on the superiority of the optimization distance one can conclude, that the starting point normalization methods used can still be improved. Additionally the results on the optimization distance show that length normalization (grid based SALN) can improve the overall results, e.g. by removing small branches. 5.3 Pose-independent Coordinates The US coordinates were qualitatively tested. In this section we compare different methods and parameters for conversion of Cartesian to US coordinates. 5.3.1 US Coordinates on the Grid The biggest challenge is, although continuously well defined, the representation on the grid. The main problem is that color information is only available for discrete points and by changing the skeleton, the new coordinates required are not well aligned with the original ones. There exist different ways of implementing the US coordinates, here two are compared: grid interpolation and retrieval. Grid Interpolation The idea of grid interpolation is to create a 3D grid that covers the whole range of coordinates, i.e. for ϕ between 0 and 2π and for r and δ between 0 and 1. The resolution of the grid has to be of a size such that all information of the shapes interior can be saved. A good start is two times the length of the skeleton in ϕ-direction (∆ϕ) , because every discrete position of the skeleton is covered that way. The maximum value of the radius function rmax on the skeleton (maximum US vector value) is a sensible value for the resolution in r-direction (∆r), since this is the number of possible points between skeleton and shape boundary. To cover a half-circle in δ-direction (∆δ) a resolution of rmaxπ is needed. 47 5. Evaluation This grid will be sparse after filling it with known values from an image. The unknown values can then be interpolated from known values in a two step procedure. At first the δ values are interpolated along the same (ϕ, r) coordinates. Afterwards for every δ value each (ϕ, r) image is interpolated. This yields a 3D lookup image. Retrieval The grid interpolation approach takes up a lot of resources (O(∆ϕ×∆r×∆δ) in memory) and is useful, if multiple poses and re-calculations to Cartesian grid are needed, since color lookups are in linear time. For fewer re-calculations one can use retrieval methods, where the closest coordinate in a 3D-space is looked up for every US coordinate of a different pose. This process can be made more robust by taking the closest k coordinates and taking a weighted average based on the distance. For this a normalization is critical, so that distances are shortest along the δ axis. This approach has a time complexity for color lookups of O(n), where n is the number of points in the 3D-space. δ Binning An additional parameter δB is introduced with δ binning. When sampling distances on the grid, circular arcs of equal distance may get disconnected. This results in coordinates getting the same δ values although they lie on a discrete circular arc. The binning introduces a rounding of r values to a fixed number of possible values. Experiments have shown that the a good value for δB lies close to the biggest value of the radius function. Figure 5.10 shows different δ binning values for a capsule. Note the under representation (gray values) for δB = 5, the gray and light red area results form multiple pixels having the same r component resulting in mean values. For δB = 100 noise is generated on the capsule’s end (lower right side). Delta binning is useful for the interpolation and the retrieval approach. (a) δB = 5 (b) δB = 15 (c) δB = 30 (d) δB = 100 Figure 5.10: Bend capsular with maximum radius 30 and different δB. 5.3.2 Results Figure 5.11 gives a qualitative comparison of interpolation and retrieval. The original object is converted to US coordinates and then mapped to the same and transformed shape. One can see that for small ∆δ the capsule’s ends are not represented correctly, even for the simple reconstruction (note the blurred lines on the capsule’s ends). Another 48 5.3. Pose-independent Coordinates observation is that for the retrieval method the bend down end of the capsule is represented better. For the interpolation method at this region there are artifacts from interpolation of the 3D image. Artifacts of the retrieval method are smoothing for too high values of k, noticeable in the most bottom-right image. Interpolation Retrieval Original Bend-Shape Original Bend-Shape O ri g in a l O ri g in a l Reconstruction Bend Reconstruction Bend ∆ ϕ = 20 0, ∆ r = 30 , ∆ δ = 30 ,δ B = 30 k = 1, δ B = 30 ∆ ϕ = 30 0, ∆ r = 50 , ∆ δ = 50 ,δ B = 50 k = 1, δ B = 50 ∆ ϕ = 40 0, ∆ r = 50 , ∆ δ = 50 ,δ B = 50 k = 5, δ B = 50 Figure 5.11: Comparison of interpolation and retrieval with different resolutions. Length normalization When shapes’ skeletons have more than one branch it is good to normalize ϕ, so all branches have the same resolution. Each branch between special points in the US therefore gets 2π nB where nB is the total number of branches. Figure 5.12 shows the difference between a reconstruction on a changed skeleton with and without length normalization. 49 5. Evaluation (a) Original image (b) No length normalization (c) Length normalization Figure 5.12: Normalization and no normalization of δ using retrieval with k = 3 and δB = 75. Note the incorrect representation on the head. One can see the misalignment of the branches along the centerlines of the shape. (a) Voronoi based on capsule (b) Skeleton based on capsule (c) Voronoi based on horse (d) Skeleton based on horse Figure 5.13: Voronoi and skeleton based calculation of δ using retrieval with k = 1 and δB = 50 on capsule and k = 3 and δB = 75 on horse. The original images are in Fig. 5.11 and Fig. 5.12. 50 5.3. Pose-independent Coordinates δ coordinate calculation Two methods were used for calculation of δ as described in Sect. 4.2.1: Voronoi and skeleton based. Figure 5.13 shows results using the two methods. Note that the skeleton based method looks smoother but introduces gray values that are not present in the original image. This is due to multiple pixels of the original image getting packed into the same coordinate. 5.3.3 Coordinates and US coordinate space images Figure 5.14 shows the different coordinates on the example of the image of a horse. Figure 5.14d shows a slice of the interpolated 3D image at coordinate δ = 0.5. As can be seen, the normalized r coordinate is only valid inside the shape. The δ coordinate is unstable on a slightly different rasterization of the skeleton. (a) ϕ coordinate (b) r coordinate (c) δ coordinate (d) Interpolated image at δ = 0.5. Vertical axis is ϕ, horizontal axis is r. Reformatted for display purposes. Figure 5.14: Voronoi and skeleton based calculation of δ using retrieval with k = 1 and δB = 50 on capsule and k = 3 and δB = 75 on horse. 5.3.4 Non-Centered Skeletons Figure 4.5 gives a comparison between the non-centered and normal skeleton for US coordinates. Note that the non-centered skeleton approach does not change anything that has not moved (left and top branch), but does not interpolate smoothly, because all pixels related to the bend have the same ϕ (as opposed to the normal skeleton, where ϕ values are smoothed due to the inaccurate skeleton at the joint). One can also see that the rightmost shape (the most strongly bent one) introduces new r values for many points using the scale invariant normalization of r based on the boundary of the object. 51 5. Evaluation Figure 5.16 shows an example application transferring color information from one shape to another using US coordinates and non-centered skeletons. (a) Voronoi Skeletons. First row shows shapes and skeletons, second row transformed image with the retrieval method. The leftmost column is the known color image. Skeleton thickness increased for illustration purposes. (b) Non-centered skeletons. First row shows shapes and skeletons, second row transformed image with the retrieval method. The leftmost column is the known color image. Skeleton thickness increased for illustration purposes. Figure 5.15: Comparison of articulation using normal Voronoi skeletons and Non-Centered Voronoi skeletons. 5.3.5 Discussion The biggest - and still unsolved - challenge of the US coordinates is the rasterization of the skeleton in particular in regard to the δ coordinate. The simple checkerboard examples show that for correct articulation the result is acceptable. Slight changes in the skeleton give recognizable results, but of worse quality then the original. 52 5.3. Pose-independent Coordinates (a) Giraffe Image (input) (b) Horse Image (input) (c) Giraffe on Horse Figure 5.16: Transfer of color information from a giraffe to a horse shape1. 1 Giraffe and Horse image under Creative Commons Attribution 3.0 with authors Clémence Delmas and Sheila respectively. Shared via wikimedia.org. 53 CHAPTER 6 Conclusion This thesis introduced a new shape representation based on the unraveled skeleton (US). The properties and applications of the representation were covered and a quantitative and a qualitative evaluation was given. Its invariance to articulated movement is a major advantage of the representation (as shown in Sect. 2.7.1 and Sect. 5.3.4) and is based on the skeleton making the skeletonization the greatest challenge for an adequate representation. In this regard Voronoi skeletons were reviewed and its strengths analyzed, but even the fairly robust Voronoi skeleton can be problematic with articulated shapes. For this problem the non-centered skeletons have been studied. The US was studied in regard to shape analysis and special normalization and optimization strategies were introduced, also in regard to matching only parts of shapes. In addition, the US was used to build a coordinate system based on the skeleton (US coordinates) and therefore independent to articulated movement. The properties of the US look very promising, although more research must be conducted to make the process more general in shape analysis. The normalization methods presented must be refined in regard to scale and local changes, as these are the greatest challenges for the use of the US in shape analysis. The evaluation also shows that normalization can still be improved as optimization can achieve better results. Regarding US coordinates, the biggest challenge are points with ambiguous distance, that has been solved by introducing a new polar coordinate. This coordinate is hard to relate to intuitively (unlike the other two), and prone to errors. For this an elegant solution for the problem must be found, e.g. based on the curvature of distance iso-lines. It can be extremely useful to combine shape analysis and pose-independent coordinates to match objects with there shape and color information. The color information can be accessed with the pose-independent coordinates taking advantage of all its properties. 55 6. Conclusion In general the US has many desirable properties and this thesis gives a first introduction to the benefits and limitations, and how to overcome some of them. 56 Acronyms CED Confocal Ellipse-based Distance. 35, 36 DCD Discrete Curve Evolution. 6 DT Distance Transform. 11, 12, 15, 35, 36 ELVD Elliptical Line Voronoi Diagram. 36 ET Eccentricity Transform. 13, 14 GED Graph Edit Distance. 22 kNN k-Nearest-Neighbor. 44, 46 LVD Line Voronoi Diagram. 7, 36 MAT Medial Axis Transform. 4, 10, 20, 21 PICS Pose Independent Coordinate System. 29, 31, 32, 34, 36, 37, 39 SALN Structure Aware Length Normalization. 24–26, 34, 44–47 SG Skeleton Graph. 21, 22 US Unraveled Skeleton. xiii, 1, 2, 11–17, 19–29, 31–44, 46–49, 51, 52, 55, 56 VD Voronoi Diagram. 6–8 57 Bibliography [1] Xiang Bai, Longin Jan Latecki, and Wen-Yu Liu. Skeleton pruning by contour partitioning with discrete curve evolution. IEEE transactions on pattern analysis and machine intelligence, 29(3):449–462, 2007. [2] Thomas Bernier and Jacques-Andre Landry. A new method for representing and matching shapes of natural objects. Pattern Recognition, 36(8):1711–1723, 2003. [3] Harry Blum et al. A transformation for extracting new descriptors of shape. Models for the perception of speech and visual form, 19(5):362–380, 1967. [4] Alexander M Bronstein, Michael M Bronstein, Alfred M Bruckstein, and Ron Kimmel. Analysis of two-dimensional non-rigid shapes. International Journal of Computer Vision, 78(1):67–88, 2008. [5] Alexander M Bronstein, Michael M Bronstein, and Ron Kimmel. Numerical geometry of non-rigid shapes. Springer Science & Business Media, 2008. [6] Fred Buckley and Frank Harary. Distance in graphs. Addison-Wesley, 1990. [7] Luciano da Fontoura Da Costa and Roberto Marcondes Cesar Jr. Shape analysis and classification: theory and practice. CRC Press, Inc., 2000. [8] Cecilia Di Ruberto. Recognition of shapes by attributed skeletal graphs. Pattern Recognition, 37(1):21–31, 2004. [9] Michael Donoser, Hayko Riemenschneider, and Horst Bischof. Efficient partial shape matching of outer contours. In Asian Conference on Computer Vision, pages 281–292. Springer, 2009. [10] Michael S Floater. Mean value coordinates. Computer aided geometric design, 20(1):19–27, 2003. [11] Michael S Floater. Generalized barycentric coordinates and applications. Acta Numerica, 24:161–214, 2015. [12] Herbert Freeman. On the encoding of arbitrary geometric configurations. IRE Transactions on Electronic Computers, 10(2):260–268, 1961. 59 [13] Aysylu Gabdulkhakova, Maximilian Langer, Bernhard W Langer, and Walter G Kropatsch. Line voronoi diagrams using elliptical distances. In Joint IAPR Inter- national Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), 2018. [14] Raghuraman Gopalan, Pavan Turaga, and Rama Chellappa. Articulation-invariant representation of non-planar shapes. In European Conference on Computer Vision, pages 286–299. Springer, 2010. [15] Richard W Hall. Parallel connectivity-preserving thinning algorithms. In Machine Intelligence and Pattern Recognition, volume 19, pages 145–179. Elsevier, 1996. [16] Ines Janusch, Nicole M Artner, and Walter G Kropatsch. Euclidean and geodesic distance profiles. In International Conference on Discrete Geometry for Computer Imagery, pages 307–318. Springer, 2017. [17] Reinhard Klette. Concise computer vision. Springer, 2014. [18] Reinhard Klette and Azriel Rosenfeld. Digital geometry: Geometric methods for digital picture analysis. Elsevier, 2004. [19] Walter G Kropatsch, Adrian Ion, Yll Haxhimusa, and Thomas Flanitzer. The eccentricity transform (of a digital shape). In International Conference on Discrete Geometry for Computer Imagery, volume 4245, pages 437–448. Springer, 2006. [20] Maximilian Langer, Aysylu Gabdulkhakova, and Walter G Kropatsch. Non-centered voronoi skeletons. In International Conference on Discrete Geometry for Computer Imagery, volume 11414, pages 355–366. Springer, 2019. [21] Der-Tsai Lee. Medial axis transformation of a planar shape. IEEE Transactions on Pattern Analysis & Machine Intelligence, 4(4):363–369, 1982. [22] Haibin Ling and David W Jacobs. Shape classification using the inner-distance. IEEE transactions on pattern analysis and machine intelligence, 29(2):286–299, 2007. [23] Yaron Lipman, David Levin, and Daniel Cohen-Or. Green coordinates. ACM Transactions on Graphics (TOG), 27(3):78, 2008. [24] Martin Madaras and Roman Ďurikovič. Skeleton texture mapping. In Proceedings of the 28th Spring Conference on computer Graphics, pages 121–127. ACM, 2013. [25] Robert Ogniewicz and Markus Ilg. Voronoi skeletons: Theory and applications. In Computer Vision and Pattern Recognition, 1992. Proceedings CVPR’92., 1992 IEEE Computer Society Conference on, pages 63–69. IEEE, 1992. [26] Kaspar Riesen. Structural Pattern Recognition with Graph Edit Distance: approx- imation algorithms and applications. Advances in computer vision and pattern recognition. Springer, 2016. 60 [27] Punam K Saha, Gunilla Borgefors, and Gabriella Sanniti di Baja. A survey on skeletonization algorithms and their applications. Pattern Recognition Letters, 76:3–12, 2016. [28] Thomas B Sebastian, Philip N Klein, and Benjamin B Kimia. Recognition of shapes by editing their shock graphs. IEEE Transactions on Pattern Analysis & Machine Intelligence, 26(5):550–571, 2004. [29] Michal Shapira and Ari Rappoport. Shape blending using the star-skeleton repre- sentation. IEEE Computer Graphics and Applications, 15(2):44–50, 1995. [30] Frank Y Shih. Image processing and pattern recognition: fundamentals and techniques. John Wiley & Sons, 2010. [31] Kaleem Siddiqi, Ali Shokoufandeh, Sven J Dickinson, and Steven W Zucker. Shock graphs and shape matching. International Journal of Computer Vision, 35(1):13–32, 1999. [32] Sergios Theodoridis and Konstantinos Koutroumbas. Pattern recognition. Acad. Pr., Amsterdam [u.a.], 4. ed., 3. [print.] edition, 2010. [33] Petrus Johannes Van Otterloo. A contour-oriented approach to digital shape analysis. PhD thesis, Delft University of Technology, 1988. [34] Ofir Weber, Mirela Ben-Chen, Craig Gotsman, et al. Complex barycentric coordinates with applications to planar shape deformation. In Computer Graphics Forum, volume 28, page 587, 2009. [35] Mingqiang Yang, Kidiyo Kpalma, and Joseph Ronsin. A survey of shape feature extraction techniques, pages 43–90. In-Tech, 2008. [36] Hamidreza Zaboli, Mohammad Rahmati, and Abdolreza Mirzaei. Shape recognition by clustering and matching of skeletons. Journal of computers, 3(5):24–33, 2008. [37] Dengsheng Zhang and Guojun Lu. A comparative study of curvature scale space and fourier descriptors for shape-based image retrieval. Journal of Visual Communication and Image Representation, 14(1):39–57, 2003. [38] Dengsheng Zhang and Guojun Lu. Review of shape representation and description techniques. Pattern recognition, 37(1):1–19, 2004. [39] TY Zhang and Ching Y Suen. A fast parallel algorithm for thinning digital patterns. Communications of the ACM, 27(3):236–239, 1984. 61