A Graph Centrality Approach to Computer Vision
DISSERTATION submitted in partial fulfillment of the requirements for the degree of Doctor of Technical Sciences within the Vienna PhD School of Informatics
by Samuel Felix de Sousa Junior, Registration Number 1228089
to the Faculty of Informatics at the Vienna University of Technology
Advisor: o. Univ.-Prof. Dipl.-Ing. Dr. Walter G. Kropatsch
External reviewers: Dr. Dennis Strelow, Google Inc., Mountain View, California, United States of America. Prof. Francesc Serratosa, Universitat Rovira i Virgili, Tarragona, Catalonia, Spain.
Vienna, 17th August, 2015
Samuel Felix de Sousa Junior, Walter G. Kropatsch

Declaration of Authorship
Samuel Felix de Sousa Junior, Favoritenstraße 9/2/186-3, 1040, Wien, Österreich
I hereby declare that I have written this Doctoral Thesis independently, that I have completely specified the utilized sources and resources, and that I have clearly marked all parts of the work - including tables, maps and figures - which belong to other works or to the internet, literally or extracted, by referencing the source as borrowed.
Vienna, 17th August, 2015
Samuel Felix de Sousa Junior

Acknowledgements
All doctors, from the newest graduate to the tenured professor, cultivate the dream of impacting the world with their research. Their research becomes part of themselves and, as such, leads them to seek the hidden treasures of science. This is the ambition of every new pupil trying to master their way through the scientific world. Over time, they realize that they are nothing but a small piece of a gigantic jigsaw puzzle, a piece which would not make sense were it not for the other pieces giving meaning to the whole. My doctoral thesis was only possible because others supported me in this journey, and I thank them for coming along this whole time. I dedicate this work to my family; none of this could ever have existed without them. I thank my parents Silsa and Samuel and my sister Samantha for teaching me valuable lessons which I carry with me every day. My father has always been the dreamer of the family and my mother the cornerstone which kept us united. My family supported my dreams and helped me fix my hopes on a better future. Education makes it impossible to go back to the primitive ways, when we would be satisfied with panem et circenses. I thank all those who taught me something, from the most complex and deepest theorem to the simplest and most humble tip. I thank my Doktorvater, Prof. Walter Kropatsch, for welcoming me at PRIP during these years in Austria, for always being present whenever I needed him, and for all the discussions and paper reviews. It is truly a privilege for a student to be able to talk to his supervisor whenever he needs to. I also thank all the friends I made at PRIP, Ines and Michi, who always taught me the Austrian way of doing things and the dialects, and for their constant motivation to teach me Wienerisch, danke sehr! I would also like to thank my first supervisor, Prof.
Cleidson de Souza, who introduced me to the field of Social Networks almost ten years ago; even though I diverted to Computer Vision, it turns out I found a way of combining both fields. My internships at Google and Siemens gave me real industrial experience, but they gave me much more than that. I owe Jan Ernst a great deal, since he was the one who showed me how to structure myself better in order to tackle complicated problems; he also introduced me to the beautiful field of machine learning. I learned many life lessons with Dennis Strelow and I am deeply grateful for his guidance, advice, and friendship. He always paid attention to my ideas, even the ones which did not make much sense, showing me how one can exercise patience in real life. Finally, there are friends who are closer than brothers. I remember how happy Joanna Zysiak was when I first told her that I would be back in Europe for my PhD. I have no words to express my gratitude for her friendship; I am just lucky to have found her. Marina Oikawa, Luana Leite, and Séverine Habert shared many moments of my life and supported me before, during, and I believe after my PhD. They are friends who stick for life even though we have been far away from each other during all these years. Luana had to personally validate all my documents in the Brazilian Ministry of Foreign Affairs so I could start my PhD. I met many great people in Austria who made my life here more pleasant than I could ever have imagined. Magnus Köhler and Paul Wandl are my closest friends here and they were always there for me whenever I needed them or struggled with the German language. When I think of Christina Lengauer, the first thing that comes to my mind is her beautiful smile and the delicious cakes which made me very happy throughout my PhD. Last but not least, I owe many thanks to Lucas Gerrand, who, just like me, came to Austria from a very sunny place in the opposite hemisphere. He proofread many of my papers and was always happy to do so. During my studies, I was supported by the PhD School of Informatics and I thank the organizers of the program for giving me this opportunity. Special thanks go to Clarissa Schmid and Mamen for being so kind and answering all my questions, and there were many of them.

Kurzfassung
Das maschinelle Sehen hat die Art und Weise, wie wir mit Computern interagieren, von Grund auf verändert; von der ersten automatischen Gesichtserkennung bis hin zu den neuesten Augmented-Reality-Anwendungen hat es eine komplett neue Ära eingeläutet. Die Analyse von sozialen Netzwerken, die in den letzten Jahren populär geworden sind, basiert auf Graph-basierten Konzepten. Die Zentralität ist eines davon. Die Zentralität bestimmt, wie wichtig ein bestimmter Knoten innerhalb eines Netzwerks ist. In dieser Dissertation zeigen wir Anwendungen des Konzepts der Zentralität für die Lösung mehrerer Probleme, die im Bereich der Computer Vision auftreten, und untersuchen, wie gut die Zentralität in dieser neuen Domäne angewendet werden kann. Das erste Problem, das wir untersuchen, ist die Registrierung einer Punktwolke, welche immer dann auftritt, wenn Feature-Punkte, Orientierungspunkte oder Punkte, die die Oberfläche eines Objekts repräsentieren, zwischen zwei oder mehreren Bildern bestmöglich in Übereinstimmung gebracht werden müssen. Dazu führen wir in einem ersten Schritt die Zentralität in den Coherent Point-Drift Algorithmus ein und analysieren, wie sich dieser unter verschiedenen Szenarios verhält.
Im weiteren Verlauf dieser Dissertation stellen wir unseren eigenen, auf der Zentralität basierenden Registrierungs-Algorithmus vor, dessen primäre Aufgabe es ist, einen unterscheidbaren Graphen zu erzeugen, dessen Verteilung der Zentralität möglichst gleichmäßig ist, um die Registrierung zu erleichtern. Auch das Erstellen von Daten-Graphen ist der Registrierung einer Punktwolke sehr ähnlich: in beiden Fällen soll ein Graph aus unstrukturierten Daten erstellt werden. Wir untersuchen die aktuellen Techniken für das Lösen dieses Problems unter besonderer Beachtung der Delaunay-Triangulation, einem gebräuchlichen Verfahren, um aus einer Punktemenge ein Dreiecksnetz zu erstellen. Wir präsentieren unsere stabile Graphen-Generierung, die versucht, mit Hilfe eines Optimierungsverfahrens einen minimalen Spannbaum von maximaler Entropie zu erhalten. Wir evaluieren die Robustheit der generierten Graphen unter Berücksichtigung von unterschiedlichen Mengen von Rauschen in der Nachbarschaft von Punkten und vergleichen die Stabilität der Ergebnisse mit denen der Delaunay-Triangulation. Der letzte Aspekt, den wir in dieser Dissertation untersuchen, ist der Einfluss der Zentralität für Verfahren des maschinellen Lernens. Dazu beschreiben wir zunächst die verschiedenen Möglichkeiten, wie mit unserem Verfahren Merkmalsvektoren modelliert werden können, die statt der Pixel des Bildes die Zentralität der entsprechenden Knoten verwenden. Danach trainieren wir Algorithmen des maschinellen Lernens auf der Grundlage dieser Zentralität-basierten Merkmalsvektoren, um eine Klassifizierung von Formen vornehmen zu können. Wir stören die Topologie des Graphen mit unterschiedlichen Mengen an Rauschen und bewerten, wie gut unsere trainierten Klassifikatoren abschneiden. Diese Dissertation behandelt erstmalig die Verwendung der Zentralität eines Graphen zur Lösung einer Vielzahl von Problemen im Bereich des maschinellen Sehens. Das Ziel dabei soll nicht nur sein, die Zentralität als neues Verfahren zu etablieren, sondern auch zu untersuchen, wie effektiv sie bei der Lösung der Probleme im Bereich des maschinellen Sehens ist, und abzuschätzen, wann und wo ein bestimmtes Zentralitäts-Maß am besten abschneidet.

Abstract
Computer vision has changed the way we deal with computers; from the first automatic face detection algorithm to the latest augmented reality application, it has brought us into a new era of computing. The study of social networks models human relations using graph theory and has been growing in recent years. This field defines the concept of centrality, which measures how important a certain node is within a network. In this dissertation, we study the centrality concept applied to computer vision and show how this information can be used in such a different domain. We start by analyzing the point-set registration problem. This problem often arises in computer vision whenever one needs to match information available in images. The kind of information to be matched consists of feature locations, landmarks, or points representing the surface of an object. We first introduce various centrality measures into the Coherent Point Drift algorithm and analyze their behavior under different scenarios. Later, we propose our own centrality-oriented matching algorithm, focusing on the generation of a distinguishable data-graph. We aim at obtaining a distribution of centrality as uniform as possible, thus alleviating the matching process.
The construction of the data-graph, whose goal is to build a graph from unstructured points, is also closely related to the registration of point-sets. We review the current techniques for this problem, paying particular attention to the Delaunay algorithm, an off-the-shelf solution to build a triangulation of points. We pose our stable data-graph generation as an optimization formulation whose goal is to obtain the minimum spanning tree of maximum entropy. We evaluate the robustness of the generated data-graph in the presence of different amounts of noise in the local neighborhood of points and compare the stability of the results with the Delaunay triangulation. The last aspect studied in this dissertation is the impact of centrality in a machine learning scenario. First, we describe different approaches to model feature vectors using the centrality of nodes instead of pixels. Later, we train machine learning algorithms using those centrality-based features, focusing on the classification of shapes. We evaluate our approach by perturbing the topology of the graph with different amounts of noise and assess how well each trained classifier performs. This dissertation introduces graph centralities in a variety of computer vision problems for the first time. Our goal is not only to apply those measures, but to understand them, study their behavior in the different vision problems, and estimate when a certain centrality measure can perform better than another.

Resumo
A visão computacional mudou a forma como lidamos com a computação, desde a primeira detecção automática de faces até o algoritmo de realidade aumentada mais avançado. O estudo das redes sociais modela as relações humanas usando teoria de grafos e tem recebido grande atenção nos últimos anos. Neste campo, existe uma medida chamada centralidade que define a importância de um nó dentro de um grafo. Nesta tese, centralidades são aplicadas pela primeira vez em uma grande diversidade de problemas de visão computacional e demonstraremos como utilizar esta medida em um domínio tão diferente. Estudamos o registro de nuvem de pontos, que surge frequentemente em visão quando é necessário encontrar a correspondência de informações disponíveis em imagens, tais como locais de saliência ou outros pontos representando a superfície de um objeto. Introduzimos as diversas medidas de centralidade no algoritmo Coherent Point Drift e analisamos os seus comportamentos em diferentes cenários. Propomos, também, um novo algoritmo para a correspondência de pontos baseado nas centralidades que foca na geração de um grafo distinto. O objetivo é obter uma distribuição de centralidade que seja próxima de uma distribuição uniforme, de forma a amenizar o problema da correspondência. A construção do grafo espacial também está relacionada ao registro de nuvens de pontos, cujo objetivo é construir um grafo a partir de pontos não estruturados. Revisamos as técnicas atuais para este problema, focando especialmente no algoritmo de Delaunay, que é uma solução padrão para construir uma triangulação de pontos. Nós modelamos a geração de um grafo espacial usando uma formulação de otimização, cujo objetivo é obter uma árvore geradora mínima de entropia máxima. Avaliamos a robustez dos grafos gerados na presença de diversos níveis de perturbações nas vizinhanças dos pontos e comparamos os resultados com o algoritmo de Delaunay. O impacto da centralidade de grafos em aprendizado de máquina é o último aspecto abordado nesta tese.
Primeiramente, descrevemos várias maneiras para modelar os vetores de características usando as centralidades ao invés dos pixels de imagens. Posteriormente, treinamos algoritmos de aprendizado baseados nos vetores de características obtidos pelas centralidades, focando no problema de classificação de formas. Perturbamos as topologias dos grafos usando vários níveis de ruído e avaliamos a performance dos algoritmos treinados. Esta tese introduz as centralidades de grafo em uma variedade de problemas de visão computacional pela primeira vez. O objetivo não é apenas aplicar tais medidas, mas entender os seus comportamentos e estimar quando uma medida de centralidade poderá obter melhor performance em um cenário específico.

Contents
Kurzfassung
Abstract
Resumo
Contents
List of Figures
List of Tables
List of Algorithms
List of Theorems
1 Introduction
1.1 On the Computer Vision Road
1.1.1 A glimpse on registration
1.1.2 Breaking an image into pieces
1.1.3 Separating the wheat from the chaff
1.1.4 Playing the game with graphs
1.2 About Relations and Importance
1.3 Goals and Contributions
1.3.1 Publications
1.4 Roadmap of this Dissertation
2 Graph Centrality and Discrete Geometry
2.1 Graph
2.1.1 The Minimum Spanning Tree
2.1.2 Paths and Cycles
2.1.3 Representation
2.1.4 Images as Graphs
2.2 Metric Spaces
2.2.1 Generalized Metric
2.2.2 Distances and Norms
2.2.3 Distances in Graphs
2.3 Centrality
2.3.1 The Degree Centrality
2.3.2 The Betweenness Centrality
2.3.3 The Closeness Centrality
2.3.4 The Eigenvector Centrality
2.3.5 Example
2.4 Conclusions
3 Correspondence and Registration
3.1 Transformations
3.1.1 Rigid Transformation
3.1.2 Similarity Transformation
3.1.3 Affine Transformation
3.2 Iterative Closest Point Algorithm (ICP)
3.3 Coherent Point Drift (CPD)
3.4 Expectation Maximization Methods
3.5 Spectrum-based Methods
3.6 Conclusions
4 On Centrality-based Learning
4.1 Shape Matching
4.2 Graph-based Shape Matching
4.3 Centrality-based Feature
4.3.1 Centrality of Shapes
4.3.2 Histogram feature
4.3.3 Random Binary Comparison of Centrality Values
4.4 Training Centrality-based Features
4.5 Experiments
4.6 Conclusions
5 Graph-based Point Drift
5.1 Computing Probabilities using Centralities
5.2 Combining Centrality and Spatial Information
5.3 Relation between GPD and CPD
5.4 Experiments
5.4.1 Registration under Similarity Transformation
5.4.2 Registration under Affine Transformation with Missing Points
5.4.3 Non-Rigid Registration with Noise and Missing Points
5.4.4 Discussion
5.4.5 Experiments on Different Classes of Point-sets
5.5 Conclusions
6 The Minimum-Weight Maximum-Entropy Problem
6.1 Problem Formulation
6.2 The Near Homogeneous Degree Distribution
6.2.1 Graphs fulfilling NHDD
6.3 Optimization
6.3.1 The n1graph algorithm explained
6.3.2 The traceback algorithm
6.4 Complexity Analysis
6.5 Matching
6.6 Conclusions
7 The Minimum Spanning Tree of Maximum Entropy
7.1 Objective Function
7.2 Optimization
7.3 Experiments
7.4 Conclusions
8 Conclusions
8.1 Summary
8.2 Open Problems and Future Direction
A Licenses for Copyrighted Material
Bibliography
Index
Acronyms
Notation

List of Figures
1.1 Scene Understanding
1.2 Example of Image Registration
1.3 Example of Image Segmentation
1.4 Example of Classification
1.5 Graph Representation of a Shape
2.1 The seven bridges of Königsberg
2.2 A directed graph
2.3 Induced subgraphs
2.4 The Minimum Spanning Tree
2.5 Data structures for Graph Representation
2.6 Representing images as graphs
2.7 Distance between two cities
2.8 Triangle Inequality
2.9 Relaxation of the metric axioms
2.10 Metric Balls
2.11 Geodesics in a Graph
2.12 A toy example for graph centralities
3.1 Rigid transformation
3.2 Similarity Transformation
3.3 Affine transformation
4.1 Images from the Kimia 99 Database
4.2 Centrality distribution of graphs
4.3 Comparison of betweenness centrality histograms
4.4 Comparison of closeness centrality histograms
4.5 Comparison of degree centrality histograms
4.6 Comparison of eigenvector centrality histograms
4.7 Binary Comparison of Centrality Values
4.8 Confusion Matrices for the Centrality-based Learning
5.1 Centrality of a Delaunay Triangulation
5.2 Histogram of Closeness Centrality
5.3 Convergence of the registration under similarity transformation
5.4 Registration under similarity transformation
5.5 Registration under affine transformation
5.6 Convergence of the registration under affine registration
5.7 Registration under non-rigid transformation
5.8 Convergence of the registration under nonrigid transformation
5.9 Stability of Delaunay Triangulation
5.10 Sample point-sets of 495 database used
6.1 Data graphs of a point-set
6.2 The Near Homogeneous Degree Distribution (NHDD) condition
6.3 N1graph Optimization
6.4 Registration of point-sets under Similarity Transformation
7.1 Minimum Spanning Tree of Maximum Entropy
7.2 Data graphs connected using the MSTME
7.3 Noise added to the point-set
7.4 Comparison of the stability of the fish dataset
7.5 Examples of data-graphs with noise equal to 3ε

List of Tables
2.1 Centrality values for a toy example
4.1 Classification results using different centrality measures
5.1 Average on the registration of 495 datasets
5.2 Standard Deviation of the registration of 495 datasets
5.3 Percentage of convergence on 495 point-sets under Rigid transformation
6.1 The table storing the cost during optimization

List of Algorithms
4.1 Training Algorithm
6.1 The nhdd algorithm
6.2 The n1graph algorithm
6.3 The traceback algorithm
7.1 The mstme algorithm

List of Theorems
Chapter 2
1 Definition – Graph
2 Definition – Simple Graph
3 Definition – Directed Graph
4 Definition – Weighted Graph
5 Definition – Vertex-induced subgraph
6 Definition – Edge-induced subgraph
7 Definition – Minimum Spanning Tree
8 Definition – Path
9 Definition – Cycle
10 Definition – Length of a Path
11 Definition – Handshake Lemma
12 Definition – Metric
13 Definition – Quasimetric
14 Definition – Semimetric
15 Definition – Pseudometric
16 Definition – Euclidean Distance
17 Definition – Manhattan Distance
18 Definition – Chebyshev distance
19 Definition – Geodesic
20 Definition – Geodesic Distance
21 Definition – Eccentricity of a node
22 Definition – Eccentricity of a graph
23 Definition – Centrality
24 Definition – Degree Centrality
25 Definition – Betweenness Centrality
26 Definition – The Closeness Centrality
27 Definition – Eigenvector Centrality
Chapter 3
28 Definition – Matching
29 Definition – Registration
30 Definition – Rigid Transformation
31 Definition – Similarity Transformation
32 Definition – Affine Transformation
Chapter 4
33 Definition – Shape Matching
34 Definition – Feature
35 Definition – Shallow Learning
36 Definition – Training Algorithm for Classification
Chapter 5
1 Theorem – Relation between GPD and CPD
1 Lemma – Centrality values of a complete graph
Chapter 6
37 Definition – The Minimum-Weight Maximum-Entropy Problem
2 Theorem – Maximum Number of Distinct Degree Values
38 Definition – Near Homogeneous Degree Distribution
39 Definition – Types of NHDD Graphs
2 Lemma – The complement graph fulfills the NHDD condition
3 Lemma – Construction of graphs fulfilling NHDD
3 Theorem – Connected Graph fulfilling NHDD
4 Lemma – The repeated node
4 Theorem – The number of edges in a NHDD graph
5 Theorem – Maximum Entropy in a Graph
6 Theorem – The entropy of a NHDD graph
7 Theorem – The complexity of the nhdd algorithm
8 Theorem – The complexity of the n1graph algorithm
9 Theorem – The complexity of the traceback algorithm

CHAPTER 1 Introduction
"Who can tell that a machine can not learn? Just because it does it in a different way than you do, it does not mean it is not learning." – Unknown
Many fields in science were inspired by the desire to build machines capable of thinking and perceiving the world as we do. Alan Turing himself envisioned that computers could go beyond simple arithmetic. He was puzzled by the question of whether machines could ever think when he proposed the Imitation Game [121]. We are not going to dwell on such a question here, but it is worth mentioning that Artificial Intelligence (AI) was born to ultimately develop machines as intelligent as humans, to create the artificial brain. In the early 1960s, researchers set out to model an artificial neural network (the perceptron algorithm [100]) and, later (1974), the back-propagation algorithm [127] in order to emulate brain activity. One could say that it was the birth of the Machine Learning (ML) field.
At some point in the 1990s, machine learning decoupled from artificial intelligence as it aimed at solving less ambitious problems compared with the artificial brain, such as recognizing digits from images. Some of those problems could actually be solved given a sufficient amount of data and a suitable learning algorithm. This new direction focused on looking at the data, hoping that the data itself would help the researchers solve the problem. The goal was not only to find out what was happening in the problem but also how one could extrapolate beyond what had been seen and make inferences about unseen behavior. At this point in time, many techniques were borrowed from statistics and probability theory [67] and incorporated into machine learning. A milestone occurred when the Support Vector Machines (SVM) [23] algorithm was proposed, finding a linear separator between classes (defined by the support vectors) and later allowing non-linear separation by using kernel methods. Machine learning continued evolving, and more recent research indicates that ML is coming back towards its original artificial intelligence roots with the return of neural networks in deep architectures (Deep Learning), which is breaking records not only in digit [49], but also in facial [116] and speech [54] recognition at a rate never seen before. The artificial brain was not the only dream scientists had; they also envisioned the development of artificial eyes, thus giving machines the ability to see and think. This was the time when the first digital images were produced and the ground was set for new fields built around digital images. Due to the newspaper industry, digital images started being transmitted via a submarine cable¹ in the early 1920s between London and New York. We could consider this as the birth of the image processing field, since it was important to start developing methods to improve the printing procedures. For instance, the number of gray tones displayed increased from 5 (in the initial Bartlane transmission) to 15 gray tones in 1929 [48]. The goal of image processing was to develop algorithms that would "process" an input image in order to generate output images enhancing or modifying certain aspects. Research in Computer Vision (CV) started flourishing at this time, and simple operators (such as the Sobel operator to compute derivatives) were proposed to extract information from images. Images started not only to display a moment for the viewer's appreciation, but they were telling us things about it, e.g. curvatures, high gradient information, salient points. As time passed and technology evolved, we started to consolidate fields which would work together to build a comprehensive understanding of the scene. Figure 1.1 shows the relationship between different fields related to digital images. A scene is imaged by a sensor (by a process called acquisition), which could happen in many ways: by an ordinary camera, or using more sophisticated depth devices, range sensors, etc. When the image passes through this low-level processing (denoising, filtering, etc.), we could say it belongs to the Digital Image Processing (DIP) field, whose input and output are composed of images. The output would be an image with a specific property enhanced. This includes operators such as sharpening, brightening, edge detection, etc. We are entering the CV field when the goal is to estimate the models projected onto that image.
There is a vast literature on the reconstruction of objects from one, two (stereo), or multiple views (trifocal tensor, bundle adjustment, etc.). Once models are acquired, they can be rendered for other purposes (movies, games, augmented reality), which is the responsibility of Computer Graphics (CG), as it is the field that renders realistic scenes. Finally, we close the gap of image understanding when, given all the knowledge we gathered from the image, we are able to tell what is happening in the scene, i.e. there are two trees and a country house. This is considered high-level vision, since semantics are involved. We can achieve such results by bringing the ML field into the game with its powerful algorithms. Computer vision is the field we are mostly interested in, as it is the one which combines both learning and image processing in order to emulate human vision, whose ultimate goal is not only to observe or reconstruct a scene but also to understand it. One could argue that there are no clear boundaries, or consensus among researchers, on where image processing ends and computer vision starts. In the geometrical aspect of computer vision, it could also be interpreted as the opposite of computer graphics, where the former tries to estimate the model given an image, and the latter is interested in rendering the image given a certain model with as much realism as possible.
¹ The Bartlane picture transmission system transmitted around 500 pictures of important events between London and New York before the outbreak of the Second World War in Europe [81].
Figure 1.1: Scene Understanding. Low level vision: an image is acquired using a sensor and processed. Mid-level vision: using information enhanced by processing the image, one can estimate the geometry of the scene. High level vision: by combining machine learning, one can identify semantics of the scene.
1.1 On the Computer Vision Road
From the problems in images and learning, we will start narrowing towards computer vision problems. CV can be decomposed into many other fields which work together to grasp an understanding of the whole scene. As aforementioned, CV tackles a wide range of problems, varying from acquisition (the process in which an image is formed) to more complex ones such as reconstruction, localization, and understanding. Since we cannot cover all topics relevant to CV in this dissertation, we briefly discuss some of the problems that will be addressed later. The goal in this section is to build the intuition behind each problem and to convey its importance. Later, we will provide a formal definition of the individual problems, revisit the state of the art, and explain how we are going to address each problem by bringing graph centralities into computer vision. Graph centralities in computer vision are the main novelty introduced in this dissertation.
1.1.1 A glimpse on registration
We will spend a considerable amount of time dealing with the registration problem. Registration is a widely explored topic in Computer Vision and an essential piece in feature matching [73], stereo vision [107], tracking [60], medical imaging [52], and many other applications. Intuitively, the problem consists of retrieving the transformation and computing the correspondence between two or more point-sets given in their own coordinate systems.
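To make the notion of "retrieving the transformation" concrete, the following minimal sketch recovers a rigid transformation (rotation and translation) between two 2D point-sets whose correspondence is already known, using the standard SVD-based least-squares (Kabsch/Procrustes) solution. This is only an illustration of the kind of transformation a registration method has to estimate; it is not the CPD or ICP algorithm studied later, and the point-set and motion are invented for the example.

```python
import numpy as np

def fit_rigid(X, Y):
    """Least-squares rigid transform (R, t) such that Y ~ X @ R.T + t,
    assuming the i-th row of X corresponds to the i-th row of Y."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)   # centroids
    H = (X - cx).T @ (Y - cy)                 # 2x2 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = cy - R @ cx
    return R, t

# Toy example: rotate and translate a random point-set, then recover the motion.
rng = np.random.default_rng(0)
X = rng.random((50, 2))
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Y = X @ R_true.T + np.array([2.0, -1.0])

R, t = fit_rigid(X, Y)
print(np.allclose(R, R_true), t)              # True [ 2. -1.]
```

Real registration is harder precisely because, unlike in this sketch, the correspondence between the two point-sets is unknown and must be estimated together with the transformation.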
Approaches for registration can be classified into either rigid or non-rigid methods. Rigid registration only considers rigid transformations, while non-rigid registration allows for more involved tasks, since it can account for scaling, skews, and complex deformations (e.g. articulations, morphing). We could exemplify this problem by looking at two objects in different images and tracing their correspondences.
Figure 1.2: Example of Image Registration. In Figure a) two fishes are related by a similarity transformation. In Figure b) two fishes are related by a non-rigid transformation.
Figure 1.2 shows two scenarios for image registration. As seen above, Figure 1.2a shows two fishes related by a similarity transformation, which is composed of a rotation, translation, and isotropic scaling, i.e. the object is scaled equally in all dimensions; note that we will return to the different types of transformations in Chapter 3. Figure 1.2b, in contrast, shows two fishes related by a non-rigid registration. In fact, they are not the same type of fish. It is a trivial task for humans to match the corresponding parts of two distinct fishes. Nevertheless, it is still a challenging task for a computer to find the right correspondence. There are several surveys reviewing algorithms for both rigid and non-rigid registration techniques. Papers differ in their focus, from medical imagery [6] to range images [99], 3D shapes, and point clouds [93, 106, 124, 117]. Despite the extensive number of approaches available in the literature, two popular iterative techniques are worth mentioning: the Iterative Closest Point (ICP) [12] and the Coherent Point Drift (CPD) [86]. They have been investigated and extended by several authors [102] and we will cover both algorithms in Chapter 3. We focus our attention on CPD, since in Chapter 5 we propose an extension of CPD through the addition of graph centralities. However, this is not our only contribution to centralities for registration. In addition, in Chapter 6 we also propose a centrality-based algorithm for registration which is independent of CPD.
1.1.2 Breaking an image into pieces
Image segmentation is a prevalent problem in CV whose goal is to decompose an image into its parts. This problem has been broadly studied [15, 16]. It is somewhat similar to the clustering problem, in which points in a certain d-dimensional space are given and we would like to separate them into clusters. When segmenting an image, one could consider only the color information: 256 gray levels for an 8-bit unsigned integer channel, or 256³ colors for a Red, Green, and Blue (RGB) image. Nevertheless, in the computer vision task, we have not only the intensity information (color), but also spatial information associated with the points. Each point in our space could be seen as a quintuple <r, g, b, x, y>. The way points are segmented (or clustered) varies according to the algorithm used.
Figure 1.3: Image Segmentation. Image a) shows a man with a hat and the segmentation attempt in image b) decomposes the image into several parts. Reprinted from de Sousa et al. [32], with kind permission of Springer Science and Business Media.
One popular approach for clustering is the K-Means [78] algorithm, which, given a number of clusters to be found in the image, assigns a centroid to each cluster. Those centroids move towards the most suitable location until the algorithm outputs the final label for each point.
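As an illustration of this clustering view of segmentation, the sketch below runs K-Means on the <r, g, b, x, y> quintuples of a small synthetic image. The scikit-learn KMeans class, the random image, and the chosen scaling of color versus position are assumptions made for the example, not the experimental setup used in this dissertation.

```python
import numpy as np
from sklearn.cluster import KMeans

# A small synthetic RGB image standing in for real input (values in [0, 255]).
h, w = 60, 80
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(h, w, 3)).astype(np.float64)

# Build one <r, g, b, x, y> quintuple per pixel; scale color and position
# to comparable ranges so neither term dominates the Euclidean distance.
ys, xs = np.mgrid[0:h, 0:w]
features = np.column_stack([
    image.reshape(-1, 3) / 255.0,   # r, g, b in [0, 1]
    xs.reshape(-1) / w,             # x in [0, 1]
    ys.reshape(-1) / h,             # y in [0, 1]
])

# K-Means assigns each pixel to one of k clusters; reshaping the labels
# back onto the image grid yields the segmentation.
k = 4
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
segmentation = labels.reshape(h, w)
print(segmentation.shape, np.unique(segmentation))   # (60, 80) [0 1 2 3]
```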
A more powerful approach for image segmentation consists of minimizing the Mumford-Shah functional [85], which models the problem as an optimization task. It is a variational approach which establishes an optimality criterion for segmenting an image into regions. Many approaches were proposed in order to minimize the Mumford-Shah functional via convex graph-cuts relaxation and discrete optimizations [50, 92]. An image can also be segmented via the graph-cuts technique [15, 16, 51], which associates a node with each pixel of the image. In such a formulation, there is the definition of source (s) and sink (t) nodes. The solution consists of finding the maximum flow between s and t, which is equivalent to finding the minimum cut in the graph. The result obtained by the min-cut is a segmentation of the image. Alternatively, the segmentation problem has also been addressed via the Normalized Cuts approach by Shi and Malik [113]. We also addressed such a problem, but instead of the minimum or normalized cut, we modeled the segmentation problem with the maximum cut [32]. A clear difference between min-cut and max-cut is that min-cut can be solved in polynomial time, whereas max-cut belongs to the NP-hard class, which means that, unless P = NP, Max-Cut cannot be solved in polynomial time.
1.1.3 Separating the wheat from the chaff
One of the most important tasks in vision consists of separating objects into classes. You could assess the importance of such a problem by imagining navigating within a scene in complete darkness while holding a device which could tell you where and which objects exist in the scene. This could change the lives of blind people, as it could assist them with household tasks, e.g. to tell a can of corn from one of tomato sauce. We could go further and state that it is an essential task for self-driving cars to be able to distinguish a pedestrian from a tree. Applications for classification problems are countless; each day more and more devices are produced incorporating some sort of trained algorithm to assist humans. In essence, this goes beyond computer vision, as it is a machine learning problem which is agnostic to whether we are training on images, plain text, or numbers. Humans have the innate skill of learning the representation of an object, and we are able to classify a new object into an already learned class. In order to train the computer to recognize an object, we need an algorithm which is able to split the data according to an objective function. In order to obtain good classification performance, we also need to feed it a considerable amount of training data. We deal with classification-related problems on a daily basis. Figure 1.4 shows a classification problem that we could face in real life. We have a box of tea and many tea bags to sort; whenever we have to refill the box, we need to find the correct class for the new tea bag. In this box, the teas were manually classified and organized according to their type (feature). One could choose the tea bag color as a feature for classification; although it would be a meaningful criterion, there are two bags with the same color and yet different types. This example illustrates how hard it is to obtain automatic classification.
Figure 1.4: Tea bags organized according to their classes.
There are many complex tasks involving the classification problem, for instance, the differences in the learning architecture to be used.
One speaks of a shallow architecture when we create a feature vector representation for our object in a high-dimensional feature space and try to split that feature space into the desired classes. The goal is to design a representation which would hopefully allow the features to be separable in a high-dimensional feature space, i.e. features of the same class would end up in the same region of the feature space. Algorithms can also be divided by the type of separations they make: some algorithms require those classes to be linearly separable (e.g. SVM), while others can adapt towards more complex separations (e.g. decision forests), but at the risk of overfitting the training data. Regarding SVM, there are many approaches which allow non-linear separations, such as the ones using kernel methods [66]. In Chapter 4, we will perform centrality-based learning. We will model the shape of an object using the centrality information of a graph. This is a novel approach and, to the best of our knowledge, we were the first ones to address the problem in this fashion.
1.1.4 Playing the game with graphs
We have briefly spoken about graphs and will later expand on the concept (Chapter 2). Graphs are a useful, yet powerful, data structure in computer science. Most of the problems we mentioned can be modeled as graph problems. In Chapter 2, we will formally introduce graphs, but for now, it suffices to look at an example in Figure 1.5. We show the image of a fish and a possible graph representation. We call a graph created out of a point-set a data-graph. In this example, the data-graph is the output of a triangulation. We will discuss many ways in which graphs can be represented. A popular representation is the Delaunay triangulation [33], which is the dual of the Voronoi diagram; a short illustrative sketch of this construction closes this subsection. In Chapter 6, we create our own type of data-graph in which the generated graph has some important properties. In Chapter 7, we propose another type of graph based on a particular type of Minimum Spanning Tree (MST) built upon a multi-objective cost function. We compare our approach with Delaunay regarding the stability of the data-graph under small variations in the neighborhood of the points.
Figure 1.5: Graph Representation of a Shape. Figure a) depicts a fish, b) graph triangulation created to represent the original shape.
Previously, we introduced the registration problem, but now we can cite some approaches which model this problem using graph theory [122, 25, 77, 75, 18, 130, 131]. Graph-based approaches provide structural information in which edges can describe relationships between entities (e.g. components of an object). On the one hand, when there is no need to recover the spatial transformation or such information is not available, the correspondence problem is reduced to the Graph Matching (GM) problem [21]. On the other hand, there are approaches estimating the spatial transformation without finding the correspondence between points [41]. The approaches for image segmentation using graph-cuts are among the most successful usages of graphs in computer vision. This motivates why it is important to learn and study graphs. Probabilistic Graphical Models (PGM) aim at using graph theory together with probability theory in order to tackle more complicated problems, such as the Markov Random Field (MRF), which we mentioned in the segmentation problem. MRFs are based on undirected graphs, but there are graphical models based on directed graphs, such as Bayesian Networks.
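To illustrate what such a data-graph looks like in practice, here is a minimal sketch that builds a Delaunay-based data-graph from a 2D point-set using SciPy and NetworkX. The random point-set is a stand-in for the shapes used later, and this is only one of several data-graph constructions discussed in this dissertation.

```python
import numpy as np
import networkx as nx
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
points = rng.random((30, 2))      # unstructured 2D point-set

# Delaunay triangulation: each simplex is a triangle given by three point indices.
tri = Delaunay(points)

# Turn the triangulation into a data-graph: one node per point,
# one edge per triangle side, weighted by Euclidean length.
G = nx.Graph()
G.add_nodes_from(range(len(points)))
for a, b, c in tri.simplices:
    for u, v in ((a, b), (b, c), (c, a)):
        G.add_edge(int(u), int(v),
                   weight=float(np.linalg.norm(points[u] - points[v])))

print(G.number_of_nodes(), G.number_of_edges())
```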
As discussed, graphs underpin several useful techniques and are therefore essential for computer vision.
1.2 About Relations and Importance
We have briefly discussed the problems that will be addressed later using graph centralities. We will now give an intuition about centralities and how they were conceived. Computer vision is not the only field using graph theory; in fact, many fields in science take advantage of graph representations. In particular, the field of Social Network Analysis (SNA) has employed the concept of centrality for many years [43, 1, 8, 90, 30, 7]. The notion of centrality tries to capture a measure of importance within a network (graph). Although well known to sociologists, such a concept has not been broadly explored outside SNA. By using centralities, one can state that a certain node is more relevant than another, or rank the nodes according to their importance. In this dissertation, we bring the idea of graph centralities into computer vision problems, such as the registration of point-sets, centrality-driven data-graph creation, and even image classification. For instance, when we extend the CPD algorithm, we propose to integrate the centrality measures during the computation of the correspondence of points. Thus, we consider not only the spatial information but also the importance of nodes according to each centrality measure. The flavor of centrality has already been applied to some problems in computer vision before, especially in the approaches which emerged from spectral graph theory [122, 108, 111, 19, 69]. Nonetheless, to the best of our knowledge, our approach is the first one to explicitly adopt centralities for the problems previously mentioned. The techniques which use similar concepts often explore eigenvectors of some proximity matrix encoding the location [108, 111] or the orientation of points. Some centralities (e.g. eigenvector) work directly on the eigenvectors of the adjacency matrix. We aim at showing that graph centralities can serve as good prior information in computer vision. Also, exploiting topological information can bring us more benefits than solely relying on the spatial information of points. Thus, one could extend those ideas to other vision tasks as well.
1.3 Goals and Contributions
The ultimate goal of this dissertation is to perform an in-depth analysis of graph centralities in Computer Vision, starting from their definition, origin, and historical usage and proceeding to practical applications in vision with experimental evaluation. To the best of our knowledge, we were the first to introduce centralities in many of these tasks. Our contributions can be listed as follows:
1. We introduce the centrality concept, which originated in the Social Sciences, into many computer vision problems and provide a literature review on their applications in Chapter 2.
2. We propose to model a shape in a machine learning context by using the centrality information of its graph representation (Chapter 4).
3. We introduce the centrality measure into the CPD algorithm, thus integrating topological information into the registration problem (Chapter 5).
4. We propose to register point-sets by generating a data-graph which is driven by centralities (Chapter 6).
5. We propose an MST formulation also driven by the centrality distribution of the generated tree (Chapter 7).
6. We show how one can integrate topological information given by those centralities in a diversity of problems (Chapters 2, 4, 5, 6, 7).
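Before listing the publications, a brief sketch makes the four centrality measures that recur throughout these contributions concrete. The toy graph is arbitrary, and the NetworkX routines are used only for illustration; the formal definitions and our own usage of the measures follow in Chapter 2.

```python
import networkx as nx

# A small undirected toy graph: a path with one extra chord.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)])

# The four centrality measures used throughout this dissertation.
centralities = {
    "degree":      nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness":   nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}

for name, values in centralities.items():
    # Nodes ranked from most to least central under each measure.
    ranking = sorted(values, key=values.get, reverse=True)
    print(f"{name:>11}: {ranking}")
```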
1.3.1 Publications
The results obtained in this dissertation were published in conference proceedings, book chapters, and journals. We have obtained permission to reprint the material from Springer and Elsevier, and the licenses are available in Appendix A.
1. Samuel de Sousa, Walter G. Kropatsch. Data Graph Formulation as the Minimum-Weight Maximum-Entropy Problem, Graph-Based Representations in Pattern Recognition, Lecture Notes in Computer Science, Springer Verlag, Volume 9069, 2015, pp. 13–22, May 2015. (Reprinted with kind permission of Springer Science and Business Media.)
2. Samuel de Sousa, Walter G. Kropatsch. The Minimum Spanning Tree of Maximum Entropy, 39th Annual Workshop of the Austrian Association for Pattern Recognition (OAGM/AAPR), Salzburg, Austria, 2015. (I own the copyright of this work and it is publicly available at arXiv; no additional permission from any publisher is necessary.)
3. Samuel de Sousa, Walter G. Kropatsch. Graph-based point drift: Graph centrality on the registration of point-sets, Elsevier Pattern Recognition, Volume 48, Issue 2, February 2015, Pages 368–379, ISSN 0031-3203. (Reprinted with permission of Elsevier.)
4. Samuel de Sousa, Nicole M. Artner, Walter G. Kropatsch. On the Evaluation of Graph Centrality for Shape Matching, Graph-Based Representations in Pattern Recognition, Lecture Notes in Computer Science, Springer Verlag, Volume 7877, 2013, pp. 204–213. (Reprinted with kind permission of Springer Science and Business Media.)
5. Samuel de Sousa, Yll Haxhimusa, Walter G. Kropatsch. Estimation of Distribution Algorithm for the Max-Cut Problem, Graph-Based Representations in Pattern Recognition, Lecture Notes in Computer Science, Springer Verlag, Volume 7877, 2013, pp. 244–253. (Reprinted with kind permission of Springer Science and Business Media.)
1.4 Roadmap of this Dissertation
Chapter 2 brings the theory necessary to understand the concept of centrality. It provides a brief introduction to graph theory and discrete geometry. It is important to define concepts such as metrics and distances in graphs as early as possible in order to understand the remainder of this dissertation. We explain how one can model those concepts in graphs and how they are applied in our algorithms. We also overview earlier usages of centrality ideas in computer vision. As our main contributions build on centrality measures, it is essential to understand these concepts.
Chapter 3 formally defines the registration problem and provides a literature review on related algorithms, focusing on those whose methodologies are closely related to ours. The ICP and CPD algorithms are detailed in this chapter. Moreover, in this chapter we define the different types of transformations between points which are addressed later by the different registration algorithms.
Chapter 4 uses graph centralities for the classification of shapes. We create a graph-based representation of a shape and describe this graph by using different centrality measures. We briefly talk about some learning algorithms and general training vs. testing scenarios, and we analyze the performance of the different centrality measures based on a Naïve Bayes classifier.
Chapter 5 explains our algorithm for point-set registration as an extension of the CPD by integrating the degree, betweenness, closeness, and eigenvector centralities. The centrality values bring topological information used during the computation of the correspondence between points. We analyze the performance on several datasets and compare it with the traditional CPD algorithm.
Chapter 6 formulates the registration problem in a very unusual way.
We are interested in finding a graph that would represent a point-set according to some properties. Such a representation would allow us to match two objects (graphs) by exploiting topological properties instead of solely relying on geometrical properties. Our desired graph is (i) as unique as possible and (ii) as discriminative as possible regarding the degree distribution. We pose a combinatorial optimization problem, the Minimum-Weight Maximum-Entropy (MWME) problem, to build such a graph by minimizing the total weight cost of the edges and at the same time maximizing the entropy of the degree distribution. Our optimization approach is based on Dynamic Programming (DP) and yields a polynomial time algorithm.
Chapter 7 deals with a problem similar in nature to the MWME. We pose the problem of the Minimum Spanning Tree of Maximum Entropy (MSTME). The goal in this chapter is not to perform a full registration, but to create a data-graph construction technique which can be more stable than the Delaunay triangulation regarding variations in the neighborhood of points. Although we are not tackling the registration problem in this chapter, the type of data-graph we create is still useful for easing the registration process. Considering that we are maximizing the entropy of the degree distribution, we will still have a higher variability than the standard minimum spanning tree, and we can take advantage of such a property to reduce the search space when matching points. We performed experiments and compared with the Delaunay triangulation on a number of different datasets in order to evaluate the stability of the data-graph created.
Chapter 8 presents our conclusions regarding graph centrality in computer vision. We discuss the general results obtained in each problem, the limitations, and the outcomes, and we present our future research directions, as well as open questions.
CHAPTER 2 Graph Centrality and Discrete Geometry
"Our experience of the world depends on the actual structure of the networks in which we are residing and on all the kinds of things that ripple and flow through the network." – Nicholas Christakis
People are connected. There are several interconnected clusters being created and modified as you read this dissertation. Networks are created by the natural human relations happening at every moment. We evolve those live networks when we communicate at work, or when we share activities such as sports, in which a common interest leads us to develop and grow our bonds with others. The connections we build and destroy define this complex and live organism, often referred to as the social network. As one can imagine, there is not only one type of network; you are most likely living within many different types of networks. It is easy to associate a connection with a friendship, but the connections do not necessarily imply such a fact. In fact, people might be connected without knowing it, as when doctors analyze the growth of an epidemic to predict the contamination rate or when they need to find, in real time, new areas which are spreading the disease [47]. Many processes can be modeled as networks, from alcohol consumption behaviour [101] to scene categorization in images [71]. The main contribution of this dissertation is to model computer vision problems by using measures of centrality which were born in social networks. In this chapter, we cover graph theory concepts which are necessary to understand centralities.
We also briefly overview some relevant concepts in geometry for bridging the gap between computer vision and social networks. A network is commonly represented using a graph, which is a formal way to describe a world and its relations in discrete mathematics. Before diving into the centrality world, it is convenient to define some ground knowledge on discrete geometry which will allow us to pose problems associated with centralities later on. The study of graphs can be dated back to Euler [38], who posed the problem of the Königsberg bridges. Figure 2.1 shows a schematic representation of the problem faced by Euler: to determine whether or not it was possible to cross each bridge exactly once. Euler found out that the desired tour through the Königsberg bridges was not possible. This is what we call an Euler tour or Eulerian cycle: a cycle using each edge exactly once, which, for a connected graph, exists exactly when all nodes have even degree. It was the birth of the so-called graph theory and we will use this powerful abstraction to model many of our problems.

Figure 2.1: A representation of the seven bridges of Königsberg. The problem consisted of finding a path crossing each bridge only once. Figure a) shows a schematic representation of the bridges and figure b) shows a graph representation in which each bridge counts as an edge between two nodes.

2.1 Graph

We need a representation to encode the problems we are going to address in this dissertation. A graph (Definition 1) is a model which allows us to associate an object with a node and to define certain types of relationships by adding edges.

Definition 1 (Graph). A graph G(V, E) is a representation in which V = {x_1, x_2, . . . , x_N} is the set of nodes and E = {e_1, e_2, . . . , e_K} is the set of edges. An edge in a graph is a link between two nodes: e_k = {x_i, x_j}, 1 ≤ i, j ≤ N, k ≤ K.

The nodes might represent entities such as cities, people, countries, web pages, etc. The edges describe relations between those entities, and those relations might differ in their nature, i.e. directed, undirected, weighted, etc. In the example of the Königsberg bridges, there are several land regions surrounded by the Pregel river. The nodes V = {x_1, x_2, x_3, x_4} are just an abstraction of those regions. Each bridge connects two land regions and can be represented by an edge e_k = {x_i, x_j} in our graph. A great advantage of such a model is the fact that many problems have already been formulated using graph theory. Thus, we can solve a problem by addressing its graph formulation instead. Examples of graph problems include the Travelling Salesman Problem [5], Minimum-Cut and Maximum-Flow [14], and many others.

Definition 2 (Simple Graph). A simple graph G(V, E) is a graph with no parallel edges or self loops.

There are many types of graphs; for instance, the graph used in the Euler problem contained parallel edges between nodes (as there are two bridges connecting the same island). Another type of graph is the one which does not allow multiple edges between nodes (Definition 2). Graphs can also differ regarding the type of edges. A graph with unordered edges is called an undirected graph, whereas when there is an order, the graph becomes directed (Definition 3).

Definition 3 (Directed Graph). A directed graph is a graph whose edges are ordered pairs of nodes, implying a direction of flow.

Figure 2.2 displays a directed graph which contains 5 nodes and 9 edges.
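To make this abstraction concrete, the following minimal Python sketch (using networkx) builds a directed graph and checks the flow property described in the caption of Figure 2.2. The edge list is only an assumption chosen to be consistent with the verbal description (5 nodes, 9 edges, node d reachable from every other node but unable to reach any node, and opposite-direction edge pairs between a and b and between c and e); it is not read off the figure itself.

# A minimal sketch using networkx. The edge list is an assumption consistent
# with the description of Figure 2.2, not taken from the figure itself.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ('a', 'b'), ('b', 'a'),              # opposite-direction pair a <-> b
    ('c', 'e'), ('e', 'c'),              # opposite-direction pair c <-> e
    ('a', 'd'), ('c', 'd'), ('e', 'd'),  # direct edges into d
    ('a', 'e'), ('b', 'c'),              # remaining (hypothetical) edges
])

print(G.number_of_nodes(), G.number_of_edges())  # 5 nodes, 9 edges

# Every node can reach d (directly or indirectly), but d reaches no one.
print(nx.ancestors(G, 'd'))     # {'a', 'b', 'c', 'e'}
print(nx.descendants(G, 'd'))   # set()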
A graph also has the concept of degree, which is the number of neighbors connected to a node. This concept will be covered in detail later when we talk about graph centralities (Section 2.3.1), but it is worth mentioning that in directed graphs there is also the distinction between in-degree (the number of edges incident on a node) and out-degree (the number of edges exiting a node).

Figure 2.2: A directed graph. In this graph, information might not flow in all directions; one node can only reach another node in case there is a path of directed edges connecting the former to the latter. All nodes can reach node d either directly (a, c, e) or indirectly (b). Nevertheless, node d cannot reach any other node in the graph.

A graph containing directed edges does not allow the flow in both directions unless there are two edges of opposite directions connecting two nodes, as displayed in Fig. 2.2, where this happens between nodes a and b and between nodes c and e.

Definition 4 (Weighted Graph). A weighted graph is a graph which associates a real number (weight) with each edge of the graph.

A graph can also carry weights on the edges (Definition 4) and those weights might represent many things, such as the distance between cities as in the Travelling Salesman Problem (TSP) or the capacity of pipes in a maximum flow problem. Weighted graphs are really important in computer science. Our graph in Chapter 6 contains weighted edges according to a distance function. The graph we modeled for segmentation tasks [32] associates a weight with each edge using a distance function representing the color similarity between regions. We are going to cover distances later in Section 2.2.2. Graphs can also be created out of other graphs. We focus on two types of graphs, one induced by nodes (Definition 5) and the other induced by edges (Definition 6). Figure 2.3 shows two induced subgraphs of the graph in Figure 2.2. The first graph (Figure 2.3a) is a vertex-induced subgraph, which contains some of the nodes of the original graph. In the second example (Figure 2.3b), all nodes from the original graph are present, but only a subset of its edges are available.

Definition 5 (Vertex-induced subgraph). Given a vector S ∈ {0, 1}^N and a graph G(V, E), a node-induced subgraph G⟨S⟩(V⟨S⟩, E⟨S⟩) is a graph induced by S in which a node v_i ∈ V will also exist in G⟨S⟩ if and only if S_i = 1. If v_i is present in V⟨S⟩, then so are all of its edges whose endpoints are also present in V⟨S⟩.

Definition 6 (Edge-induced subgraph). Given a vector U ∈ {0, 1}^K and a graph G(V, E), an edge-induced subgraph G[U](V, E[U]) is a graph whose node set is the same as that of the original graph V, but only a subset of the edges e ∈ E will be available in E[U].

Induced subgraphs will be particularly important for us in Chapter 6, in which we pose a combinatorial optimization problem in the pursuit of an edge-induced subgraph. The graph obtained by the optimization procedure should have the maximum entropy in the degree distribution of the nodes and the minimum cost in the total weight of the edges.

2.1.1 The Minimum Spanning Tree

An important type of edge-induced subgraph is the MST. Considering a weighted graph such as the one in Figure 7.1a, one might be interested in finding a tree whose total edge weight is minimum. The MST is an edge-induced subgraph since it contains all the nodes of the original graph but only a subset of its edges.
There is an important property of such an induced subgraph: the MST is, in fact, a tree, meaning it has exactly N − 1 edges, and those edges have the minimum total weight (assuming there are no negatively weighted edges).

Figure 2.3: Induced subgraphs. Graph a) displays a vertex-induced subgraph and graph b) displays an edge-induced subgraph.

Definition 7 (Minimum Spanning Tree). A minimum spanning tree of an undirected graph is a tree connecting all nodes whose total weight is minimum.

In Chapter 7, we propose a special type of MST, which has the minimum total weight but also the maximum entropy in the degree distribution of the nodes. The classical algorithms to calculate the MST are the Prim [94] and Kruskal [65] algorithms. Both are greedy algorithms that compute the MST in polynomial time. The complexity depends on the actual data structure chosen for the graph (e.g. adjacency matrix, heap, Fibonacci heap, etc.).

Figure 2.4: The Minimum Spanning Tree. Graph a) has |V| = 5 nodes and its MST has |V| − 1 = 4 edges, whose total weight is minimum.

2.1.2 Paths and Cycles

We already faced a problem involving paths and cycles at the beginning of this chapter, when we first discussed the bridges of Königsberg. We mentioned that we needed to find an Eulerian cycle, which was the problem of finding a path in the graph which would pass through each bridge only once and return to the original starting point. We will provide a more formal definition of both concepts.

Definition 8 (Path). A path in a graph is an ordered sequence of edges connecting a sequence of distinct vertices.

By simply having a sequence of distinct nodes, we call it a path. Nevertheless, if one has the constraint that we should start and end in the same node, we call this a cycle.

Definition 9 (Cycle). A cycle in a graph is a path in which the starting and ending nodes are the same.

In the graph of Figure 7.1a, there is a path p1(a, d) = {e1, e4, e6} between points a and d and another path p2(a, d) = {e4}. One could compare paths by evaluating their lengths or the sum of their weights.

Definition 10 (Length of a Path). The length of a path is the sum of the weights of the edges in the path: |p| = Σ_{e ∈ p} w(e).

In an unweighted graph, the length of a path is equal to the number of edges in the path. For instance, in the previous example of paths p1(a, d) = {e1, e4, e6} and p2(a, d) = {e4}, the lengths are |p1| = 3 and |p2| = 1. One could state that p2 is a shorter path than p1. When weights are assigned to the edges, this might no longer be true, since the weights in p2 might be higher than in p1. We will discuss more about distances in Section 2.2.

In the paper of Euler [38] about the bridges of Königsberg, he proves the so-called Handshake Lemma (Definition 11). This lemma states that an undirected graph has an even number of vertices with odd degree. Imagine you find yourself in a group of people who are being introduced to each other: the number of people who shake an odd number of hands must be even. We will take advantage of the handshake lemma later in Chapter 6 to prove one of our theorems (Theorem 4).

Definition 11 (Handshake Lemma). The handshake lemma establishes the connection between the node degrees and the number of edges in a graph as Σ_{x ∈ X} deg(x) = 2|E|.
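As a small illustration of the two definitions above, the following Python sketch builds an undirected weighted graph (the weights are hypothetical, not those of Figure 2.4), computes its minimum spanning tree with Kruskal's algorithm via networkx, and verifies the handshake lemma.

# Minimal sketch: MST via Kruskal and a check of the handshake lemma.
# The graph and its weights are hypothetical, not those of Figure 2.4.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ('a', 'b', 1.0), ('a', 'c', 4.0), ('b', 'c', 2.0),
    ('b', 'd', 6.0), ('c', 'd', 3.0), ('c', 'e', 5.0), ('d', 'e', 2.5),
])

# Kruskal greedily adds the cheapest edge that does not close a cycle.
mst = nx.minimum_spanning_tree(G, algorithm='kruskal')
print(sorted(mst.edges(data='weight')))

# The MST is an edge-induced subgraph with exactly N - 1 edges.
print(mst.number_of_edges() == G.number_of_nodes() - 1)

# Handshake lemma: the degrees of an undirected graph sum to twice the edge count.
assert sum(dict(G.degree()).values()) == 2 * G.number_of_edges()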
2.1.3 Representation

A graph can be represented in a computer by using different data structures. We will exemplify this with three possibilities: (i) the incidence list, (ii) the adjacency matrix, and (iii) the incidence matrix. All representations have their own advantages and disadvantages. Figure 2.5 shows the graph of Figure 2.2 using the different representations.

An incidence list (Figure 2.5a) only stores the nodes which are connected to a certain node and is therefore a very compact and good representation for sparse graphs. It relies on a data structure containing the value of the node (e.g. an integer) and a pointer to the next element; we shall refer to this structure as a list node. The space complexity of this representation is O(N + |E|) since we only allocate N + |E| list nodes. There is a vector indexing all nodes, so the initial space for the nodes is O(N), and whenever there is an edge connecting two nodes, we allocate a new list node and append it to the end of the corresponding list. As there are |E| edges, the total complexity is O(N + |E|). For instance, the edge e2 of the graph in Figure 2.2 goes from node a to node b. In an incidence list representation, we would access node a in O(1) time, create a new list node and append it to the end of the list, whose time complexity is bounded by O(deg(a)). A list node is added to the list of a node only if there is an edge connecting that node to the former. Nevertheless, when there is a considerable number of edges, accessing an element will be slow, since all the information is reachable only via the head of the list; in the worst case we iterate through the whole list until the last element is accessed. Whenever a node has no more outgoing edges, the pointer at the end of the list will contain the NULL value, indicating the list has no more elements.

Figure 2.5: Data structures for representing the graph of Figure 2.2: an incidence list (a), an adjacency matrix (b), and an incidence matrix (c).

A graph can also have a matrix representation, such as the adjacency matrix A (Figure 2.5b), in which the space complexity is bounded by O(N²). We represent the existence of an edge by the element 1 in the matrix (or by the weight of that edge). For instance, if the element A[b][a] = 1, it means that node b has an edge to node a. The graph of Figure 2.2 is directed, but for undirected graphs this matrix is symmetric. The advantage of an adjacency matrix is the indexed access, which makes it fast (O(1)) to find out whether two nodes are connected. In this example, the matrix was filled with ones and zeros. Nevertheless, in case there are weighted edges, this information would not increase the space in memory. On the other hand, when the graph is sparse, i.e. |E| is small compared with the N² possible entries, much of the allocated memory remains unused.

A third type of graph representation is the incidence matrix M (Figure 2.5c). The incidence matrix is an N × |E| matrix, in which there is one column for each edge. The incidence matrix allows multiple edges between nodes, which cannot be described by the adjacency matrix. In this representation, we are flagging the nodes which participate in each edge.
As each edge has exactly two participating nodes (source and target), each column of this N × |E| matrix has only two non-zero entries, which makes it an expensive data structure in terms of memory. For instance, we could define a non-zero real number as the weight of the edge and use its sign to represent whether a node is a source node or a target node. In this example, M[a][e1] = −1, indicating that there is an outgoing edge from node a. The target of such an edge would be in the same column with the opposite sign, i.e. M[d][e1] = 1.

2.1.4 Images as Graphs

We can also represent images using graphs in many different ways. One possibility is to assume, for instance, that each pixel corresponds to a node in the graph. The edges could be added according to a 4-connected neighborhood, as in the Manhattan distance (Section 2.2.2), or an 8-connected neighborhood.

Figure 2.6: Representing images as graphs. Figure a) shows a synthetic image. A graph representation using a node per pixel (b) and a node per region (c).

Max-flow algorithms [14] in CV have become a powerful choice for energy minimization. They often start by modelling each pixel of the image as a node in the graph in order to search for the minimum cut. When each pixel in the image is used as a node, the number of nodes in the graph will be N = w × h, for w and h the width and height of the image respectively. This can be, for some applications, a considerably high number. Therefore, there are other possibilities to represent the image as a graph, including the superpixel representation [2] or watershed-based algorithms, which are often used to decrease the number of nodes in the graph. The choice of contraction (i.e. merging two or more nodes) varies according to the algorithm.

2.2 Metric Spaces

Since we have spoken about many properties of graphs, it is now time to move towards geometry. On a daily basis, we deal with the concept of distances and paths in a very intuitive fashion, such as how far our house is from work, or how close two objects are in a Euclidean space. The concept of distance is intuitive and we can safely use it without the need to formally define it. In our study of centrality for vision, it becomes essential to understand some basic concepts related to geometry and topology. Many centrality measures establish the importance of a node in a graph based on the distance to another node or to a group of nodes in the graph. The Euclidean distance (Section 2.2.2) is, for instance, just one instantiation of the more general concept called distance.

Figure 2.7: The distance between two cities in Brazil. Paths taken can vary between 2 hours and five days.

Figure 2.7 shows, intuitively, how the concept of distance can vary. Although one can measure the Euclidean distance between cities, it does not mean that one can take that path. For instance, traveling between the two Brazilian cities Belém and Manaus takes approximately 2 hours in a non-stop flight. Traveling by car without stopping would take at least 2 days, and there is a part of the road that needs to be crossed by ship. If, for instance, we decide to perform the whole trip by ferry, it could take between 5 and 7 days. In real life, there are many different ways to measure the distance between two points, especially because we need to account for obstacles, traffic jams, and so forth. The abstraction of distance is extremely important since most problems in computer vision encode some type of metric or distance function between points, colors, or shapes.
In this section, we will spend some time discussing metrics and their variations.

Definition 12 (Metric). Given a set X = {x_1, x_2, . . . , x_n}, a metric is a function d : X × X → R that fulfills the metric axioms:

• Non-negativity: d(x_i, x_j) ≥ 0, for any two points x_i, x_j ∈ X. This axiom states that there are no negative distances. For any two points, the distance will be at least zero, e.g. it is zero when the two points are located in the same place.

• Coincidence: d(x_i, x_j) = 0 ⇐⇒ x_i = x_j. This axiom states that the distance between two points x_i, x_j ∈ X is zero if and only if they are the same.

• Symmetry: d(x_i, x_j) = d(x_j, x_i). This axiom states that the distance is symmetric: the distance from x_i to x_j is the same as the distance from x_j to x_i. Notice that this might not be true in many cases (e.g. a directed graph); therefore, later we will specify extensions of the metric concept which allow relaxation of those axioms.

• Triangle Inequality: d(x_i, x_j) ≤ d(x_i, x_k) + d(x_k, x_j) for any three points x_i, x_k, x_j ∈ X. Consider the non-degenerate triangle of Figure 2.8a: as the angles of a triangle add up to π, we know that γ < π. Now we stretch γ in such a way that all points are collinear (Figure 2.8b). We can see that, when the triangle is degenerate, d(x_i, x_j) = d(x_i, x_k) + d(x_k, x_j). Therefore, for any γ < π, the triangle inequality will hold.

Figure 2.8: Triangle inequality. When γ = π (i.e. the triangle is degenerate), the inequality turns into an equality: d(x_i, x_j) = d(x_i, x_k) + d(x_k, x_j). The inequality will hold for any γ < π.

2.2.1 Generalized Metric

Some of the aforementioned axioms might not be fulfilled; therefore, some metric variants allow one or more axioms to be infringed.

Definition 13 (Quasimetric). A quasimetric is a function which fulfills all metric axioms except symmetry.

The quasimetric is an important metric relaxation since it is commonly seen in daily life, for instance when traveling on one-way streets or in a directed graph formulation. In Figure 2.9, the distance d(x_i, x_j) = 3 while d(x_j, x_i) = ∞ since x_i cannot be reached.

Figure 2.9: Relaxation of the metric axioms: (a) quasimetric and (b) semimetric.

Definition 14 (Semimetric). A semimetric is a function fulfilling all metric axioms except the triangle inequality.

In real life, it is not hard to understand that when we calculate the distance between two places, either by estimating it in time via the Global Positioning System (GPS) or by counting the kilometers until reaching a destination, we are not really calculating the Euclidean distance (Section 2.2.2). We are not travelling to the place in a straight line from our origin; we need to account for a feasible path in order to reach that destination. Thus, as our world has obstacles, the triangle inequality might not always be guaranteed. There is an active research area in mobile robotics regarding path planning which takes that into account. Figure 2.9 shows an example of a semimetric in which d(x_i, x_j) > d(x_i, x_k) + d(x_k, x_j).

Definition 15 (Pseudometric). A pseudometric is a generalized metric that does not fulfill the coincidence axiom.

Two points might be distinct even though they have the same location in a pseudometric: d(x_i, x_k) = 0 and x_i ≠ x_k.

2.2.2 Distances and Norms

Metric spaces and distances define the ground basis for understanding centralities.
The first type of metric we will discuss is the Euclidean distance, which is the natural concept of distance. We will use this metric later in Chapter 6 when we define the weights of our graph as the Euclidean distance, which will be used in the DP optimization.

Definition 16 (Euclidean Distance). The Euclidean distance (d_e) between two points x and y is defined as the length of the straight line segment between x and y in R^n (Eq. 2.1).

d_e(x, y) = √( Σ_{i=1}^{n} (x_i − y_i)² ).   (2.1)

Eq. 2.1 is also known as the Pythagorean formula, for x, y ∈ R^n. The Euclidean distance describes in a really concise way how far two points are by measuring the straight line between them. It is a metric as it satisfies all metric axioms. Nevertheless, it might not always be a feasible way to travel between two points: when one is driving a car, one needs to make turns and it is, therefore, not possible to travel along a straight line. There is another type of distance, called the Manhattan distance, which is a metric used when one walks through blocks.

Definition 17 (Manhattan Distance). The Manhattan distance (d_m) is the distance computed as the sum of absolute differences on a rectangular grid (Eq. 2.2).

d_m(x, y) = Σ_{i=1}^{n} |x_i − y_i|.   (2.2)

This metric also corresponds to the 4-connected neighborhood in an image, since a pixel at (x, y) is connected to (x ± 1, y) and (x, y ± 1).

Definition 18 (Chebyshev Distance). The Chebyshev distance (d_c) is the distance computed as the maximum of the absolute differences along any coordinate (Eq. 2.3).

d_c(x, y) = max_i |x_i − y_i|.   (2.3)

In an image, this metric corresponds to the 8-connected neighborhood: a pixel is now also connected to (x ± 1, y ± 1).

Another important related concept is the norm, which is a function assigning a length to a vector in a vector space. For p ≥ 1, the Lp-norm is defined as:

‖x‖_p = (|x_1|^p + |x_2|^p + · · · + |x_n|^p)^(1/p)   (2.4)

Figure 2.10 shows the metric unit balls for the Euclidean, Manhattan, and Chebyshev distances. The norms corresponding to those distances are L2, L1, and L∞, respectively.

Figure 2.10: Metric balls for the L2 (a), L1 (b), and L∞ (c) norms.

2.2.3 Distances in Graphs

In graphs, we are often concerned with the shortest paths between nodes. In fact, many centrality values described in the next section rely on the shortest paths and on the lengths of those shortest paths. Consider the graph in Figure 2.11, where each edge has unit weight.

Definition 19 (Geodesic). A geodesic between two nodes x and y ∈ X is the path with the shortest distance.

Figure 2.11: Geodesics in a graph. The geodesic distance between nodes a and h is 3.

The geodesic between nodes a and h is a path whose distance is the shortest one. There are many possible geodesics between those two points; for instance, g1(a, h) = {a, c, f, h}, g2(a, h) = {a, b, e, h}, and g3(a, h) = {a, d, g, h} are all geodesics between a and h. Considering a path {x_k, . . . , x_n}, where x_k is the source and x_n the target node, the geodesic can be formulated as in Eq. 2.5:

g(x_k, x_n) = minimize Σ_{i=k}^{n−1} w(x_i, x_{i+1})   (2.5)

where w(x_i, x_{i+1}) is the weight of the edge between the two points x_i, x_{i+1} in the shortest path to be optimized.

Definition 20 (Geodesic Distance). The geodesic distance is the length of the geodesic.

The geodesic distance between points a and h is |{a → d → g → h}| = 3. There can be many geodesics in a graph, i.e. paths with the same length, but there is only one geodesic distance.

Definition 21 (Eccentricity of a node). The eccentricity of a node is the greatest geodesic distance between this node and any other node in the graph.
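Geodesics and eccentricities can be computed directly with networkx. The sketch below uses a hypothetical edge list that is merely consistent with the description of Figure 2.11 (unit-weight edges and three geodesics of length 3 between a and h); it is an illustration, not the actual graph of the figure.

# Minimal sketch: geodesic distance, all geodesics, eccentricity, and diameter.
# The edge list is an assumption consistent with the description of Figure 2.11.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ('a', 'b'), ('a', 'c'), ('a', 'd'),
    ('b', 'e'), ('c', 'f'), ('d', 'g'),
    ('e', 'h'), ('f', 'h'), ('g', 'h'),
])

# Geodesic distance between a and h (every edge has unit weight here).
print(nx.shortest_path_length(G, 'a', 'h'))          # 3

# All geodesics between a and h (paths sharing that minimal length).
print(list(nx.all_shortest_paths(G, 'a', 'h')))      # a-b-e-h, a-c-f-h, a-d-g-h

# Eccentricity of node a (greatest geodesic distance to any other node)
# and the diameter of the graph (maximum eccentricity over all nodes).
print(nx.eccentricity(G, v='a'))                     # 3
print(nx.diameter(G))                                # 3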
The shortest path between node a and each of its adjacent nodes has length 1 (e.g. g(a, b), g(a, c), g(a, d)). The eccentricity of a node considers the shortest paths between that node and all the other nodes; the diameter of the graph is the maximum eccentricity of any node in the graph.

ecc(x_n) = max_{y ∈ X \ {x_n}} g(x_n, y).   (2.6)

Definition 22 (Eccentricity of a graph). The eccentricity of a graph is the greatest geodesic distance in the graph.

Finally, given the eccentricity of all points in the graph, we can define the eccentricity of the graph as the maximum over all nodes:

ecc(G) = max {ecc(x_i) : x_i ∈ X}.   (2.7)

Mukherjee et al. [83] used the eccentricity of a graph as a measure of importance for modeling human action recognition. This concept could be considered a centrality and it was also used by Kropatsch et al. [64] in the analysis and matching of binary shapes.

2.3 Centrality

We will now start discussing graph centrality, which is the main focus of this work. The idea of centrality was born in the Social Sciences when Bavelas [9, 10] analyzed group processes in human communication. It was later, when Freeman [43] examined the studies of different researchers, that the concept of centrality was in fact clarified. Freeman [43] illustrates some distinct conceptual properties using a star graph (a tree of depth one where the central node is the root of the tree) to explain the notion behind centrality. For instance, the central node of such a graph not only possesses the highest degree among the others, but it also lies on the geodesic path between any two nodes (the geodesic of any pair of nodes is the shortest path between them; in a star graph, the shortest path between two non-adjacent nodes includes the central node). Given the fact that it has the minimum distance to any node, it is considered the closest to them. Those ideas compose the core of the degree, betweenness, and closeness centralities.

Definition 23 (Centrality). A centrality is a measure of importance of a node within a network.

Concepts initially applied in social networks are being brought into other fields that are not necessarily human-related, such as virtual networks [125], power grids [87], image saliency [91], resistor networks [62, 17], and so forth. Nevertheless, we did not find many approaches in computer vision taking advantage of centralities. In this section, we review four centrality measures: degree, betweenness, closeness, and eigenvector. Some of those centralities have been applied in CV before, and we will be discussing those cases accordingly.

2.3.1 The Degree Centrality

The node degree is one of the most well-known centrality measures, due to its simplicity but also due to its widespread usage in many graph theoretical problems. The degree of a point d(x_n) measures how many nodes are connected to x_n, in other words, how many edges are incident to this point. This concept is further broadened into in-degree and out-degree for directed graphs, relating to the edges that arrive at x_n and the edges that exit from it.

Definition 24 (Degree Centrality). The degree is a measure of centrality of a node which associates importance with the number of nodes connected to it.

Let A be the adjacency matrix of a graph X. The degree of x_n can be computed as the sum of the elements in the corresponding row (or column) of A:

d(x_n) = Σ_{k=1}^{N} A_{n,k} = Σ_{k=1}^{N} A_{k,n}.   (2.8)
Although we can easily quantify the degree of a node, its importance to the rest of the graph is not really clear, as it depends on the average degree of the graph. For instance, a degree of 8 might be considered high in a graph whose average degree is 2, but it is low in a graph whose average degree is 30. Hence, to overcome this, one could simply normalize d(x_n):

d̂(x_n) = d(x_n) / max{d(x_k) : 1 ≤ k ≤ N},   (2.9)

where the maximum degree value of the graph would now be equal to 1. Alternatively, Freeman [43] suggests that this normalization should occur with respect to the highest difference between centrality values within the graph. In CV, the node degree has been explored in several manners, from the abstraction of a rectangular lattice defining a 4-neighborhood to more specialized usages such as segmentation. However, the work of Pal et al. [91] treated the degree of a node as a centrality, i.e. measuring the importance of the nodes in a graph. In their work, they aimed at extracting the visual saliency of images. They were motivated by studies reporting that objects whose locations differ significantly from their surroundings are considered salient in the sense of drawing attention. Thus, they decided to design networks that model such saliency (called Visual Saliency Networks) by encoding salient regions as central nodes using the degree centrality.

2.3.2 The Betweenness Centrality

Imagine the situation in which you reside in a city a and you need to drive to a city c. On the road, you must cross city b, since a and c are not directly connected. Therefore, one could say that city b has some sort of control over the path between a and c. For instance, if b decides to place tolls on those roads or if it blocks the roads, it would limit the traffic between the cities. This situation illustrates the concept behind the betweenness centrality, where the communication between two non-adjacent nodes depends totally on the path between them [126]. The main idea of betweenness centrality as defined by Freeman [42] is that vertices that lie on the geodesic paths of many other vertices will possess great control over the information flow, due to the fact that they reside between others. The betweenness centrality b(x_n) of a node x_n is calculated as follows:

b(x_n) = Σ_{s ≠ n ≠ t ∈ V} γ_n(x_s, x_t) / γ(x_s, x_t),   (2.10)

where γ(x_s, x_t) is the number of geodesic paths between nodes x_s and x_t and γ_n(x_s, x_t) is the number of those geodesics passing through x_n. The computation of the betweenness centrality requires the calculation of the geodesic paths between all pairs of vertices in the graph, which can be achieved with the Floyd-Warshall algorithm. It is important to mention that there is another measure of betweenness centrality, based on random walks, defined by Newman [88].

Definition 25 (Betweenness Centrality). Betweenness is a measure of centrality which associates importance based on how often a node is found on the geodesic path of other nodes in the graph.

Mantrach et al. [79] proposed a covariance measure called Sum-over-Path (SoP) which measures the similarity between nodes in a graph. Two nodes are considered highly correlated if they co-occur frequently on the same (shortest) paths between nodes in the graph. Thus, the SoP estimates the expected number of times a node occurs on a path, and they compare their betweenness measure with both Freeman's and Newman's. A SoP formulation for string edit distance has been proposed by Garcia-Diez et al. [45] and the Sum-over-Forest (SoF) index was later proposed by Senelle et al.
[110], with the same inspiration that large forests would occur with lower probability and short forests would occur with high probability in a graph. In CV, Li et al. [71] used betweenness centrality for scene categorization. They build a social network of images of one class (e.g. kitchen) and calculate the betweenness of a certain image of this class with respect to the network. In this way, this is not an image-to-image measure, but an image-to-class measure estimating the overall connectivity to the class to which the image belongs. Their results were superior to the state-of-the-art methods on three datasets. Mukherjee et al. [83] present an application of centrality in human action recognition. They employ centrality to create a compact codebook out of a large vocabulary of poses (a bag-of-words approach). Cukierski and Foran [26] use centrality to solve an open problem in the ISOMAP algorithm [118], which is a non-linear dimensionality reduction method. The ISOMAP algorithm computes geodesic distances between data points with the help of a neighborhood graph. Unfortunately, these graphs sometimes contain unwanted edges which connect disparate regions of one or more manifolds, and this leads to a distortion of the calculated geodesic distance matrix. To address this problem, [26] propose an edge-removal method based on graph betweenness centrality, which can robustly identify manifold-shorting edges. Correa and Ma [22] use derivatives of centrality in the visualization of social networks. The derivative of centrality informs how much a given node influences the importance of another node, even if they are not directly connected. They found that derivatives of centrality are a useful tool for analyzing social networks: they help to simplify the layout of complex networks and to visually measure the centralization degree of a network. Furthermore, they provide information to estimate other metrics like structural balance and uncertainty.

2.3.3 The Closeness Centrality

The closeness centrality is a measure that evaluates how close a certain node is to all the other nodes in the graph. It was initially proposed in [10, 55, 11, 103]. As defined by Sabidussi [103], the closeness centrality c(x_n) of a node x_n can be computed as follows:

c(x_n) = Σ_{x_k ∈ X \ {x_n}} 1 / d(x_n, x_k),   (2.11)

where d(x_n, x_k) is the shortest distance between vertices x_n and x_k. Roughly speaking, when computing the distance between two nodes, we are able to define how far apart those nodes are. Therefore, by inverting it, we intuitively obtain a measure of how close they are.

Definition 26 (The Closeness Centrality). Closeness is a measure of centrality which associates importance based on how close a certain node is to all the other nodes in the network.

When analyzing shapes, de Sousa et al. [31] described a shape by the histogram of the centrality values using an 8-connected neighborhood graph. The desired behaviour is that similar shapes should have similar histograms with respect to their centrality values. They evaluated the robustness of each individual centrality by randomly removing nodes of the graph and calculating the new histogram after this topological change in the graph. The results indicated that the closeness centrality was the most robust centrality against that type of noise for the shape matching task.

2.3.4 The Eigenvector Centrality

The eigenvector centrality of a graph X corresponds to the eigenvector w = (w_1, w_2, . . . , w_N)ᵀ
associated with the highest eigenvalue λ of the adjacency matrix A [90]. The study of the eigenvalues and eigenvectors of the adjacency matrix and the Laplacian matrix of a graph has received vast contributions from many authors [20, 82, 35] in mathematics and spectral graph theory. In CV, it has important results in image segmentation, such as in the work on graph-cuts by Greig et al. [51] and other segmentation methods exploring the Laplacian matrix [95].

Definition 27 (Eigenvector Centrality). The eigenvector centrality is a measure of importance in a graph based on the first eigenvector of the graph adjacency matrix.

The idea of eigenvectors of some matrix describing the relationship between points was already exploited in point-set registration. In fact, one of the earlier works employing eigenvector analysis (although not in a graph) was conducted by Scott and Higgins [108] and later refined by Shapiro and Brady [111]. Both approaches apply a Singular Value Decomposition (SVD) of a matrix H_{i,j} = exp(−r²_{ij} / 2σ²), which uses the squared distance r²_{ij} between nodes i and j to define the proximity of nodes. Shapiro and Brady [111] handled some limitations of the original work, such as handling large rotations and translations in the image plane. The eigenvector analysis employed by them occurred over the proximity matrix H, while the so-called eigenvector centrality operates directly on the adjacency matrix of the graph. However, for point-set registration, it is not uncommon to define a proximity matrix between points. The recent work of Zhou and De la Torre [131, 130] (described in Section 3.6) factorizes a proximity matrix for the graph matching problem. Leordeanu and Hebert [69] propose a solution for the registration problem by exploring the principal eigenvector of an association matrix M. The interpretation of the eigenvector by the authors relates to the confidence of a certain assignment: for each correspondence element a = (i, i′) ∈ M, the eigenvector entry x*(a) corresponds to the confidence of a.

2.3.5 Example

We shall demonstrate all those measures using a more concrete application. Observe the graph of Figure 2.12. We have color-coded the nodes according to their degree: node a is the lowest degree node (1) and b is the highest one (4). Although the degree is intuitive to visualize, the other centralities are not so trivial to reason about. Table 2.1 shows the four centrality measures we have discussed and their values for each node.

Figure 2.12: A simple graph color-coded according to the degree distribution.

When looking at Table 2.1, we notice some interesting facts about the importance of the nodes in our graph. The degree is a discrete measure and we have a very clear distinction of the nodes according to their degree, but when looking at the closeness centrality, we see that nodes a and f have similarly low values: they are the nodes which are most distant from all the other nodes. Looking at the distribution, closeness is a smoother measure than the degree.

Centrality     a     b     c     d     e     f
Degree         1     4     3     3     3     2
Closeness      0.50  0.83  0.71  0.71  0.71  0.56
Betweenness    0.00  4.33  1.50  0.33  1.50  0.33
Eigenvector    0.34  1.00  0.85  0.91  0.85  0.57

Table 2.1: Centrality values obtained for the graph of Figure 2.12.

The betweenness centrality assigns zero to node a since it has no control over the information flow of other vertices. Node b becomes the most important node in the graph, since any information that needs to reach a must pass through b.
Therefore, the importance of b according to betweenness is much higher than that of all the other nodes. Eigenvector centrality also considered b the most important node and a the least important one.

2.4 Conclusions

There are still many other meaningful properties of graphs which we did not cover in this chapter, such as the Laplacian matrix, which is commonly defined as L = D − A, where D is a diagonal matrix with the degrees of the nodes and A is the adjacency matrix. The Laplacian matrix has important results in spectral graph theory and it is closely related to the commute time [104]; in fact, the commute time can be computed using the pseudoinverse of the Laplacian matrix of the graph. The commute time is a distance measure between nodes in a graph based on the average number of steps a random walker would take to reach a node j starting from a node i and to return back to the origin. It is related to the resistance distance proposed by Klein and Randić [62] in their study of resistive electrical networks. This concept has been explored in computer vision by Qiu and Hancock [95, 96] on the problems of image segmentation and clustering. Recently, Cortés et al. [24] proposed the usage of centralities for the problem of Graph-Edit Distance. There has been some work on Fuzzy Social Network Analysis (FSNA) [58] and many of those centralities (degree, betweenness, and closeness) were also reformulated as fuzzy centralities [57, 74]. We could also evaluate how those centrality values behave in CV. There are still many vision applications which could take advantage of centralities.

CHAPTER 3 Correspondence and Registration

“I have six locks on my door all in a row. When I go out, I lock every other one. I figure no matter how long somebody stands there picking the locks, they are always locking three.” – Elayne Boosler

Imagine you are in a dark room with several keys in your hand and there is a locked door which leads to the exit of that place. You are desperate to go out and you need to find the correct key which opens the door. When you try the first key, it does not enter the lock hole. The second key gives you hope since it partially enters the lock, but the door knob does not turn. The problem you are facing is called matching, in which, given |K| keys and a lock l, you are interested in finding a key k ∈ K such that the distance between the silhouette of the key and the lock is d(k, l) = 0. We could also call this problem registration, since you are not only interested in finding the right key, but you are aligning both objects by applying a rigid transformation to the key until you can perfectly align each ridge. A partial match might also occur due to similarities, as happened with the second key. This problem could also be thought of as belonging to the Special Orthogonal Group SO(3), as the key is an object in R³ and it cannot be bent without breaking since it is rigid. Therefore, given the correct correspondence (the right key), we only need to find a rotation and translation to register the key and the lock.

Definition 28 (Matching). Given two point-sets X and Y, a point matching consists of finding a bijection x ↔ y, for all x ∈ X, y ∈ Y.

Definition 29 (Registration). Registration is the problem of finding a spatial transformation which best aligns two point-sets.

In this chapter, we will overview the problem of correspondence (as in a graph theory framework) and the registration problem when assuming that the points belong to a certain space (e.g. Euclidean space).
We will decompose it into several parts given the nature of the transformation (rigid, similarity, affine, etc.). Finally, we will review the main algorithms used in the literature to deal with registration problems. Chapter 5 aims at integrating centralities into the CPD algorithm reviewed later in this chapter. Chapter 6 proposes a centrality-oriented approach focusing solely on the correspondence problem between point sets.

3.1 Transformations

Given two point-sets X and Y, our task is to retrieve the transformation that best maps Y onto X and to estimate the correspondence between points in both sets. There are several types of transformations that will appear throughout this dissertation and each transformation has special properties which are relevant for us. Some of them are linear transformations (rigid, similarity, affine) and some are non-linear (tapering, bending, twisting). We will focus on linear transformations throughout this dissertation.

3.1.1 Rigid Transformation

The simplest type of transformation between two objects is the rigid one:

Definition 30 (Rigid Transformation). A rigid transformation between two points x and y is a transformation which can be described by a rotation matrix R and a translation vector t, such that y = Rx + t.

All rigid transformations can be described by a rotation R ∈ SO(3) (where SO(3) stands for the Special Orthogonal Group) and a translation t, in which the three angles α, β, γ define the rotations around the axes. A rotation matrix is defined as R = R_α R_β R_γ:

R = [ 1      0        0      ] [  cos β   0   sin β ] [ cos γ   −sin γ   0 ]
    [ 0    cos α   −sin α    ] [    0     1     0   ] [ sin γ    cos γ   0 ]
    [ 0    sin α    cos α    ] [ −sin β   0   cos β ] [   0        0     1 ]   (3.1)

A point x transformed by a rigid transformation might not only have been rotated by R, but it could also have been translated by tᵀ = (t_x, t_y, t_z) in such a way that the new transformed point is y = Rx + t. We can describe the transformation of a point x by both rotation and translation in a single matrix using homogeneous coordinates:

[ y ]   [ R    t ] [ x ]
[ 1 ] = [ 0ᵀ   1 ] [ 1 ]   (3.2)

When using homogeneous coordinates, there is the advantage of performing the rotation and translation as a single matrix multiplication. This matrix has six degrees of freedom, three for the rotation and three for the translation. Figure 3.1 shows examples of rigid transformations.

Figure 3.1: Rigid transformation. Figure a) is only transformed by a translation vector t, whereas Figure b) also has a 90 degree rotation around the z axis.

3.1.2 Similarity Transformation

A similarity transformation adds one more degree of freedom to the rigid transformation, which is the isotropic scaling (s). We could say that an object Y scaled by s in all dimensions, rotated by R, and translated by t is related to another object X by a similarity transformation, and its matrix representation is:

[ y ]   [ sR   t ] [ x ]
[ 1 ] = [ 0ᵀ   1 ] [ 1 ]   (3.3)

Figure 3.2 shows four ducks related by a similarity transformation; all of them have a different scale and a rotation.

Figure 3.2: Similarity Transformation. The ducks are related by rotation, translation, and isotropic scaling.

Definition 31 (Similarity Transformation). A similarity transformation between two points x and y is a transformation which can be described by a rotation matrix R, a translation vector t, and an isotropic scaling factor s, such that y = sRx + t.

Chapter 6 introduces a point-set correspondence algorithm which can cope with transformations up to a similarity.
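A minimal numpy sketch of the transformations discussed so far: it composes R = R_α R_β R_γ, applies a similarity transformation y = sRx + t, and shows the equivalent single multiplication in homogeneous coordinates. The angles, scale, and translation are arbitrary example values.

# Minimal sketch of rigid/similarity transformations (arbitrary example values).
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """R = R_alpha * R_beta * R_gamma, rotations around the x, y, and z axes."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

R = rotation_matrix(0.1, 0.2, np.pi / 2)   # e.g. a 90 degree rotation around z plus small tilts
t = np.array([1.0, -2.0, 0.5])
s = 1.5                                     # isotropic scale (similarity transformation)

x = np.array([0.3, 0.7, 1.0])
y = s * R @ x + t                           # y = sRx + t

# The same transformation as a single 4x4 matrix in homogeneous coordinates.
T = np.eye(4)
T[:3, :3] = s * R
T[:3, 3] = t
y_h = T @ np.append(x, 1.0)
assert np.allclose(y, y_h[:3])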
3.1.3 Affine Transformation

The previous rigid and similarity transformations can be considered special cases of the affine transformation, since the affine transformation describes the most general linear transformation between two objects. Linear transformations are the ones we can describe by a matrix multiplication. The affine transformation is described as follows:

[ y ]   [ A    t ] [ x ]
[ 1 ] = [ 0ᵀ   1 ] [ 1 ]   (3.4)

Definition 32 (Affine Transformation). An affine transformation between two points x and y is a transformation which can be described by an affine matrix A.

We can exemplify the more general affine transformation with a shearing transformation, which is not covered by the rigid and similarity transformations but is affine. When we shear along one direction (e.g. z), we fix that dimension and change the others by, for instance, the following matrix:

[ y ]     [ 1   0   a   0 ]   [ x ]
[ 1 ]  =  [ 0   1   b   0 ]   [ 1 ]
          [ 0   0   1   0 ]
          [ 0   0   0   1 ]   (3.5)

Figure 3.3: Affine transformation. Figure a) is only transformed by an anisotropic scaling and Figure b) by shearing along the z axis.

Anisotropic scaling occurs when we scale the object with different scaling factors in each dimension, which differs from the similarity transformation. Figure 3.3a shows an anisotropic scaling (the object has been scaled in the dimension x). Figure 3.3b shows an example of shearing.

3.2 Iterative Closest Point Algorithm (ICP)

The Iterative Closest Point (ICP) algorithm is one of the most famous algorithms for the registration of point-sets; it was proposed by Besl and McKay [12]. It is an iterative algorithm whose idea is to find, for each point in one set, the closest point in the corresponding set. Once the closest points are computed, the algorithm performs a least-squares minimization in order to estimate the rigid transformation between both point-sets. After the transformation is estimated, the points are transformed and the algorithm iterates again until the obtained error is below a certain threshold. The first step is to compute, for each reference point x_i ∈ X, the closest data point y_i:

y_i(x_i) = argmin_{y ∈ Y} ‖x_i − y‖²   (3.6)

Once the correspondence is obtained (i.e. the closest points), the algorithm estimates the registration assuming a rigid transformation. At every iteration, the algorithm minimizes the distance between the transformed point cloud (RY + t) and the reference point cloud X:

minimize Σ_{i=1}^{N} d(R y_i + t, X)   (3.7)

where d(·, ·) is a metric function (Section 2.2) which determines the error between the point clouds. Pottmann et al. [93] provide a study on the convergence of the ICP algorithm. Ezra et al. [39] also analyze the performance of ICP. ICP is known to suffer from local minima, which makes it highly dependent on the initialization. Several ICP variants have been proposed to address that problem. One approach worth mentioning is Go-ICP [128], which performs the registration using a Branch and Bound optimization.

3.3 Coherent Point Drift (CPD)

The CPD algorithm proposed by Myronenko and Song [86] also addresses the point-set registration problem with an iterative approach that aligns one point-set towards the other until convergence. However, it is a more probabilistic approach since they address the registration as a density estimation problem. They consider the points in one set (e.g. Y) as Gaussian Mixture Model (GMM) centroids and X as the data points generated by the GMM, in which the correspondence and transformation are estimated via Expectation Maximization (EM).
For instance, points in Y are considered as GMM centroids with the following probability density function:

p(x) = Σ_{m=1}^{M+1} P(m) p(x|m)   (3.8)

where P(m) = 1/M is a uniform probability for the GMM centroids and the authors introduce p(x|M + 1) = 1/N to account for outliers. Expanding Eq. 3.8 with a linear combination using a parameter 0 ≤ w ≤ 1, they end up with the form:

p(x) = w (1/N) + (1 − w) Σ_{m=1}^{M} P(m) p(x|m).   (3.9)

N is the number of points in X and M the number of points in Y (i.e. the number of centroids). CPD considers the spatial coordinates associated with the points during the computation of the probabilities p(x|m) and P(m|x_n). P_c(m|x_n) stands for the posterior, which is the probability of a point x_n being generated by the centroid m. The subscript “c” stands for CPD, since later we will introduce our probability estimation with the subscript “g” (Chapter 5), in which we consider the centrality information of the graph along with the spatial cues. Finally, the original CPD probability function is defined in Eq. 3.10:

P_c(m|x_n) = exp( −(1/2) ‖(x_n − T(y_m, θ)) / σ‖² ) / ( Σ_{k=1}^{M} exp( −(1/2) ‖(x_n − T(y_k, θ)) / σ‖² ) + c )   (3.10)

where c = (2πσ²)^{D/2} (w / (1 − w)) (M / N), θ and σ are the parameters estimated by EM, T(y_m, θ) stands for the transformation applied to y_m given the set of parameters θ, and D is the dimension of the point-sets. This transformation can be rigid, affine, or non-rigid, as discussed earlier in this chapter. For instance, let us take a look at the similarity transformation, where there is a rotation, a translation, and a scale factor: the transformation is T(y_m, θ) = sRy_m + t. The point y_m is rotated by R, scaled by s, and translated by t; estimating these parameters from corresponding points is known as the absolute orientation problem. If the difference between a point x_n and the transformed point T(y_m, θ) is zero, it means they are perfectly aligned. CPD reparametrizes the GMM centroids using the parameters θ and σ², which are estimated by minimizing the negative log-likelihood (equivalently, by maximizing the likelihood):

E(θ, σ²) = − Σ_{n=1}^{N} log Σ_{m=1}^{M+1} P(m) p(x_n|m)   (3.11)

and they solve the transformation T for the rigid, affine, and non-rigid cases separately. For details on the final estimation of those parameters, refer to [86].

3.4 Expectation Maximization Methods

Many researchers decouple the registration problem into smaller problems (transformation and correspondence). This is often referred to as an alternating optimization scheme, i.e. the transformation between point-sets is fixed while the correspondence between points is estimated; later, the correspondence is fixed and the transformation is obtained [86, 25, 77, 75, 19, 105, 80]. Combining the solutions of the small problems leads to the solution of the bigger problem. Cross and Hancock [25] developed a framework based on the EM algorithm as a dual-step optimization. Previous works [123, 46] estimated the affine and Euclidean parameters of point-sets and could be considered among the first to introduce structural constraints into correspondence. The authors employed the Delaunay triangulation [33] to build the graph and to constrain the correspondence matches of the point-sets based on the edges of the graph. They define a structure called the superclique as the set of nodes connected to a central node. This structure is later used in the computation of the probability associated with the matching. It is required to have a dictionary of possible mappings and this is clearly the bottleneck of their approach.
The matching algorithm estimates the geometric transformations and correspondences by jointly maximizing the likelihood over the space of matches and transformations using the EM algorithm.

3.5 Spectrum-based Methods

One of the first usages of spectral methods in GM was by Umeyama [122], who performed an eigendecomposition of the graph adjacency matrix to obtain the permutation matrix P representing the correspondence between graphs. The solution created by Scott and Higgins [108] and later refined by Shapiro and Brady [111] was based on a proximity matrix between points (Section 2.3.4). Carcassoni and Hancock [19] also defined a point-proximity matrix H which can be constructed in different ways: it can be based on a Gaussian weighting function similar to [111, 108], a sigmoid, or a Euclidean function. They use the eigenvectors of this matrix H to compute the probability of assignment between points along with the EM algorithm. In their experiments, they outperformed the approach of Shapiro and Brady [111]. Jain and Zhang [59] dealt with the shape correspondence problem by embedding the shapes in the spectral domain. They encode the spatial coordinates of each point-set using a Gaussian kernel with the geodesic distances to create two affinity matrices. Once they find the spectral embedding of those matrices, they perform an iterative alignment and obtain the correspondence by finding the best matching according to the ℓ2 distance. In their experiments, they outperformed the previous approaches [122, 111, 19]. Other approaches using spectral theory include the work of Mateus et al. [80], which tackled the shape matching problem by exploiting the Laplacian matrix (Section 2.3.4).

3.6 Conclusions

Many algorithms have been proposed to address the point-set registration problem. More recently, Zhou and De la Torre [131, 130] exploited the problem of point correspondence via graph matching. They first proposed Factorized Graph Matching (FGM) [130] and later Deformable Graph Matching (DGM) [131]. The latter creates an affinity matrix K encoding the similarity of both nodes and edges as well as the pairwise geometry of the points. They factorize the matrix K into matrices that preserve the local structure of each graph. This factorization decouples the structures of the graph by replacing K with six smaller matrices. They alternate the optimization of the correspondence (using the path-following algorithm) and the geometric transformation (with an optimization scheme similar to ICP). Torresani et al. [119] also propose a dual decomposition scheme for the problem of correspondence. Their scenario considers the correspondence of sparse features extracted from images. The proposed energy function is based on several terms: feature appearance, compatibility of correspondence, and spatial coherence. In this chapter we provided a short summary of the most popular algorithms, such as ICP and CPD, and briefly described further approaches based on EM, which is the optimization framework behind CPD and many other algorithms. Later we will extend CPD by incorporating centralities and propose our own centrality-based registration.

CHAPTER 4 On Centrality-based Learning

“Siempre hago lo que no puedo hacer, para aprender cómo hacerlo. – I am always doing that which I cannot do, in order that I may learn how to do it.” – Pablo Picasso

We started the first chapter by motivating that computers could do more than just arithmetic, that they could “learn” things.
The concept of learning is not necessarily the same as our understanding of learning for human beings. In the machine learning field, one could say that a computer is learning if it is able to maximize or minimize an objective function. The algorithm receives input data and, based on the cost function, tries to reach the minimum (or, equivalently, the maximum) of that function. Such a solution could be used to separate objects between classes, to estimate regression coefficients, to learn parameters, and for many other applications. We will be providing different objective functions in this dissertation and different algorithms that make decisions based on the cost of a current solution. For instance, in a classification task, the algorithm could decide to which class an object belongs, or which word is being pronounced in a speech recognition task. We perform a review of related work and, to the best of our knowledge, we were the first to employ centrality measures for 2D shape matching. The results of this chapter were published in the proceedings of Graph-Based Representations in Pattern Recognition in Vienna, Austria, 2013 [31]. We have obtained permission from Springer Science and Business Media to reuse the contents of the paper in this dissertation, in both written and online forms. The license is attached in Appendix A.

4.1 Shape Matching

We focus on the problem of shape matching and we address it by using machine learning techniques. The shape matching problem can be defined as follows:

Definition 33 (Shape Matching). A shape s_k matches another shape s_j if i(s_k) = i(s_j), for i(·) being an arbitrary description function that receives a shape and outputs a descriptor.

We can illustrate the concept of learning with the following toy example. Imagine you have a camera which is able to find circles accurately. It is not important at the moment how it does so, but you can obtain the circle in each image you acquire. Based on this information, one poses the question whether it is possible to distinguish the eye of a cat from a human eye given only the radius of the circle. Hence, you build your first feature for your human-vs-cat classification problem. When you acquire a considerable number of images of both cats and humans, you could plot the data according to the radius vs. the classes and see whether those two classes overlap or not. Let us assume for now that they do not overlap. Since we have a simple one-dimensional feature (the radius) whose values lie in distinct regions of our feature space, we could find a linear function that separates both classes. This linear function has only one parameter τ to be found. Therefore, our cost function would try to estimate τ so as to separate both classes. In the real world, a simple feature such as the radius would not be enough to separate the classes. For instance, the closer the cat is to the camera, the bigger the radius of the circle will be. You could make your feature more complex by taking, for instance, color or gradient information into account. The design of a good feature could be considered a research problem by itself. In Section 4.3, we model a shape using centrality information, which is another application of centrality in CV.

4.2 Graph-based Shape Matching

Let us narrow the focus to the problem of matching shapes using graph representations. Notice that we are not trying to match points of a shape as we did in the previous chapter, but are matching shapes of objects according to their classes.
Let us look at some shapes in Figure 4.1 which are used in this chapter. We chose Kimia's Shape 99 database [112], which contains 11 images of 9 classes. It consists of simple binary shapes, which can easily be turned into graphs. We chose this dataset because it captures some properties which help us analyze the performance of the centralities. For instance, it contains elongated objects (wrench and shark), circular shapes (rabbit), shapes with high curvatures (hand and airplane), and so on. Instead of just randomly choosing databases, we would rather focus on properties of shapes which can help us obtain an idea about the behaviour of the algorithm according to topological properties.

Figure 4.1: Images from Kimia's 99 Database. This database is composed of 11 images of 9 different classes: wrench, dude, stingray, hand, scribble, fish, bunny, airplane, quadrupeds.

In the pattern recognition literature, there are many approaches that were used throughout the years to match shapes. Shock graphs [115] were frequently employed for such a task and they are closely related to the medial axis transform [13]. A node in a shock graph is labeled according to the shock type and the edges depend on the shock formation time, therefore having more descriptive power than the medial axis. Siddiqi et al. [115] address the shape matching problem based on acyclic shock graphs. The task of matching shock graphs can be reduced to matching shock trees, using a shock graph grammar, in polynomial time. Sebastian et al. [109] performed 2D shape outline matching based on the edit distance between the corresponding shock graphs. Their framework partitions the shape space based on the shock graph topology and discretizes the space of deformations with the help of their shock graph transitions. Finally, they find the globally optimal sequence of transitions by employing a graph edit distance algorithm. Torsello and Hancock [120] present a geometric measure to compute the similarity between skeletons of shapes. Felzenszwalb and Schwartz [40] propose a representation called shape-tree by describing the boundary of a shape using multiple levels of resolution. In this way, it is possible to determine the similarity between two shapes. Zhu et al. [132] describe an object using a hierarchical graph by defining a deformable template which considers parent-child relationships. The top vertex stores the pose (position, orientation, and scale) of the center of the object, and the child vertices describe the poses of points on the object boundary.

4.3 Centrality-based Feature

In this section, we deal with the problem of describing shapes in a machine learning fashion. A good description is essential for a successful training/testing task.

Definition 34 (Feature). A feature is a description or representation of an observable phenomenon.

Let us reason about the actual feature representation for an image class. Consider that we have many images of 32 pixels in height and 32 pixels in width with only a single channel. These images contain the handwritten digits from "0" to "9". Our task consists of recognizing the class depicted in the image. We try to find a representation that would allow us to encode that information, i.e. images of the same class would ideally have similar representations. A simple approach is to create a vector of size 32² using all pixel values. In this way, we can represent each image as a single point in this 2¹⁰-dimensional space.
We would build feature vectors for all images in the same manner and expect that those points would end up close to each other in the feature space. It is a much deeper issue to decide how the actual training process will be performed. There are unsupervised approaches which try to cluster points which belong to the same class without any prior knowledge. Some apply dimensionality reduction techniques to bring the problem to a lower-dimensional space. Other approaches require the label (e.g. the class) for each image and use supervised algorithms to split that space based on the labels of each point. The feature representation has a big impact on the results, since a poorly designed feature might generate the same representation for images of different classes. Thus, the learning algorithm will not be able to split the space accordingly.

Figure 4.2: Centrality distribution of graphs (fish1, nz2, circle1) using an 8-connected neighborhood and their noisy versions; for each shape, the betweenness, closeness, degree, and eigenvector centralities are shown.

4.3.1 Centrality of Shapes

Figure 4.2 displays centrality values of some images of Kimia's database which we are going to use. The values are color coded using a rainbow palette. For each image, we display the centralities introduced in Chapter 2. The idea is to visualize how the centrality varies according to different shapes, to evaluate if the shape has an impact on the centrality values. Our graph is built using an 8-connected neighborhood (as discussed in Section 2.2.2). We encode topological information of a graph (centrality) and not image properties such as gradients, borders, or moments. The question to be asked is if such properties could be useful as features for learning, i.e. if an algorithm is able to separate the classes in the feature space based on that representation. Figure 4.2 gives us an intuition about the behaviour of centrality of nodes across the graph, such as the eigenvector centrality.

We should also focus on the noisy scenario. This would let us analyze how robust those features are and how they react in the presence of noise. We simulate some salt noise in the image as depicted in Figure 4.2. Notice that although such a noise could be easily removed by using basic image processing techniques such as the median filter, what we are doing here is changing the topology of the graph by removing nodes. Remember that as soon as we build our graph, we are no longer working in the image domain, but in the graph domain, and in this domain a node removal causes a topological change. Therefore, we would like to see how robust each centrality is when the topology of the graph changes.
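As a concrete illustration of the construction just described, the following minimal sketch builds the 8-connected pixel graph of a binary shape and computes the four centralities from Chapter 2 with networkx. The helper name shape_graph_8conn and the toy square mask are illustrative choices, not part of the original experiments.

```python
import numpy as np
import networkx as nx

def shape_graph_8conn(mask):
    """Build a graph whose nodes are the foreground pixels of a binary
    mask and whose edges connect 8-adjacent foreground pixels."""
    G = nx.Graph()
    rows, cols = np.nonzero(mask)
    pixels = set(zip(rows.tolist(), cols.tolist()))
    G.add_nodes_from(pixels)
    # 8-connected neighborhood offsets (excluding the pixel itself)
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    for (r, c) in pixels:
        for dr, dc in offsets:
            if (r + dr, c + dc) in pixels:
                G.add_edge((r, c), (r + dr, c + dc))
    return G

# Toy binary shape (a filled square); a real experiment would load a
# Kimia-99 silhouette instead.
mask = np.zeros((12, 12), dtype=bool)
mask[2:10, 2:10] = True
G = shape_graph_8conn(mask)

centralities = {
    "degree":      nx.degree_centrality(G),
    "closeness":   nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),   # O(|V||E|), slow on large shapes
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}
for name, values in centralities.items():
    print(name, min(values.values()), max(values.values()))
```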
4.3.2 Histogram feature

A feature vector could be built using the histogram of centrality values. A histogram accumulates, in a predefined number of bins, how many values fall within each bin's range. The salt noise added to the graph causes a topological change in which nodes are removed. We would like to evaluate the impact of that change on the histogram of centrality values.

Figure 4.3 shows the histogram of the fish1 shape according to the betweenness centrality. This image shows the histograms of the original image (displayed in Figure 4.2) and the noisy one. The betweenness centrality has a high peak at the zero centrality value, which means that the majority of points have the lowest importance in the graph. When the noise was added, this peak was slightly reduced, but the distribution remained similar to that of the original image. An interesting aspect is that for many shapes, this centrality will have a similar distribution. The location of the high betweenness values is the most discriminant feature in the shape, which cannot be encoded in the simple histogram representation. The betweenness centrality has apparently the same behaviour with or without noise. We can see that the betweenness centrality has a long-tail distribution: there is a spike at the zero-betweenness value (the red region in Figure 4.2) and a fast decay for higher values, which means that there will be few points with high betweenness values. When we look at the noisy image, we see small changes on high-betweenness nodes when compared to the noiseless case. The location of the "holes" in the graph might affect the overall betweenness distribution, especially if those nodes used to lie on the geodesic paths of other points. This centrality seems to be stable to noise, although not very discriminative using this simple histogram modeling for the feature.

Figure 4.3: Comparison of betweenness centrality histograms (Fish1 vs. noisy Fish1; x-axis: betweenness, y-axis: frequency).

Figure 4.4 shows the histogram of the fish1 shape according to the closeness centrality. We perform a density estimation of the histogram to help visualize the distribution before and after the noise was added (the red line). We notice that the distribution remains very similar. In fact, in the experimental section, we will see that the closeness centrality was the most robust measure against the type of noise we added. The closeness centrality shows promising results and it seems to be robust to the presence of noise. Although we can see some differences in both histograms, the distribution remained very similar with respect to the noiseless case. We will show quantitative experiments later to support this observation. Remember that this centrality shows how close a node is to all the other nodes in the graph. When a single node is missing on a shortest path, there will probably be other shortest paths with the same length as alternative routes, and the distribution will remain stable under this type of noise.

Figure 4.4: Comparison of closeness centrality histograms (Fish1 vs. noisy Fish1; x-axis: closeness, y-axis: frequency).
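The per-centrality comparisons above (and the degree and eigenvector comparisons that follow) rest on histograms of this kind. Below is a minimal sketch of the histogram feature and of the topological "salt" noise, assuming the shape_graph_8conn helper from the previous sketch; the 40-bin choice anticipates the experiments in Section 4.5.

```python
import numpy as np
import networkx as nx

def centrality_histogram(G, centrality_fn, bins=40):
    """Fixed-length feature: histogram of node centrality values.
    networkx returns normalized centralities, so a fixed [0, 1] range
    makes histograms of different graphs comparable."""
    values = np.fromiter(centrality_fn(G).values(), dtype=float)
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def add_salt_noise(G, fraction=0.05, seed=0):
    """Topological 'salt' noise: remove a random fraction of the nodes."""
    rng = np.random.default_rng(seed)
    nodes = list(G.nodes())
    drop = rng.choice(len(nodes), size=int(fraction * len(nodes)), replace=False)
    H = G.copy()
    H.remove_nodes_from(nodes[i] for i in drop)
    return H

# G from the previous sketch
h_clean = centrality_histogram(G, nx.closeness_centrality)
h_noisy = centrality_histogram(add_salt_noise(G, 0.05), nx.closeness_centrality)
print("L1 difference between histograms:", np.abs(h_clean - h_noisy).sum())
```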
The addition of noise in the graph had a strong impact on the degree distribution of the nodes. Figure 4.5 shows the histogram of the original degree distribution first. As we have closed shapes in the images, all points inside the shape will have the same degree value, i.e. the number of neighbors used in the graph construction. When the noise was added, points inside the shape got smaller degree values, and the high peak visible in the first image is spread across several other bins. We first notice a significant change in the distribution of the degree centrality. The degree is defined based on the type of connections used during the creation of the graph (an 8-connected neighborhood in this example). In such a representation, pixels might lie inside the shape or on its border; the degree distribution would have a spike at the number of neighbors used, e.g. all pixels inside the shape will have degree 8. A pixel would only have a value different from 8 when it lies on the border. Notice also that, since there are no disconnected nodes, no pixel will have a degree of 0. When the salt noise is added, the degree distribution changes significantly, since a single "hole" inside the shape will directly affect 8 nodes. Therefore, trying to match shapes using the degree centrality might end up not being an easy task in the presence of noise.

Figure 4.5: Comparison of degree centrality histograms (Fish1 vs. noisy Fish1; x-axis: degree, y-axis: frequency).

Finally, we display the results for the eigenvector centrality in Figure 4.6. The eigenvector centrality had an interesting behavior in the noiseless case: it had a noticeable gradient varying from the object's center of mass towards the periphery. We also notice this pattern in the histogram of the eigenvector centrality. Although it is a long-tail distribution, it spreads across all centrality values instead of having a single spike at the zero value (when compared to the betweenness centrality). When the noise was added, the distribution was flattened and more nodes were assigned the zero value. We can also notice a small spike in some random regions of the shape. What we observe is that the spike was located in the biggest closed region without noise inside the shape.

Figure 4.6: Comparison of eigenvector centrality histograms (Fish1 vs. noisy Fish1; x-axis: eigenvector centrality, y-axis: frequency).

4.3.3 Random Binary Comparison of Centrality Values

One of the biggest drawbacks of representing a shape using only the histogram values is that we lose spatial information on where the high centrality values are. We only know they exist by their presence in the histogram, but we do not know where they are located. One way to bring spatial information into our feature modeling is to use all image values as a feature vector, as described in the introduction of this chapter. The problem with this kind of representation is that the algorithm will be highly biased towards the training data and it will be difficult to generalize to unseen images. A powerful approach considers random binary comparisons of distinguishable locations. Querying key locations for comparison might be powerful enough to split the data correctly, such as in the work of Lepetit and Fua [70]. We do not wish to overfit the training data by using all information we have, but to learn the best way to split our centrality-based feature space. One might ask questions such as: (i) Which locations should be chosen for the binary comparison?
(ii) How many locations are necessary? In the approach of Lepetit and Fua [70] and Shotton et al. [114], random locations are generated and, at each location, the information gain is evaluated by minimizing the entropy of the leaf nodes. Therefore, they choose the locations which maximize the gain.

Figure 4.7: Binary comparison of centrality values. The centrality value v(x) ≥ v(y) in figure a) and v(x) < v(y) in b); the locations x and y are compared with respect to a reference point (the center).

Therefore, we could encode a feature as a quintuple ⟨dwx, dhx, dwy, dhy, b⟩, in which dwx and dhx stand for the horizontal and vertical offsets of a point x with respect to the center of the shape, the same information is stored for the point y, and b is a boolean value which indicates whether the centrality of x is higher than the centrality of y.

Let us consider a simple example. There are three classes, a, b, c, and two images per class. We randomly choose locations for test x1 = (p, q), such that 0 ≤ p < width, 0 ≤ q < height, and x2 = (r, s), such that 0 ≤ r < width, 0 ≤ s < height, and we compare the centrality values of all images using those locations. For each location, we split the images into two sets: the ones which fulfill the condition (v(p) ≥ v(q)) and the ones which do not. The same applies for location x2. After testing x1 in all images, we obtain left1 = [a, a, b, c] and right1 = [b, c]. Nevertheless, when we apply the test for location x2, we obtain left2 = [a, a] and right2 = [b, b, c, c]. We can calculate the entropy values for all sets and choose the test location which minimizes the entropy. For instance, by using location x2, we are able to distinguish class a from the other ones. Therefore, we would choose x2 for this level, and later repeat the process until we are able to split classes b and c. The leaf nodes of the binary tree would then hold a posterior probability for each class, e.g. the probability of finding class b in the left node of x1 is 25%. There are many variants of this method and, for a deeper understanding, the reader can refer to [114, 70].

Figure 4.7 shows two features using binary comparison. The idea consists of considering a certain reference point in the shape, e.g. the center of mass of the shape, the center of the window, or even the center of the graph, i.e. the node with minimum eccentricity value (Section 2.2.2). After choosing a reference location, we could compare the centrality values of two other locations with respect to this reference. We call it a boolean comparison because we are only interested in whether location x has a higher centrality value than location y. Therefore, the feature representation would be f(x, y) = ⟨dwx, dhx, dwy, dhy, b⟩, where dwx and dhx stand for the offsets of x from the reference point (and dwy, dhy likewise for y), and the boolean value b tells whether the centrality of x is higher than that of y.

We have answered question (i): choosing the locations which maximize the gain by minimizing the entropy. Question (ii) is related to the height h of the tree. A tree which is really deep could easily overfit the dataset. One has to keep in mind that the number of comparisons will be on the order of 2^h, since at each branching location we create more nodes. The traditional way to avoid overfitting is to cross-validate that parameter. A small sketch of the entropy-based selection step is given below.
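This is a minimal sketch of the entropy-based selection step, reproducing the toy splits discussed above; the candidate names x1, x2 and the class labels follow that example and not any particular implementation from [70, 114].

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(left, right):
    """Entropy of a binary split, weighted by child sizes."""
    n = len(left) + len(right)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

# Splits produced by the two candidate tests in the toy example above.
candidates = {
    "x1": (["a", "a", "b", "c"], ["b", "c"]),
    "x2": (["a", "a"], ["b", "b", "c", "c"]),
}
for name, (l, r) in candidates.items():
    print(name, "weighted entropy =", round(split_entropy(l, r), 3))
best = min(candidates, key=lambda t: split_entropy(*candidates[t]))
print("chosen test:", best)   # x2: it isolates class 'a' completely
```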
4.4 Training Centrality-based Features

Most of the feature-vector procedures we have discussed so far are suitable for so-called shallow machine learning. Although there is no consensus on the meaning of shallow learning, the notion of a shallow architecture for learning became more evident in contrast to deep representations [54, 116]. In this work, we will only refer to shallow architectures. There are many alternative definitions for the concepts being described in this section.

Definition 35 (Shallow Learning). A shallow architecture is a learning procedure which models the problem in one level of representation (feature vector) which is then used for training.

This definition means that we encode all possible information for our problem in just a single feature vector and hope that this feature vector is discriminative enough to separate our classes. The learning algorithm will be trained using those features. In a deep architecture, such as convolutional neural networks [68], several layers of abstraction are built to represent the problem.

Definition 36 (Training Algorithm for Classification). An algorithm which can minimize a cost function based on labeled features (experience) to decide upon the class (prediction) of a certain feature (representation) is called a training algorithm.

There is a vast literature on learning algorithms based on shallow architectures, such as the Naive Bayes classifier, the Nearest Neighbors classifier, the SVM [23], and so on. We have sketched a shallow learning procedure based on centrality in Algorithm 4.1.

Algorithm 4.1: Training Algorithm
  Data: Images I and labels L
  Result: A trained algorithm T
  1  F ← {}
  2  for i ∈ I do
  3      G(V, E) ← build_graph(i)
  4      C ← centrality(G(V, E))
  5      f ← build_feature(C)
  6      F ← F ∪ {f}
  7  end
  8  T ← train(F, L)

The input for the algorithm consists of the images and their respective labels. For each image, we build a graph representation of it (Line 3) and then calculate the centrality (Line 4). Based on the image and the centrality values of the nodes, we build a feature representation (histogram, patches, etc.). When all features have been prepared for learning, we train the algorithm (Line 8). The training part of a learning algorithm is the stage which takes a considerable amount of time. It depends on the algorithm, on the amount of data, and on the number of classes. Testing is usually fast; it uses the trained algorithm to estimate the class label. After training the algorithm, we would simply classify a certain feature using that algorithm. The input is the image we would like to classify and the algorithm we trained in the previous step. We build the same feature representation as we did for the training algorithm (Lines 3–5), but now the algorithm only predicts the class for this feature. In online learning approaches, the classifier could be modified or updated based on the new data.
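The following is a sketch of Algorithm 4.1 in Python, reusing the shape_graph_8conn and centrality_histogram helpers from the earlier sketches; scikit-learn's GaussianNB stands in for the naive Bayes classifier used in the experiments below, and image loading is assumed to happen elsewhere.

```python
import numpy as np
import networkx as nx
from sklearn.naive_bayes import GaussianNB

def build_feature(G, bins=40):
    """Concatenate the 40-bin histograms of all four centralities
    (the 'all' classifier of the experiments below)."""
    fns = [nx.degree_centrality, nx.closeness_centrality,
           nx.betweenness_centrality,
           lambda g: nx.eigenvector_centrality(g, max_iter=1000)]
    return np.concatenate([centrality_histogram(G, fn, bins) for fn in fns])

def train(images, labels):
    """Algorithm 4.1: graph -> centrality -> feature -> trained classifier."""
    F = [build_feature(shape_graph_8conn(img)) for img in images]   # Lines 2-7
    return GaussianNB().fit(np.vstack(F), labels)                   # Line 8

def predict(clf, image):
    """Testing: same feature construction, then a class prediction."""
    f = build_feature(shape_graph_8conn(image))
    return clf.predict(f.reshape(1, -1))[0]
```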
4.5 Experiments

We have performed experiments on Kimia's 99 dataset [112] using the training procedure described in Algorithm 4.1. We build a graph for each image and compute the feature vectors using the centrality measures. The feature modeling is based on the histogram described in Section 4.3.2. The hypothesis we would like to evaluate is whether, and which, centrality measures can be used as features in ML for shape matching tasks under the presence of topological changes (noise). We have four different versions of the dataset: the noiseless case and datasets under noise levels of 1%, 5%, and 10%. We build five different classifiers using graph centralities. Four classifiers are computed using each individual centrality histogram and one classifier using the combination of all of them. In these experiments, we grouped the values into 40 bins.

In ML, if one classifies the set which was used for training, the results would not be meaningful, since they could easily overfit the training set. Although this measure is not meaningful to assess how good a classifier is, it is useful to find out how discriminative the centrality-based feature is. For instance, if two different objects a ∈ A and b ∈ B generate the same feature representation but belong to two different classes, no matter which training algorithm is used, there will be a misclassification. The algorithm will make a decision for either class A or B, and the error is due to the poor representational power of the feature. This is what we aim to assess when classifying the training set itself. Nevertheless, we also evaluate how good the classifiers are with respect to images never seen before, such as the noisy ones, for which the algorithms are unaware of the topological change. In this way, we can find out which centrality measure could be a good feature representation when using the histogram.

Figure 4.8 shows the confusion matrices of the classification results for the classifiers all (A), betweenness (B), closeness (C), degree (D), and eigenvector (E). The first column (Figures 4.8a, 4.8e, 4.8i, 4.8m, and 4.8q) shows the results on the training dataset itself. These are the images for which we would expect a perfect confusion matrix, i.e. an identity matrix. The first row shows the results when all centrality measures were used. As expected, the trained algorithm obtained almost correct classification for all classes on the training dataset (Figure 4.8a). We see that there are still some images misclassified. As soon as classifier A faces the datasets with noise, the performance deteriorates: for the airplane and dude classes, there was not a single correct classification. Nevertheless, classes with elongated structures were classified correctly with high accuracy (91% for wrench and 100% for shark) under 10% of noise.

The second row shows the results for the betweenness centrality. We see that the histogram of the betweenness centrality does not seem to be a robust representation. Most nodes have a betweenness value equal to zero, as we expected when we plotted the distribution of centrality over the shape and the histogram. In the noiseless case, Figure 4.8e, we see that some images are misclassified, and as the noise increases, this centrality starts showing strong misclassification results. The classes hand and rabbit, and dog and stingray, had wrong results. We believe that this centrality is robust to noise, but its histogram is not. A more powerful result could be obtained if the locations of the high peaks of betweenness values in the shape were stored in the feature. Referring to Figure 4.2, we can see that the high peaks were stable under noise.

The centrality representation which obtained the best results was the closeness centrality, as seen in Figures 4.8i, 4.8j, 4.8k, 4.8l. Figure 4.8l shows that the confusion matrix still looks like an identity one even under the presence of heavy noise. The reason for this is that when nodes were randomly deleted, there were still alternative paths whose costs were not much higher than in the original case. This centrality shows the most robust results under this scenario.
The scenario in which we would expect poor results of the closeness centrality would be one in which many adjacent nodes vanish together, since this would abruptly change the cost of the shortest paths in the graph. The fourth row shows the degree centrality, which obtained the worst results in this experiment. The reason is simple: the distribution previously had a single peak at the number of neighbors used, but when noise appeared, other peaks arose in the histogram, and this classifier was not able to classify correctly under such a change. Under 10% of noise, almost all images were classified as belonging to the shark class.

The results of classification are summarized in Table 4.1. The first column displays the classifier evaluated. The second column (no noise) displays the result of the classification on the training data, which shows that the classifier using all centralities obtained the best results (98% correctness). Most of the individual centralities were able to perform well (with at least 81% correctness). However, when 1% of noise is added, the closeness centrality starts to obtain the best performance. The other centrality measures are more sensitive to noise than closeness and their results drop remarkably: the addition of random noise had the highest impact on the results of degree and eigenvector. We observe that the closeness centrality remains stable even though there are missing nodes in the graph. Even when 10% of noise is added, it still obtains better results than the other centralities obtain on the training data.

Table 4.1: Classification results using different centrality measures. The closeness centrality achieved the best results.

classifier    No noise  1% noise  5% noise  10% noise
All           0.98      0.85      0.65      0.46
betweenness   0.82      0.67      0.51      0.29
closeness     0.94      0.93      0.92      0.83
eigenvector   0.89      0.69      0.27      0.26
degree        0.81      0.37      0.17      0.14

4.6 Conclusions

For Kimia's 99 dataset, the histogram feature representation, and the naive Bayes classifier, it can be concluded that the closeness centrality is the most robust centrality measure for shape matching. The simple combination of all centralities did not obtain the best result, but it was still considerably better than the other isolated centralities. Furthermore, the performance of the degree centrality changes severely under "salt" noise. This happens because, after the noise, vertices inside the shape have degrees smaller than the number of neighbors used. This type of noise was chosen due to the topological changes it causes in the graphs. The histogram distribution changes significantly for some centrality values.

Notice, however, that these experiments only evaluate how good the centralities were in combination with the histogram representation. When the location is incorporated into the feature representation, those results could change; for instance, we would expect that the betweenness centrality would be able to represent the data well even when the topology of the graph changed. We will apply the random binary comparison in our future work (Chapter 8).

Figure 4.8: Confusion matrices for the centrality-based learning. Panels (a)–(t) show classifiers A (all), B (betweenness), C (closeness), D (degree), and E (eigenvector) under 0%, 1%, 5%, and 10% noise.
CHAPTER 5
Graph-based Point Drift

One of the first problems discussed in this dissertation was the registration of point-sets. We illustrated in Chapter 3 the importance of such a problem with the example of finding the correct key to open a door. It is time for us to address the registration of point-sets using graph centralities (introduced in Chapter 2). The centralities will play an important role in the estimation of correspondence, which is the step that assigns the probability of a node being matched to another node. The results of this chapter were published in the Pattern Recognition journal by Elsevier, 2015 [27]. Elsevier has granted us permission to reuse the full content of the paper, including text and images, in this dissertation. The license is available in Appendix A.

5.1 Computing Probabilities using Centralities

Let us first build the intuition for registering points using centralities. Given a point-set X and its associated graph 𝒳, we shall refer to x as a point belonging to both 𝒳 and X, hence assuming a bijection 𝒳 ↔ X. We shall first triangulate a point-set to obtain a graph (the data-graph). Chapters 6 and 7 will deal with techniques for creating data-graphs out of point-sets. We color code the graph of Fig. 5.1a to display the centrality value of each node. The closeness centrality c(xn) of a point xn ∈ X is chosen for this example. Blue nodes represent points with low centrality values and pink nodes represent the ones which are more central to the graph according to the closeness. Let us also compute the histogram of those values (Fig. 5.2a) to find out how diverse the distribution of centrality is across the graph. When looking at a particular bin of the histogram, we notice that many nodes have the same centrality value and there are many non-empty bins. This indicates that the distribution is spread across a wide range of centrality values instead of having a single peak (such as the degree centrality described in Chapter 4).

Figure 5.1: (a): Graph 𝒳 built using Delaunay triangulation, color-coded using the closeness centrality; the pinker the node, the more central it is. (b): Node xn chosen for calculating the probability of being matched with respect to the centrality of the others. (c): Probability Pg(m|xn) of xn being matched as in Eq. 5.1; finally, $\sum_{n=1}^{N} P_g(m \mid \mathbf{x}_n) = 1$. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

We seek to find candidates to match our point xn (the pink node displayed in Fig. 5.1b). If we were to match nodes based solely on the centrality, we would believe that a node whose centrality value is similar to the one we seek is more likely to be the correct node and all the other nodes should be rejected. This seems to be a strong assumption, since we would ignore all the points whose centrality is not exactly the same as that of xn. Instead of making such a hard decision, we could assume that the chances of finding the correct point will be higher if the point has the same centrality as our point xn and lower if this centrality is not the same. How this probability varies is up to us. This intuitive assumption does not discard points, but it makes a match increasingly unlikely the more the centrality values differ.
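Before formalizing this decay with a Gaussian in the next paragraphs, here is a minimal sketch of the data-graph construction just described: a Delaunay triangulation turned into a graph whose edges are the triangle sides, with the closeness centrality computed per node (scipy and networkx; the random point-set is only a stand-in for a real dataset such as the fish set).

```python
import numpy as np
import networkx as nx
from scipy.spatial import Delaunay

def delaunay_data_graph(points):
    """Data-graph from a 2D point-set: nodes are point indices,
    edges are the sides of the Delaunay triangles."""
    tri = Delaunay(points)
    G = nx.Graph()
    G.add_nodes_from(range(len(points)))
    for a, b, c in tri.simplices:
        G.add_edges_from([(int(a), int(b)), (int(b), int(c)), (int(a), int(c))])
    return G

rng = np.random.default_rng(0)
X = rng.random((50, 2))                    # illustrative 2D point-set
GX = delaunay_data_graph(X)
closeness = nx.closeness_centrality(GX)    # the v(x_n) used in the text below
print("min/max closeness:", min(closeness.values()), max(closeness.values()))
```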
One way to model the decay of importance for matching is to center a Gaussian distribution (or any other unimodal distribution) at the location which has the highest likelihood of being the correct match (i.e. where the nodes have the same centrality value as the node xn). Points whose centralities are higher or lower shall receive a lower probability of being matched. Thus, the probability of ym being matched to xn could be defined as:

$$P(m \mid \mathbf{x}_n) = \frac{h_{c_m}\,\exp\!\left(-\dfrac{\|v(\mathbf{y}_m)-v(\mathbf{x}_n)\|^2}{2\phi^2}\right)}{\displaystyle\sum_{k=1}^{M} h_{c_k}\,\exp\!\left(-\dfrac{\|v(\mathbf{y}_k)-v(\mathbf{x}_n)\|^2}{2\phi^2}\right)}, \qquad (5.1)$$

in which $h_{c_m}$ stands for the histogram bin associated with the centrality $v(\mathbf{y}_m)$ of node ym, indicating how many nodes were found in that graph whose centrality is the same as that of ym. Intuitively, the nodes of the graph whose centrality values are closer to the centrality of xn would have a higher probability of being matched. In fact, Eq. 5.1 is similar to the one proposed by Carcassoni and Hancock [19, Eq. 16]. The authors employ eigenvectors of a proximity matrix H with different weighting functions. If we apply the eigenvector centrality as v(xn) in Eq. 5.1, we compute the difference between the elements of the eigenvector of the adjacency matrix, not of a proximity matrix created with a specific weighting function.

Figure 5.2: (a): The histogram of the closeness centrality of the nodes displayed in Fig. 5.1a overlapped with a Gaussian distribution centered at xn. (b): The probability of the nodes P(m|xn) as computed in Eq. 5.1. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

The probability of a node ym being matched to node xn is displayed in Fig. 5.1c. We can see that many nodes are assigned the blue color, meaning that they have a lower chance of being matched to xn due to the fact that their centrality diverges significantly. The intuition of Eq. 5.1 is to combine a Gaussian distribution with the one estimated by the histogram to enhance the probability of nodes whose values are closer to the one we would like to match. Fig. 5.2a shows the histogram of the centrality distribution in Fig. 5.1a, while the final probability is displayed in Fig. 5.2b. The color scheme displays the importance of the node according to the closeness centrality. Pink nodes have the highest closeness centrality (normalized between zero and one). The Gaussian distribution is centered at the centrality value of point xn and φ is the variance of the centralities in X, which controls how strong the influence of the Gaussian is. The area under the curve adds up to one. We notice that, based only on the centrality, one could find the correct match among the ten most likely nodes; the centrality itself provides a good prior for pruning unfeasible nodes.
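A minimal sketch of Eq. 5.1 follows, assuming the closeness values computed in the previous sketch; the histogram weights h_{c_m} are taken from a histogram of the centralities of the target graph, and φ² is set to the variance of the centralities, following the text.

```python
import numpy as np

def centrality_match_probability(v_x, v_Y, phi2, bins=40):
    """Eq. 5.1: probability of each node y_m being matched to a node x_n,
    based only on centrality values.
    v_x  : centrality value of x_n (scalar)
    v_Y  : centrality values of all nodes y_m (array of length M)
    phi2 : variance phi^2 controlling the width of the Gaussian."""
    hist, edges = np.histogram(v_Y, bins=bins)
    # h_{c_m}: how many nodes of the graph fall in the same bin as y_m
    bin_idx = np.clip(np.digitize(v_Y, edges) - 1, 0, bins - 1)
    h = hist[bin_idx]
    weights = h * np.exp(-(v_Y - v_x) ** 2 / (2.0 * phi2))
    return weights / weights.sum()

# Example with the closeness values computed in the previous sketch,
# matching the graph against (a copy of) itself.
v_Y = np.array(list(closeness.values()))
v_x = v_Y[0]                         # centrality of the node we want to match
P = centrality_match_probability(v_x, v_Y, phi2=v_Y.var())
print("most likely candidates:", np.argsort(P)[::-1][:10])
```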
5.2 Combining Centrality and Spatial Information

Although our probability estimation produced good results in the previous section (i.e. it gave a high probability to the correct point), it is clear that we can improve Eq. 5.1 if we integrate the spatial information as well. There are many nodes with similar centralities, which turns the matching process into a difficult task if we rely only on the centrality, and since the spatial information is available, there is no reason for not using it. The centrality estimation helped prune nodes that do not represent good matches; by integrating it into CPD, we expect the algorithm to converge faster. First of all, we define the combination of centrality and spatial information in Eq. 5.2:

$$\psi(\mathbf{x}, m) = -\left(\frac{\|\mathbf{x}-\mathbf{y}_m\|^2}{2\sigma^2} + \frac{\|v(\mathbf{x})-v(\mathbf{y}_m)\|^2}{2\phi^2}\right), \qquad (5.2)$$

where σ² is the variance associated with the locations of the points and φ² the variance associated with the centrality values. The probability p(x|m) takes the form of:

$$p(\mathbf{x} \mid m) = \frac{h_{c_m}\,\exp(\psi(\mathbf{x},m))}{(2\pi\sigma^2)^{D/2}}, \qquad (5.3)$$

Myronenko and Song [86, Eq. 1] define p(x) as:

$$p(\mathbf{x}) = \sum_{m=1}^{M+1} p(m)\,p(\mathbf{x} \mid m), \qquad (5.4)$$

note that the sum goes up to M + 1 as they introduce a term p(M + 1) to account for outliers, as well as a parameter w which weighs between the two terms. Therefore, we can rewrite $p(M+1) = \frac{w}{N}$ and $p(m) = \frac{1-w}{M}$ for 1 ≤ m ≤ M and decompose the sum into two parts:

$$p(\mathbf{x}) = w\,\frac{1}{N} + (1-w)\sum_{m=1}^{M}\frac{1}{M}\,p(\mathbf{x} \mid m). \qquad (5.5)$$

Our centrality-based probability estimation Pg(m|x) is called Graph-based Point Drift (GPD) and we start the derivation from Bayes' theorem:

$$P_g(m \mid \mathbf{x}_n) = \frac{p(m)\,p(\mathbf{x}_n \mid m)}{p(\mathbf{x}_n)}, \qquad (5.6)$$

$$P_g(m \mid \mathbf{x}_n) = \frac{\dfrac{1-w}{M}\,\dfrac{h_{c_m}\exp(\psi(\mathbf{x}_n,m))}{(2\pi\sigma^2)^{D/2}}}{w\,\dfrac{1}{N} + (1-w)\displaystyle\sum_{k=1}^{M}\frac{1}{M}\,\frac{h_{c_k}\exp(\psi(\mathbf{x}_n,k))}{(2\pi\sigma^2)^{D/2}}}, \qquad (5.7)$$

multiplying by $NM(2\pi\sigma^2)^{D/2}$:

$$P_g(m \mid \mathbf{x}_n) = \frac{N(1-w)\,h_{c_m}\exp(\psi(\mathbf{x}_n,m))}{wM(2\pi\sigma^2)^{D/2} + N(1-w)\displaystyle\sum_{k=1}^{M} h_{c_k}\exp(\psi(\mathbf{x}_n,k))}, \qquad (5.8)$$

dividing by $N(1-w)$:

$$P_g(m \mid \mathbf{x}_n) = \frac{h_{c_m}\exp(\psi(\mathbf{x}_n,m))}{\dfrac{wM(2\pi\sigma^2)^{D/2}}{N(1-w)} + \displaystyle\sum_{k=1}^{M} h_{c_k}\exp(\psi(\mathbf{x}_n,k))}, \qquad (5.9)$$

as CPD first did in Eq. 3.10, we also call $c = \frac{wM(2\pi\sigma^2)^{D/2}}{N(1-w)}$. Therefore, the probability function which considers both the spatial and the centrality information is computed as:

$$P_g(m \mid \mathbf{x}_n) = \frac{h_{c_m}\exp(\psi(\mathbf{x}_n,m))}{\displaystyle\sum_{k=1}^{M} h_{c_k}\exp(\psi(\mathbf{x}_n,k)) + c}. \qquad (5.10)$$

Each centrality measure can be applied as v(xn) for computing the probabilities. We evaluate the performance of the different measures in our experiments. We use the subscript g in Pg(m|xn) to differentiate between the probability Pc(m|xn) originally proposed by CPD and our new probability estimation combining the centrality values. Eq. 5.10 states that the centrality will either reward or penalize the connection between ym and xn based on how similar (or divergent) their centrality values are.
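A minimal sketch of the combined estimate of Eq. 5.10 for a single point x_n is given below. The quantities Y, σ², φ², w, N and the histogram weights would come from the surrounding EM iteration of CPD; here they are plain arguments, and the function is only an illustration, not the full GPD algorithm.

```python
import numpy as np

def gpd_posterior(x, v_x, Y, v_Y, h, sigma2, phi2, w, N):
    """Eq. 5.10: posterior P_g(m | x_n) combining spatial and centrality terms.
    x, Y        : point x_n (D,) and transformed GMM centroids T(y_m) (M, D)
    v_x, v_Y    : centrality of x_n and of each y_m
    h           : histogram weights h_{c_m} (M,)
    sigma2, phi2: spatial and centrality variances
    w, N        : outlier weight and number of points in X."""
    M, D = Y.shape
    psi = -(np.sum((x - Y) ** 2, axis=1) / (2.0 * sigma2)
            + (v_x - v_Y) ** 2 / (2.0 * phi2))                    # Eq. 5.2
    c = w * M * (2.0 * np.pi * sigma2) ** (D / 2) / (N * (1.0 - w))
    num = h * np.exp(psi)
    return num / (num.sum() + c)
```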
5.3 Relation between GPD and CPD

In the previous section, we have provided a way to integrate the centralities into the probability estimation of CPD. Theorem 1 helps us better understand how GPD is related to CPD.

Theorem 1 (Relation between GPD and CPD). The centrality values of either a complete graph or a disconnected graph do not improve CPD.

Let 𝒳N and 𝒴N be complete graphs with N nodes associated with point-sets X and Y respectively. CPD treats Y as GMM centroids and X as points generated by the GMM, where m = (1 . . . |Y|) stands for the GMM components. Theorem 1 states that the GPD posterior for the GMM components of the complete graph Pg(m|xn) is equivalent to the GPD posterior of the disconnected graph Pg(m|xn) and to the CPD posterior Pc(m|xn) which only uses the point-sets, where ym ∈ 𝒴N, xn ∈ 𝒳N, ym ∈ Y, and xn ∈ X. Thus, it establishes a relation between CPD and GPD. In order to prove Theorem 1, we need to show that for a complete graph 𝒳N, all nodes have the same centrality value (Lemma 1).

Lemma 1 (Centrality values of a complete graph). Centralities v(x) = v(y) for all {x, y} ∈ 𝒳N.

Proof of Lemma 1. The degree centrality states that d(x) = N − 1 for any node x ∈ 𝒳N, thus fulfilling Lemma 1. The closeness centrality is calculated as $c(\mathbf{x}) = \sum \frac{1}{d(\mathbf{x},\mathbf{y})}$, for y ∈ 𝒳N \ {x}, where d(x, y) is the shortest distance between nodes x and y. As all the nodes are connected, d(x, y) = 1 and the centrality c(x) will be the same for all nodes in the graph. For the eigenvector centrality, let us first define J as the all-ones matrix of dimension N; the rank of J is 1, with spectrum $\{N^{1}, 0^{N-1}\}$ (the exponents denote multiplicities). The adjacency matrix A of 𝒳N is A = J − I, where I is the identity matrix. Considering that the spectrum of I is $\{1^{N}\}$, the spectrum of 𝒳N is $\{(N-1)^{1}, (-1)^{N-1}\}$. In order to fulfill the condition Aw = λw for the eigenvalue λ = N − 1, the eigenvector w associated with it needs to be an all-ones vector. Therefore, for any node x ∈ 𝒳N, the centrality value e(x) = 1.

Considering all centrality measures which fulfill Lemma 1, we can now prove Theorem 1.

Proof of Theorem 1. Due to the fact that 𝒳N and 𝒴N are complete graphs, Lemma 1 states that v(xk) = v(xi) for all {xk, xi} ∈ 𝒳N, and the histogram value $h_{c_m} = N$, since all centrality values are the same and thus fall into the same bin. Without loss of generality, the same holds for 𝒴N. The exponential of a sum is equal to the product of exponentials. Therefore, we can split the ψ function (Eq. 5.2) into two terms and plug those values into Eq. 5.10:

$$P_g(m \mid \mathbf{x}_n) = \frac{N\exp\!\left(-\frac{1}{2}\left\|\frac{\mathbf{x}_n - T(\mathbf{y}_m,\theta)}{\sigma}\right\|^2\right)\exp\!\left(-\frac{1}{2}\left\|\frac{v(\mathbf{x}_n)-v(\mathbf{y}_m)}{\phi}\right\|^2\right)}{\displaystyle\sum_{k=1}^{M} N\exp\!\left(-\frac{1}{2}\left\|\frac{\mathbf{x}_n - T(\mathbf{y}_k,\theta)}{\sigma}\right\|^2\right)\exp\!\left(-\frac{1}{2}\left\|\frac{v(\mathbf{x}_n)-v(\mathbf{y}_k)}{\phi}\right\|^2\right) + c}, \qquad (5.11)$$

since v(xn) = v(ym):

$$P_g(m \mid \mathbf{x}_n) = \frac{N\exp\!\left(-\frac{1}{2}\left\|\frac{\mathbf{x}_n - T(\mathbf{y}_m,\theta)}{\sigma}\right\|^2\right)\exp(0)}{\displaystyle\sum_{k=1}^{M} N\exp\!\left(-\frac{1}{2}\left\|\frac{\mathbf{x}_n - T(\mathbf{y}_k,\theta)}{\sigma}\right\|^2\right)\exp(0) + c}, \qquad (5.12)$$

and we arrive at:

$$P_g(m \mid \mathbf{x}_n) = \frac{\exp\!\left(-\frac{1}{2}\left\|\frac{\mathbf{x}_n - T(\mathbf{y}_m,\theta)}{\sigma}\right\|^2\right)}{\displaystyle\sum_{k=1}^{M}\exp\!\left(-\frac{1}{2}\left\|\frac{\mathbf{x}_n - T(\mathbf{y}_k,\theta)}{\sigma}\right\|^2\right) + c}, \qquad (5.13)$$

showing that the probability of Eq. 5.13 is equivalent to Eq. 3.10. When using disconnected graphs in GPD, the content inside the exponential function will be zero¹ and the same result would be obtained.

Theorem 1 indicates that neither a completely disconnected graph 𝒳 (just a point-set) nor a complete graph 𝒳N could improve CPD. Any improvement could only be obtained when the graph is neither empty nor complete, but when the edges play the important role of resembling the actual structure of the object being matched.

¹ Assuming that the centrality of disconnected nodes is either zero or a constant but not infinity; e.g. Freeman [43] considers the distance between disconnected nodes as infinity, hence we could consider that the closeness c(xn) = Σ 1/∞ → 0. Freeman [42] also defines the betweenness of disconnected nodes as 0.

5.4 Experiments

Our experimental section aims at evaluating the behaviour of different centrality measures on different types of data using our probabilistic estimation. We also compare our results with the original CPD algorithm. We perform two types of experiments:

1. Registration under different types of transformations (as explained in Chapter 3):
   (a) Similarity transformation without noise.
   (b) Affine transformation with missing points.
   (c) Non-rigid transformation with noise and missing points.

2. Registration using point-sets of different classes:
   (a) Clusters.
   (b) Points randomly distributed.
   (c) Points normally distributed.
   (d) Silhouettes of objects.
   (e) Points sampled from a grid.

The experiments we designed have different goals: the first one aims at assessing how fast each centrality is when we vary the type of transformation. Remember that when we centered the Gaussian in the histogram, the probability was pruning nodes which could not represent good matches. We also referred to noise in the first type of experiment.
We define noise as spurious points which do not belong to the original dataset, and by missing points we mean that true points of the dataset will not be present. For the second type of experiments, we evaluate 495 databases consisting of the 5 classes enumerated above. Those classes of point-sets have different properties, such as the sampling functions, the distributions applied for drawing points, or points lying on the silhouette of objects. The aim of the second part is to isolate certain properties of the point-sets which could help us understand the behaviour of each individual centrality. When we studied the centrality-based learning in Chapter 4, we observed different behavior for the datasets. Therefore, to some extent, one could predict which centrality is more advantageous given some knowledge of the type of point-set being registered.

5.4.1 Registration under Similarity Transformation

Figure 5.4 shows the registration of the fish dataset (available at http://sites.google.com/site/myronenko/research/cpd) with 91 points under a similarity transformation. The blue circles belong to the point-set Y, which is aligned at each iteration towards the point-set X shown as red stars. The rows display the 3rd, 6th, and 9th iteration of the registration. The closeness and eigenvector variants obtained good results at the 6th iteration, which are better when compared with CPD at the 9th iteration.

Figure 5.3: Convergence of the registration under similarity transformation (alignment error ‖X − T(Y)‖ vs. iteration for betweenness, closeness, degree, eigenvector, and CPD). All centrality variants obtained a faster decay than the original CPD algorithm. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

Figure 5.4: Registration of the fish point-set under similarity transformation. The first row shows results at the 3rd iteration, the second row at the 6th iteration, and the third row at the 9th iteration. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

Our GPD variant based on the closeness centrality converges at the 9th iteration, while CPD needs more iterations to finish the registration. We consider the alignment error as the ℓ2 norm between the original point-set X and the point-set Y after the recovered transformation: ‖X − T(Y)‖₂. Figure 5.3 shows the alignment error with respect to the number of iterations. The faster the curve decays, the faster the registration; the curve should approach zero in the ideal case. Observing the curves, the GPD variants using the closeness and eigenvector centralities obtained the quickest decay, while the degree and betweenness decayed more slowly in comparison, although all centrality variants are still faster than CPD. The fast alignment using centrality indicates that this topological information might help the algorithm converge faster. As we expected when we derived the probabilistic estimation, the algorithm was pruning unfeasible points (based on how wide the Gaussian is). The algorithm was able to register faster in this similarity example. Nevertheless, if such a prior is not correct, i.e. we end up pruning points which should not be pruned, the algorithm might fail to converge.
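The alignment error reported in the convergence curves can be computed as follows; a minimal sketch, assuming a similarity transform with recovered scale s, rotation R, and translation t, and correspondence-ordered point-sets (both assumptions are for illustration only).

```python
import numpy as np

def similarity_transform(Y, s, R, t):
    """Apply a similarity transform T(Y) = s * Y * R^T + t."""
    return s * Y @ R.T + t

def alignment_error(X, Y, s, R, t):
    """l2 alignment error ||X - T(Y)|| used for the convergence curves."""
    return np.linalg.norm(X - similarity_transform(Y, s, R, t))

# Example: when the known transform is recovered exactly, the error is ~0.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([0.3, -0.2])
Y = np.random.default_rng(1).random((91, 2))   # stand-in for the fish point-set
X = similarity_transform(Y, 1.5, R, t)
print(alignment_error(X, Y, 1.5, R, t))        # ~ 0
```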
For now, we notice that under similarity registration (rigid + isotropic scale) without noise, our centrality variants performed faster than CPD in the number of iterations. In the next sections, we shall vary the type of registration we perform.

5.4.2 Registration under Affine Transformation with Missing Points

The previous experiments were based on a 2D dataset. Now, we compare the centralities on a 3D face dataset (also from http://sites.google.com/site/myronenko/research/cpd) with 392 points under an affine transformation and missing points. Each data set has different missing points, which makes the registration more challenging.

Figure 5.5: Registration of 3D point-sets under affine transformation with missing points. The first row shows results at the 5th iteration, the second row at the 10th iteration, and the third row at the 15th iteration.

Fig. 5.5 shows the registration of the datasets. Notice in the last row that some blue points are missing in the left part of the face (only red points exist). The same happens on the other side of the face (missing red points and only blue points exist). Although it is harder to visualize how well registered the points are, we see that the closeness centrality was able to align the point-sets well in the third row, while CPD needs some more iterations to finish the registration. Fig. 5.6 displays the alignment error in the same way as for the similarity example. In this example, some centralities decayed faster than CPD. Looking at the curves at iteration 10, many centralities had a lower error than CPD, while CPD reaches this error around iteration 20, therefore taking twice as many iterations to reach that error level. This is a more challenging example than the fish dataset, due to the higher number of dimensions but also due to the missing points.

Figure 5.6: Convergence of the affine registration. Many centralities approached the final error at the 10th iteration, while CPD reached that value around the 20th iteration. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

5.4.3 Non-Rigid Registration with Noise and Missing Points

Finally, the last scenario for the first batch of experiments is the non-rigid case with spurious nodes and missing points. Fig. 5.7 shows again the fish dataset under this type of transformation. One of the fishes does not have the dorsal fin, which is a very distinguishable part that could help the registration. This type of transformation is particularly difficult for all algorithms and the decay is presented in Fig. 5.8. We notice in this experiment that most centrality variants performed poorly compared with CPD, except the closeness centrality. The closeness centrality initially had a faster decay than CPD, until the 20th iteration, when CPD takes over, and later, at the 50th iteration, closeness takes the lead again. Betweenness and degree centrality did not converge. In fact, the degree incorrectly matched the dorsal fin of one fish to the caudal fin of the other. Such a deformation would not be possible in the rigid case, but in this example we allowed non-rigid deformations.
The degree centrality fails because the prior provides wrong information, which is strongly tied to the data-graph. Centralities are defined on a graph data structure. Therefore, the way the graph is built plays an essential role in the success of the registration. When the Delaunay triangulation assigns many edges to a noisy point, this node will become more central and, therefore, it will jeopardize the registration. We will come back to this issue in the next sections, when we discuss data-graph creation.

Figure 5.7: Registration of two 2D point-sets under non-rigid transformation with outliers and missing points (rows: 10th, 20th, and 50th iteration). Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

5.4.4 Discussion

After these first three examples of registration, we can gather some intuition on the behaviour of centrality-based registration. The success and quality of the registration are associated with several aspects:

• the type of centrality measure,
• the technique for building the data-graph, and
• the peculiarities of each point-set, such as elongated structures, sampling, density, etc.

We already observed in Chapter 4, when we performed the centrality-based learning, that the type of centrality played an important role in the classification. This is also the case in the registration problem, since we could briefly observe that the degree obtained the worst performance and the closeness the best one. We applied the Delaunay triangulation for building the data-graph. Hence, we should expect that sets with noisy and missing points will not perform significantly better, or will even perform worse, since the prior might be wrong due to the triangulation; this can cause the convergence to fail, as happened for some centrality variants.

Figure 5.8: Convergence of the non-rigid registration case. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

Figure 5.9 illustrates the problem associated with the Delaunay triangulation. The first two graphs show possible ambiguous configurations, but the third one shows an abrupt change in topology caused by the addition of a single noisy point.

Figure 5.9: Stability of the Delaunay triangulation. A single misplaced node changed the topology of the graph abruptly. Figures a) and b) show two possible triangulations for the same point set and Figure c) shows the sensitivity of the triangulation when one node was added.

There are many options for data-graph construction. One could build the Gabriel Graph [44] as a data-graph representation. A data-graph could also be built using a TSP or an MST formulation; a small sketch of the latter is given below. An alternative which is closely related to the Delaunay triangulation is the α-shapes proposed by Edelsbrunner et al. [36], which are collections of subsets of the Delaunay triangulation capturing the notions (fine, crude) of shape in point-sets.
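Of the alternatives just listed, the Euclidean minimum spanning tree is the simplest to construct; below is a minimal sketch using scipy and networkx (the constructions proposed in Chapters 6 and 7 are the thesis's own and are not shown here).

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def emst_data_graph(points):
    """Euclidean minimum spanning tree of a point-set as a data-graph."""
    D = squareform(pdist(points))              # dense pairwise distances
    T = minimum_spanning_tree(D)               # sparse (N, N) tree
    G = nx.Graph()
    G.add_nodes_from(range(len(points)))
    rows, cols = T.nonzero()
    G.add_edges_from(zip(rows.tolist(), cols.tolist()))
    return G

pts = np.random.default_rng(2).random((30, 2))
G_mst = emst_data_graph(pts)
print(G_mst.number_of_edges())   # N - 1 edges for N points
```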
In Chapters 6 and 7 we also propose two different techniques to generate the data-graph, but it is still an open question which technique is the most suitable one, and we plan to investigate this in our future work (Chapter 8). The reason for this lies in the fact that the concept of noise in a point-set is context-dependent. We mentioned noise in the fish example because we knew that true points were sampled from an object's surface and should lie on the silhouette. If no information about the point-set is given and the algorithm is asked to create a graph robust to noise, it will be a challenging task to decide what noise actually is in order to be able to reduce its impact. We will be moving towards that question in our future work. Chapter 7 minimizes the impact of noise on the data-graph in comparison with the Delaunay triangulation.

5.4.5 Experiments on Different Classes of Point-sets

We need to perform extensive experiments in the noise-free scenario in order to understand the impact that the class of object has on the registration using centralities, since we already know that a noisy dataset will not improve the results of the centralities. We fix the data-graph construction (Delaunay) and the noise scenario. Therefore, we can vary properties of the point-sets and evaluate how those properties impact the results. We provide five different classes of point-sets. For each class, there are 99 distinct point-sets and we aim to see if the registration behaves similarly per class and algorithm. Figure 5.10 shows samples of those five classes: (i) Gaussian clusters, (ii) random points, (iii) Gaussian sampling, (iv) grid sampling, and (v) contours of objects. The images of Kimia's 99 dataset were used to generate the silhouettes and the differently sampled images. This is the same dataset used in Chapter 4 for the centrality-based learning.

Figure 5.10: Sample point-sets of the 495 databases used (one sample of each class: cluster, random, normal, contour, grid). Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

Tables 5.1 and 5.2 show the average and standard deviation of the registration on those 495 point-sets, respectively. The results are displayed as the number of iterations until convergence, per class and algorithm, and are calculated using only the registrations that converged. The convergence of each algorithm is displayed in Table 5.3. On the first type of data, the clusters, all centralities obtained a faster decay than CPD. In fact, the closeness centrality was on average 2.66 times faster than CPD. Among the 99 sets of this class, the best result was approximately 3 times faster than CPD. We measure the speed of registration by the number of iterations the algorithm took to converge. When analyzing the standard deviation for both the degree and eigenvector centralities for this class, it is noticeable that they are very high in comparison with the other algorithms. Also, degree and eigenvector were the only algorithms not able to converge on all databases of this class.

Table 5.1: Average number of iterations until convergence on the registration of 495 datasets. In general, the closeness centrality obtained the fastest convergence among all algorithms; the fastest algorithm on average per class is marked with *. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

Database   B(µ)    C(µ)     D(µ)    E(µ)     CPD(µ)
cluster    42.72   22.72*   53.67   41.95    60.56
random     19.94   15.20    30.46   14.56*   37.25
normal     36.72   22.08*   50.45   37.20    51.54
contour    23.82   17.68*   28.49   20.49    26.39
grid       32.60   26.31    31.23   24.71*   34.29
Table 5.2: Standard deviation of the number of iterations on the registration of 495 datasets. Each cell shows the standard deviation of the number of iterations until the algorithm converged; in general, the closeness centrality obtained the fastest convergence among all algorithms (cf. Table 5.1). Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

Database   B(±σ)   C(±σ)   D(±σ)   E(±σ)   CPD(±σ)
cluster    4.36    1.26    19.89   17.15   3.91
random     1.94    1.21    15.97   0.96    3.90
normal     9.33    3.37    20.47   18.22   13.82
contour    3.74    3.66    5.03    8.99    4.27
grid       18.39   17.28   21.08   13.41   13.05

Table 5.3: Convergence results of the 495 point-sets under rigid transformation. Degree and eigenvector centralities did not succeed in converging on all sets. However, when the point-set comes from grid sampling, no centrality was able to converge 100%. Reprinted from Pattern Recognition, 48/2, Samuel de Sousa and Walter G. Kropatsch, Graph-based point drift: Graph centrality on the registration of point-sets, Pages 368–379, Copyright 2015, with permission from Elsevier.

Database   B      C      D     E      CPD
clusters   100%   100%   81%   86%    100%
random     100%   100%   55%   100%   100%
normal     100%   100%   75%   86%    100%
contour    100%   100%   90%   95%    100%
grid       95%    69%    35%   77%    100%

For the random class, all centralities obtained faster convergence than CPD, but this time the degree was the only algorithm that did not converge on all sets. The eigenvector and closeness centralities were the fastest ones, more than 2 times faster than CPD on average. The eigenvector algorithm was quite stable for this sort of data, since its standard deviation was quite low compared with the previous class. The next class evaluated is the contour of an object. We extracted the contours of images from the Kimia dataset in order to create the contour-based point-sets. The closeness centrality was again the fastest algorithm to converge, and we notice that the discrepancy between the results is not as high as in the other classes. The degree and eigenvector were unable to converge on all sets, and this time the average convergence time of the degree is higher than that of CPD. Finally, we sample the object using a grid-based strategy. Notice that inside the object all nodes will have the same degree, while they will differ on the boundaries. This was the only case in which no centrality was able to converge 100%. If one, for instance, creates the graph out of an image using an 8-connected neighborhood, all points inside the object are likely to be matched if one considers only the degree. Therefore, when the point-set is grid sampled, it is better to rely on CPD, which considers the geometry of the points. It is important to mention that one could add a free parameter (e.g. κA + (1 − κ)B) into Eq. 5.2 and balance the importance of each term A and B in the equation (geometry vs. topology). One could always obtain results better than or equal to CPD by cross-validating κ.
In our experiments, we decided not to add such a free parameter to avoid a dependence on how well κ was tuned. For instance, in the last grid experiment, one could set κ = 1 and turn off the contribution of the centralities. In fact, by letting both terms contribute equally, we can understand the behaviour better. So, by isolating certain properties of the point-sets, we now have a strong indication of when the centralities can perform better or worse than CPD.

5.5 Conclusions

In this chapter, we have presented a centrality-based approach coupled with the CPD algorithm. We have shown how the probabilistic approach was derived, the types of experiments conducted and the results obtained. In the data we analyzed, many points had the same centrality, especially when looking at the grid sampling. We refer the reader to the figures of Chapter 4, where we displayed the distribution of centrality per shape. For instance, all points inside the object in the degree image have the same centrality value (e.g. 8 for an 8-connected neighborhood). When sampling using the degree, the distribution will have a high peak at that number of neighbors (e.g. 8) and smaller peaks for the points on the boundary. Eigenvector and closeness centralities are the most discriminative distributions under grid sampling, although there are many nodes with similar centrality values, which are concentric inside the shape. Such a property would explain why the eigenvector and closeness centralities obtained the best results in the grid experiment, although all centralities suffered from the same convergence problem. Therefore, when performing registration of grid-sampled points, centralities can still produce good results, but it is clear that their impact on the registration should be reduced via a free parameter.

CHAPTER 6
The Minimum-Weight Maximum-Entropy Problem

In the previous chapter, we brought up the importance of the data-graph in the problem of registration. We saw that many approaches use the Delaunay triangulation as their data-graph technique [25, 27, 76]. It is not clear, though, if the Delaunay triangulation is always the best solution for all possible tasks in CV when a data-graph is required. In this chapter, we focus solely on the problem of creating a data-graph by approximating the degree centrality of the graph by a uniform distribution. A common procedure for registration consists of (i) creating the data-graph using the Delaunay triangulation and (ii) performing the registration using an optimization procedure [25, 76, 27]. We applied this triangulation to build the graph in the previous chapter. The Delaunay triangulation [33] is based on the condition that no other point should lie inside the circumcircle of any triangle. Alternative methods include the Reeb graph [98], which builds the graph based on a height function, the Gabriel graph [44], which is a subgraph of the Delaunay triangulation, or even the Euclidean Minimum Spanning Tree (EMST), since it could be employed as a data-graph. Many researchers propose methods of data-graph construction focusing either on aesthetic aspects [89] or designing their own criteria, such as the fan-shaped triangulation of Lian and Zhang [72]. We would like to generate a data-graph which would ease the registration process.
Instead of creating a generic graph (such as the Delaunay) and applying heavy optimization algorithms to match the two graphs, we would rather spend more energy on designing a unique data-graph representation that would make the registration process as trivial as possible. Developing a more exclusive representation for a graph is not a new concept and it has been explored before by Dickinson et al. [34], who focus on a class of graphs with a unique representation for the node labels. They develop a representation ρ of a graph G = (V, E, α, β), where α and β stand for functions assigning labels to the vertices and edges respectively. They can decide if two graphs g′ and g′′ are isomorphic by comparing their representations ρ(g′) and ρ(g′′) in O(N²). The work presented in this chapter differs in nature from the work of Dickinson et al. [34] since we are not interested in finding an isomorphism between two graphs. Many isomorphisms are still possible between two point-sets when we can decide how we are going to build the graphs. So, it would still be difficult to generate graphs without ambiguous solutions. We would like to reduce the amount of ambiguous configurations until the only remaining solution corresponds to our matching. In essence, we try to maximize the entropy of the degree centrality of the graph while minimizing the total weight of the edges. We pose an optimization problem (called the MWME) and we provide a polynomial-time algorithm based on DP. To the best of our knowledge, we are the first ones to tackle the matching problem in this fashion. The results of this chapter were published in the proceedings of Graph-Based Representations in Pattern Recognition in Beijing, 2015 [28]. We have obtained permission from Springer Science and Business Media to reprint all content of the paper in this dissertation. The license is attached in Appendix A.

6.1 Problem Formulation

The question to be asked when we try to create a data-graph that can support the registration process is how one can define such a metric, or which property the data-graph would need to have in order to make the registration as easy as possible. Let us examine a regular graph, i.e. a graph in which all nodes have the same degree. When we try to match such graphs, we notice that every node could be matched against every other node, and the registration task would generate many ambiguous solutions. Therefore, if we succeed in building a graph representation which is the exact opposite of the regular graph, the registration process would be alleviated. We are seeking to obtain a graph whose degree distribution is as diverse as possible. The Shannon entropy (H) measures the uncertainty associated with a random variable. We can use the entropy to assess how diverse the degree distribution D{X} of a graph X is. We calculate the entropy of the degree distribution as follows:

H(\mathcal{X}) = -\sum_{v \in D_{\neq}\{\mathcal{X}\}} p(v) \log_2 p(v),   (6.1)

where p(v) is the probability of finding a node with a degree of v among all distinct degree values (D_{\neq}\{\mathcal{X}\} stands for the distinct degree values of graph X). A graph X with a high entropy according to Eq. 6.1 indicates that the distribution has a variability of degree values. The converse is also true: a low entropy means low variability, as in a k-regular graph in which the probability of finding a node with degree k is p(k) = 1 and, therefore, log(1) = 0. This also includes the graph without edges. A graph with a high entropy would let us match nodes more easily.
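As a concrete illustration of Eq. 6.1, the short Python sketch below computes the entropy of a graph's degree distribution from an adjacency list; the function name and the dictionary-of-sets representation are our own illustrative choices, not part of the published method.

    import math
    from collections import Counter

    def degree_entropy(adjacency):
        """Entropy of the degree distribution (Eq. 6.1).

        adjacency maps each node to the set of its neighbours; p(v) is the
        fraction of nodes having degree v and the sum runs over the distinct
        degree values only.
        """
        n = len(adjacency)
        degree_counts = Counter(len(neigh) for neigh in adjacency.values())
        return -sum((c / n) * math.log2(c / n) for c in degree_counts.values())

    # A k-regular graph has entropy 0 (e.g. the complete graph on 4 nodes),
    # whereas a path graph already shows some variability in its degrees.
    k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(degree_entropy(k4))    # 0.0
    print(degree_entropy(path))  # 1.0 (two nodes of degree 1, two of degree 2)

For the six-node NHDD graphs discussed below in Fig. 6.1, this function reproduces the value log2(6) - 2/6 ≈ 2.25 stated in the figure.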
In fact, if all nodes had different degree values, the matching would be determined as soon as the graph is built, since matching nodes must have the same degree. Nevertheless, even if we are able to obtain a graph with the highest entropy, there is still the problem of ambiguity: since our input is a point-set, we can always exchange the order of points and obtain other graph isomorphisms, which is not what we want. Thus, we also need to generate a graph that is as unique as possible.

Figure 6.1: Some realizations of a six-vertex graph along with their entropy: (a) the complete graph X₆ with H(X₆) = 0, (b) a Delaunay triangulation L₆ with H(L₆) = 0.91, and (c)–(d) the graphs X•₆ and X◦₆, which fulfill the NHDD property and have the highest entropy values, H = 2.25.

We would like to uniquely identify the nodes unless all points in the graph are equidistant, i.e. many possible solutions still exist. If this is not the case, we should be able to match the graph. In order to achieve this goal, we minimize the total weighted edge cost of our graph while constraining the solution to have maximum entropy.

Definition 37 (The Minimum-Weight Maximum-Entropy Problem). The Minimum-Weight Maximum-Entropy (MWME) problem is the problem of estimating an edge-induced subgraph whose entropy of the degree distribution is maximum and whose total edge weight is minimum.

Given a point set X, we create a complete graph X_{|X|} using a metric function (Def. 12; metric functions were discussed in Chapter 2) as the edge weights. Let W denote the weighted edges of X_{|X|}. We define a binary vector U ∈ {0, 1}^{|W|} that induces an edge-subgraph X[U] composed of all nodes of X_{|X|} and the edges {W_i | U_i = 1}. We search for the vector U which minimizes the cost:

\underset{U}{\text{minimize}} \; \sum_{i=1}^{|W|} W_i U_i, \quad \text{subject to} \quad H(\mathcal{F}) \le H(\mathcal{X}[U]) \;\; \forall \mathcal{F}, \quad U \in \{0, 1\}^{|W|},   (6.2)

under the constraint that the entropy H(X[U]) of the induced subgraph X[U] is maximum, i.e. for any graph F with |F| = |X|, the entropy H(F) will be lower than or equal to that of our edge-induced subgraph, H(X[U]). The second constraint states that the optimization variable is discrete: we either add the edge W_i to our graph X[U] when U_i = 1 or we do not add it (U_i = 0). We propose a DP algorithm that minimizes Eq. 6.2 by looking deeper into some properties of the desired induced graph X[U]. It is important to mention that we cannot guarantee unique solutions. The reason for that can be visualized in Fig. 6.1a: by constructing a graph out of a regular polygon, we could rotate all nodes and the total edge cost would remain the same, as would the entropy.

6.2 The Near Homogeneous Degree Distribution

We decompose the optimization problem of Eq. 6.2 into an entropy-maximization and an edge-cost-minimization step. The way we pose the problem enforces that a solution only belongs to the feasible set iff it has the maximum entropy. In this section, we explain how we can guarantee the maximum possible entropy for a graph and later how to incorporate the minimum edge cost into that graph. Our pursuit of the graph with maximum entropy starts with the question of what is the maximum number of nodes in a simple graph (Def. 2) with distinct degree values. This problem is closely related to the graph realization problem [37], which consists of determining if a given sequence of degree values is feasible for a simple graph (a so-called graphic sequence).
This problem has been addressed by Erdős-Gallai [37] and Havel-Hakimi [53, 56]. Thus, our problem could be considered as generating the graphic sequence with the highest variability. Theorem 2 states the maximum variability in a graph.

Theorem 2 (Maximum Number of Distinct Degree Values). For every simple graph X(V, E) with |V| > 2, there are at least two nodes with the same degree.

The proof of Theorem 2 is based on the proof presented by Kocay and Kreher [63].

Proof. A simple graph X(V, E) allows neither parallel edges nor self-loops. Therefore, the highest degree is |V| − 1. If we create a degree sequence in which all nodes have different degree values (the highest possible variability), the nodes would have the degree distribution D{X} = (0, 1, . . . , |V| − 1). If one node has a degree of |V| − 1, it is connected to all other nodes, but the degree zero means that there is a node not connected to any other. Thus, degree values of 0 and |V| − 1 are mutually exclusive and cannot coexist in the same graph. As there are only |V| − 1 possible values for |V| nodes, by the pigeonhole principle, at least two nodes must have the same degree.

Theorem 2 establishes an important result for us: since we cannot create a graph whose nodes all have distinct degree values, the highest possible number of different node degrees is |V| − 1 (Def. 38).

Definition 38 (Near Homogeneous Degree Distribution). A simple graph X(V, E) fulfills the NHDD property when it contains |V| − 1 nodes with different degree values.

When we analyze the constraint on our objective function, we can state that the solution is only feasible if the induced subgraph fulfills the NHDD property. We show that there are two feasible graphs that fulfill NHDD.

6.2.1 Graphs fulfilling NHDD

Definition 39 (Types of NHDD Graphs). X•_N is a connected graph which fulfills the NHDD property and X◦_N is a graph which fulfills the NHDD but contains one isolated node.

In Lemma 2, we show that the complement of a graph which fulfills the NHDD property also obeys the NHDD.

Lemma 2 (The complement graph fulfills the NHDD condition). The complement graph of X•_N also fulfills the NHDD property.

Proof. The complement graph X◦_N of X•_N is obtained by taking the difference X_N − X•_N. A graph X•_N has the degree distribution D{X•_N} = (1, . . . , |V| − 1). A complete graph X_N has all nodes with degree equal to |V| − 1. Therefore, the degree distribution of X_N − X•_N will be D{X◦_N} = (|V| − 2, . . . , 0), with one isolated node and maximum degree |V| − 2.

Lemma 3 and Algorithm 6.1 show how we can generate the two possible graphs (X•_N and X◦_N) described in Def. 39 and, more importantly, establish a relation between the two graphs.

Lemma 3 (Construction of graphs fulfilling NHDD). For any integer N ≥ 3, it is possible to generate both X•_N and X◦_N.

Proof. We start with a base case of a graph X◦_N with N = 3, whose degree distribution (the degree values of all nodes in the graph) is D{X◦_N} = (0, 1, 1). The NHDD is already fulfilled. Our inductive step assumes that the claim holds for some N ≥ 3 and builds the solution for N + 1. There are two possibilities:

a) X◦_N has an isolated node. Then, by adding a node and connecting it to all the others, we obtain an X•_{N+1}, since the degree of all the other nodes is increased by one and the graph becomes connected. The graph with one disconnected node can be obtained using the complement graph: X◦_{N+1} = X_{N+1} − X•_{N+1}.
b) X•_N does not have an isolated node. Therefore, we add a new isolated node and we obtain X◦_{N+1}, in which the NHDD is fulfilled. The connected graph can be obtained using the complement graph: X•_{N+1} = X_{N+1} − X◦_{N+1}.

The nhdd algorithm (Algorithm 6.1) guarantees that our induced subgraph has maximum entropy. Figure 6.2 shows a visualization of the algorithm for a graph with six nodes. The graph is colored in blue if it does not fulfill the NHDD condition and in yellow otherwise. Graphs 6.2b and 6.2g are X◦₃ and X◦₅, and graphs 6.2c and 6.2l are X•₄ and X•₆ respectively. In the proof of Lemma 3, we used the complement of a graph to show that the graphs X•_N and X◦_N can be built.

Algorithm 6.1: The nhdd algorithm: generating a graph with maximum entropy.
Data: N ≥ 3
begin
    X ← {x₀, x₁}
    // We alternate between the two steps.
    for i ← 2 to N do
        if X is connected then
            // X◦_N: add a node, but do not connect it to any other.
            X ← X ∪ {x_i}
        else
            // X•_N: add a node and connect it to all the other nodes.
            X ← X ∪ {x_i | x_i ↔ x, ∀x ∈ X}
        end
    end
    return X
end

(A short Python sketch of this construction is given further below.)

Theorem 3 (Connected Graph fulfilling NHDD). For any integer number N, it is always possible to obtain a connected graph that fulfills the NHDD property.

Proof. This proof follows naturally as a consequence of Lemmas 3 and 2. For any integer N, we obtain either X•_N or X◦_N. In case we obtain X◦_N, we take the complement graph and thus always obtain a connected graph which fulfills the NHDD.

Fig. 6.1 shows four different induced graphs with 6 nodes and their respective entropy values (Eq. 6.1). Graph 6.1a is a complete graph X₆, whose entropy is equal to zero. Graph 6.1b is created using a Delaunay triangulation (L₆) and its entropy is equal to 0.91. Graphs 6.1c and 6.1d are an X•₆ and an X◦₆ respectively. It is clear that both X◦₆ and X•₆ are more distinctive than L₆ and X₆ and their entropy is significantly higher. The existence of an isolated node did not affect the entropy, as the variability of degree values is the same; however, as the nodes have the same distance, those solutions are not unique. Since we have been speaking about distinct node degrees, which would allow a one-to-one match between graphs, we can also determine which node has the repeated degree value (Lemma 4).

Lemma 4 (The repeated node). The repeated node in a graph fulfilling the NHDD has degree equal to ⌊N/2⌋.

Proof. The proof is by induction. We start with a base case with N = 4. Here, the node degree distribution is D{X•₄} = (1, 2, 2, 3) and the repeated node has degree equal to 2.

Figure 6.2: An algorithm for building a graph which fulfills the NHDD condition, shown step by step on six nodes (panels a–l). The graph is colored in blue if it does not fulfill the NHDD condition and in yellow otherwise.

Our inductive step is:

D\{\mathcal{X}^{\bullet}_N\} = \left(1, \ldots, \left\lfloor\tfrac{N}{2}\right\rfloor, \left\lfloor\tfrac{N}{2}\right\rfloor, \ldots, N - 1\right).   (6.3)

By adding one isolated node as described in Lemma 3, we obtain:

D\{\mathcal{X}^{\circ}_{N+1}\} = \left(0, 1, \ldots, \left\lfloor\tfrac{N}{2}\right\rfloor, \left\lfloor\tfrac{N}{2}\right\rfloor, \ldots, N - 1\right).   (6.4)

The connected graph can be obtained by taking the complement of D{X◦_{N+1}}:

D\{\mathcal{X}^{\bullet}_{N+1}\} = \left(N, \ldots, N - \left\lfloor\tfrac{N}{2}\right\rfloor, N - \left\lfloor\tfrac{N}{2}\right\rfloor, \ldots, 1\right).   (6.5)

We know that N = ⌊N/2⌋ + ⌊(N+1)/2⌋; therefore, by plugging this back into Eq. 6.5, we obtain:

D\{\mathcal{X}^{\bullet}_{N+1}\} = \left(1, \ldots, \left\lfloor\tfrac{N+1}{2}\right\rfloor, \left\lfloor\tfrac{N+1}{2}\right\rfloor, \ldots, N\right).   (6.6)

The number of edges of X•_N and X◦_N can be directly calculated (Theorem 4).
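As announced after Algorithm 6.1, here is a minimal Python sketch of the nhdd construction. It is illustrative only: we start from two nodes joined by an edge, which matches the base case X◦₃ of Lemma 3 (the initialization in the pseudocode does not make this edge explicit), and the helper names are our own.

    def is_connected(graph):
        """Check by graph traversal that every node is reachable from node 0."""
        seen, stack = {0}, [0]
        while stack:
            for w in graph[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(graph)

    def nhdd(n):
        """Sketch of Algorithm 6.1: build an NHDD graph on n >= 3 nodes.

        Starting from two connected nodes, we alternate between adding an
        isolated node (when the current graph is connected) and adding a node
        joined to every existing node (when it is not).
        """
        graph = {0: {1}, 1: {0}}                 # x0 -- x1
        for i in range(2, n):
            if is_connected(graph):
                graph[i] = set()                 # X°_N step: add an isolated node
            else:
                graph[i] = set(graph)            # X•_N step: connect to all others
                for v in graph[i]:
                    graph[v].add(i)
        return graph

    degrees = sorted(len(nbrs) for nbrs in nhdd(6).values())
    print(degrees)   # [1, 2, 3, 3, 4, 5]: five distinct degree values for six nodes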
Theorem 4 (The number of edges in a NHDD graph). The number of edges of the graphs X•_N and X◦_N is

|E^{\bullet}_N| = \frac{1}{2}\left(\frac{N(N-1)}{2} + \left\lfloor\tfrac{N}{2}\right\rfloor\right) \quad \text{and} \quad |E^{\circ}_N| = \frac{1}{2}\left(\frac{(N-2)(N-1)}{2} + \left\lfloor\tfrac{N-1}{2}\right\rfloor\right).

Proof. We first prove it for the graph X•_N. As proved in Lemma 4, the sum of degrees in an X•_N is equal to:

\sum_{x \in \mathcal{X}^{\bullet}_N} d(x) = 1 + \ldots + \left\lfloor\tfrac{N}{2}\right\rfloor + \left\lfloor\tfrac{N}{2}\right\rfloor + \ldots + N - 1,   (6.7)

which is equivalent to:

\sum_{x \in \mathcal{X}^{\bullet}_N} d(x) = \sum_{i=1}^{N-1} i + \left\lfloor\tfrac{N}{2}\right\rfloor = \frac{N(N-1)}{2} + \left\lfloor\tfrac{N}{2}\right\rfloor.   (6.8)

The handshake lemma states that |E| = Σ d(x)/2, so we arrive at the number of edges:

|E^{\bullet}_N| = \frac{1}{2}\left(\frac{N(N-1)}{2} + \left\lfloor\tfrac{N}{2}\right\rfloor\right).   (6.9)

By rewriting Equation 6.4 in the same way, we conclude that:

|E^{\circ}_N| = \frac{1}{2}\sum_{x \in \mathcal{X}^{\circ}_N} d(x) = \frac{1}{2}\left(\sum_{i=0}^{N-2} i + \left\lfloor\tfrac{N-1}{2}\right\rfloor\right) = \frac{1}{2}\left(\frac{(N-2)(N-1)}{2} + \left\lfloor\tfrac{N-1}{2}\right\rfloor\right).   (6.10)

6.3 Optimization

Let us go back to our initial problem in this chapter: the minimization of the objective function in Eq. 6.2. If we would like to use the NHDD property during our optimization, we need to prove that a graph fulfilling this property has the maximum possible entropy. The uniform distribution has the maximum entropy [129, 61]; however, as already discussed, this cannot be achieved in a graph (Theorem 2). Theorem 5 shows that a graph which fulfills the NHDD has the maximum entropy.

Theorem 5 (Maximum Entropy in a Graph). A graph fulfilling the NHDD property has the maximum entropy among all graphs with N nodes.

Proof. Let us suppose that there is a graph X′ with a higher entropy than a graph fulfilling the NHDD condition. Our graph has N − 1 distinct degree values; therefore, X′ would need N distinct values in order to have a higher entropy than ours (the uniform distribution). Theorem 2 shows that such a graph does not exist; this is a contradiction, implying that no graph can have a higher entropy than the one fulfilling NHDD.

Theorem 6 calculates the entropy of the graphs X•_N and X◦_N analytically.

Theorem 6 (The entropy of a NHDD graph). The entropy of a NHDD graph is H(X•_N) = H(X◦_N) = log₂ N − 2/N.

Proof. In a probabilistic interpretation of the entropy, p(v) is the probability of an event v happening. Our event is the occurrence of a degree v in X•_N (or X◦_N). There are N − 2 events with probability p(v) = 1/N and one event whose probability is p(v) = 2/N. Therefore, the entropy of X•_N and X◦_N is:

H(\mathcal{X}^{\bullet}_N) = \frac{N-2}{N}\log_2 N + \frac{2}{N}\log_2\frac{N}{2} = \log_2 N - \frac{2}{N}.   (6.11)

6.3.1 The n1graph algorithm explained

Fig. 6.3 shows the optimization process over a complete graph X₄. The edge weights are computed using the Euclidean distance between the points. The n1graph algorithm (Algorithm 6.2) contains three nested loops. Each row of Fig. 6.3 displays one iteration of the outer i loop. We highlight the current node i in blue. For instance, the first row corresponds to i = a, the second row corresponds to node b, and so on. We start by allocating the matrix dp (line 2 of Algorithm 6.2) of dimensions N × N × N, whose cost is initialized to ∞. This matrix holds the cost to be minimized. The second nested loop stands for the capacity k of the current edge-induced subgraph, which can be visualized by the columns of Fig. 6.3. A node belongs to the induced subgraph if it is highlighted in orange, with the edges that were selected based on the cost.
Figure 6.3: n1graph optimization. The rows represent the reference node i, which increases from top to bottom (blue nodes). The columns describe the capacity of the current graph (increasing from left to right). The optimization always starts with two nodes, i.e. the nearest neighbor of the reference node. At every iteration (column) we add one more node to the graph (the one with minimum cost) as the capacity is increased. The optimal solution is the minimum cost of the last column; in this example the cost is 10, at i = d and k = 4. Panel costs: a) i=a, k=2, c=1; b) i=a, k=3, c=8; c) i=a, k=4, c=11; d) i=b, k=2, c=1; e) i=b, k=3, c=8; f) i=b, k=4, c=11; g) i=c, k=2, c=2; h) i=c, k=3, c=8; i) i=c, k=4, c=11; j) i=d, k=2, c=3; k) i=d, k=3, c=9; l) i=d, k=4, c=10.

In the first iteration, node i = a is the only node in the induced subgraph. We increase the capacity to k = 2 and add node b, since it is the one contributing the lowest cost (Fig. 6.3a). Then, between the two remaining nodes, we choose the one which has the minimum weight. If c joins the graph, we add a cost of 7, but if we add node d instead, the added cost would be 9; therefore, we choose c (Fig. 6.3b). Finally, we need to add the last remaining node since our graph has to be connected (Fig. 6.3c). This is one solution produced during the optimization and it can be seen as one point in the space of solutions with a certain cost. We produce N such solutions iteratively (line 3) and search for the one with the lowest cost. For instance, solution i will be at least as low as solution i − 1 (line 16).

Algorithm 6.2: The n1graph algorithm, which generates an X•_N graph.
Data: Complete graph X_N(X, W), N ≥ 3
 1 begin
 2     dp ← Matrix(N, N, N, value = ∞);
       // We create N solutions starting with point i (layer).
 3     for i ← 1 to N do
 4         dp(i, 1, i:N) ← 0; join ← 0;
           // k is the capacity of the graph X•_N (row).
 5         for k ← 2 to N do
 6             added ← false;
               // j is the node to be added into X•_N (column).
 7             for j ← 1 to N do
 8                 if join then
 9                     cnew ← dp(i, k − 1, N) + Σ_{g ∈ G} W(j, g);
10                 else
11                     cnew ← dp(i, k − 1, N) + W(j, x_{k−1});
12                 end
                   // If we have not added a node yet, we must accept.
13                 if not added then
14                     dp(i, k, j) ← cnew;
15                     added ← true;
16                 else
17                     dp(i, k, j) ← min(dp(i, k, j − 1), cnew);
18                 end
19             end
20             join ← ¬join;
21         end
           // The new solution i is at least as good as i − 1.
22         dp(i, N, N) ← min(dp(i, N, N), dp(i − 1, N, N));
23     end
24     return traceback(dp).
25 end

Every iteration of i can be visualized as one layer of the dp matrix and we set the costs from i to N equal to zero (line 4); this is the reason why the cost is reinitialized in Fig. 6.3d. The weights lie on the edges, thus the initial cost is zero since there are no edges in X•_N at capacity 1. We have one node (i) in X•_N and we start to increase the capacity (k) of our graph X•_N (line 6) until all nodes belong to the graph. We are allowed to add one more node (j) whenever the capacity k is increased. We alternate between two steps: adding an edge to all the nodes currently in X•_N (line 8) and adding an edge to the last node (x_{k−1}) added to X•_N (line 10). The algorithm can be visualized as "pushing" the minimum weight towards the bottom right of the matrix; the optimum for each capacity k is set at (k, N) and the optimum for each layer i is found at dp(i, N, N).
Hence, the term dp(i, k − 1, N) can be understood as the best cost of the optimization when the capacity was k − 1, i.e. the total weight we propagate towards the end. When the capacity k is increased, we must add a new node to the graph and, therefore, the cost increases. We initially add the first node which is not already in the graph (line 14). Finally, on line 15, we take the minimum between the cost of adding the current node j and the previous cost at capacity k without node j. Algorithm 6.2 shows only the core steps to obtain the MWME solution. Nevertheless, more involved steps are necessary to ensure the algorithm behaves correctly; for instance, we need to prevent adding the same node twice into the graph: every node has to be added only once.

6.3.2 The traceback algorithm

Table 6.1 highlights the layer of matrix dp in which the optimum solution was found. The highlighted node d corresponds to the layer which yielded the optimum solution (i = 4). At k = 1, dp is initialized with node d = 0 (since at this capacity there are no edges in the graph). When the capacity is increased to 2, node a is added, yielding a cost of 4, but within the same capacity there is a lower cost (3) if node c is added instead. The final cost is displayed at cell j = d, k = 4. We trace back on the same row until there is a change in cost (meaning that a node has been added). Whenever a node is added, we proceed the trace from the row with a lower capacity (k − 1, N) and continue tracing back until all nodes are added. The node sequence obtained by tracing back is (b, a, c, d). The traceback algorithm (Algorithm 6.3) starts at the bottom right corner (row N), where the minimum is, and goes back along the same line until the cost changes. This change in cost means that a node was added (cell highlighted in orange). The row corresponds to the capacity: when a node at capacity k has been added, we only need to go to the row above (capacity k − 1) and search for a change of cost, until we reach the first row. The algorithm produces the sequence in which the nodes were added, e.g. (b, a, c, d) for Table 6.1. This sequence is used by the nhdd algorithm (Algorithm 6.1) to produce the final graph X•₄.

k\j    a      b      c      d
1      ∞      ∞      ∞      0
2      4      4      3✓     ← 3
3      9✓     ← 9    ← 9    ← 9
4      9      10✓    ← 10   ← 10

Table 6.1: The table storing the cost during the optimization (✓ marks the cell where a node is added; arrows indicate the backward trace). The trace back starts at (d, 4) and moves along the same row until a value changes: a node is added to the graph.

Algorithm 6.3: The traceback algorithm.
Data: Table dp and optimal layer i
begin
    S ← {}
    k ← N; j ← N   // We start at the bottom right corner.
    while k > 0 do
        // If the cost is different, a node has been added.
        if j = 1 or dp(i, k, j) ≠ dp(i, k, j − 1) then
            S ← S ∪ {j}
            k ← k − 1
            j ← N
        else
            j ← j − 1
        end
    end
    return S
end

6.4 Complexity Analysis

In this chapter, we provide three algorithms: nhdd (Algorithm 6.1), which produces a NHDD graph given an ordered set of nodes, n1graph (Algorithm 6.2), which finds the optimal solution for our MWME problem, and traceback (Algorithm 6.3), which reconstructs the data-graph by tracing back the optimum cost from the dp table. In the actual implementation, one is able to store the solution during the optimization and, therefore, the tracing-back step could be avoided. Nevertheless, we discuss the time complexity associated with all algorithms.

Theorem 7 (The complexity of the nhdd algorithm). The time complexity of the nhdd algorithm is Θ(N + |E|).

Proof.
The nhdd algorithm has a single loop (line 3), which accounts for the N term, and the cost added at each iteration is associated with each edge created in the graph (line 7).

Theorem 8 (The complexity of the n1graph algorithm). The time complexity of the n1graph algorithm is Θ(N³|E|).

Proof. The three nested loops over i, j, k clearly yield Θ(N³) time. Inside the j loop, two operations alternate: (i) adding one edge to the last node of X•_N and (ii) adding an edge to each node of X•_N; the total operations will be (1, 2, 1, 4, 1, . . .), which can be split into ⌊(N − 1)/2⌋ operations of type (ii) and ⌊N/2⌋ operations of type (i). The complexity of those two operations is equivalent to Eq. 6.9, which is the number of edges in the graph, yielding Θ(N³|E|) in total.

Theorem 9 (The complexity of the traceback algorithm). The complexity of the traceback algorithm is O(N²).

Proof. The traceback algorithm is bounded by the table size, which is N².

6.5 Matching

The matching is performed by bringing each point-set towards this canonical representation. Given point-sets X₁, X₂, . . . , X_P, we build the MWME graph for each one of them. The matching associates a node with degree v in X•_k to the node with the same degree v in X•_l. Therefore, the matching is performed via this canonical representation. As each point-set is optimized individually, the scale of one does not affect the scale of the other. The optimization tries to minimize the distances within each point-set individually; therefore, it is able to handle the following two scenarios: rigid and similarity transformations (Chapter 3). We took images from the Caltech-256 dataset which contained a single object in the scene. We sampled N = 50 points from the silhouette of the object. A random similarity transformation was applied to the point-sets. Figure 6.4 shows an example of the matching. Not all edges are displayed to avoid cluttering the scene, but the sets are correctly matched.

Figure 6.4: Registration of point-sets under a similarity transformation: translation + rotation + isotropic scaling.

6.6 Conclusions

In this chapter, we proposed a novel way to match point-sets via a canonical representation obtained by solving the MWME problem. Our approach is based on a Dynamic Programming algorithm yielding a cubic-complexity solution. Such a graph turns the registration procedure into a trivial task under, for instance, rigid and similarity transformations, since there is a direct mapping between nodes. When the point-sets are not related by a similarity transformation, this method cannot compute the direct matching. Also, due to the global optimization, the location of every individual point has a major impact on the cost of the edge-induced subgraph; it becomes sensitive to variations in the local neighborhood of the points. We will cover some of those issues in the next chapter, but as future work on this topic, we will extend the algorithm towards a more local optimization that does not depend on all nodes: we will try to match regions of maximal entropy, thus considering only a subset of nodes. This will allow more flexibility during registration and make the process less dependent on the global optimization, since we could perform the registration of non-linearly related point-sets by performing piece-wise matchings of subsets of nodes. It will also be able to cope with noise and partially overlapping regions, since it will no longer optimize over the whole graph.
CHAPTER 7
The Minimum Spanning Tree of Maximum Entropy

In the previous chapter, we started dealing with the problem of data-graph construction, which is the process of generating graphs out of unstructured point-sets. This type of graph does not offer topological information through its edges, since the edges we find in data-graphs do not represent actual connections between the original points. They are built according to certain criteria, such as geometrical or even aesthetical ones [89]. The choice of edges is highly dependent on the type of data-graph technique. For instance, the Delaunay triangulation is based on the condition that no other point should lie inside the circumcircle of any triangle. One possibility for registering point-sets is to formulate the problem as Graph Matching (GM). As we have the freedom to design the data-graph, we are not restricted to using a triangulation. In fact, we could design a graph that would decrease the search space when matching points, i.e. a graph that would be tuned towards a specific application instead of acting as a general stand-alone data-graph construction technique. The Iterative Closest Point (ICP) algorithm was introduced in Chapter 3. As already mentioned, we would need to find for each point in one set the closest point in the other set, requiring us to check the whole set of points to determine which point is the closest one. Nevertheless, if we have some prior knowledge about the desired node, we could decrease this search space by only checking points which match a certain criterion (such as the degree of the node). The number of points we need to check would be greatly reduced in this fashion if this information is reliable. In this chapter, we modify the cost function presented in Chapter 6 to build a data-graph which can cope with noise in the local neighborhood of points, a more stable graph tuned for registration. We call this problem the Minimum Spanning Tree of Maximum Entropy (MSTME). By minimizing the total edge weight, we allow some robustness of the data-graph with respect to small deviations of the points. By maximizing the entropy, we generate a data-graph which can later ease the registration process. We evaluate the approach under different levels of noise and compare the stability of the data-graph with a Delaunay triangulation on several databases. This chapter is available at arxiv.org, report number 1505.06319 [29]. It won the best paper award at the 39th Annual Workshop of the Austrian Association for Pattern Recognition (ÖAGM) in Salzburg, 2015, sponsored by the Austrian Computer Society.

7.1 Objective Function

We pose the problem as an optimization procedure in which our cost function is built upon an MST formulation and entropy maximization. In this sense, our work is closely related to [84], as the authors also formulate a Voronoi problem focusing on the entropy of an MST. However, they deal with a different problem by proposing an entropy estimator for clustering. In our work, we are interested in the data-graph construction problem and our entropy is calculated on the degree distribution of the generated MST, while they estimate the data entropy based on the length of the spanning tree. The problem we formulate in this section can also be seen as a bi-criteria optimization problem. Ravi and Goemans [97] proposed a Lagrangian relaxation to solve the constrained minimum spanning tree problem, which is also a bi-criteria optimization proven to be NP-hard by Aggarwal et al. [3].
Neighborhood search and adjacency search heuristics for the bicriterion minimum spanning tree problem were proposed by Andersen et al. [4]. Given a point-set X, we calculate the distance (Section 2.2.2) from each point to all the other points and store those distances in W. This generates a complete graph X_N(X, W). We seek a vector U ∈ {0, 1}^{|W|} such that when U_i = 1, we add W_i to our graph. We can formulate our cost function simply as:

\underset{U}{\text{minimize}} \; \sum_{i=1}^{|W|} W_i U_i - \lambda H(\mathcal{X}[U]) \quad \text{subject to} \quad U \in \{0, 1\}^{|W|}, \;\; \textstyle\sum U = |X| - 1, \;\; \mathcal{X}[U] \text{ is connected},   (7.1)

where X[U] is the edge-induced subgraph given by the vector U and λ is a parameter which weighs the importance of the two terms: minimizing the total edge weight and maximizing the graph entropy. The first constraint states that we have a discrete problem, i.e. we either add an edge (U_i = 1) or we do not. The second constraint is a necessary condition for our graph to be a tree. We need the third constraint to force the graph to be connected. Figure 7.1 displays three different graphs of the same point-set X = {a, b, c, d, e}. The first graph (a) is simply an MST whose cost is c(MST, λ = 0) = 4. It is obtained by setting λ = 0, so that the entropy part vanishes from Equation 7.1. The second graph (b) considers λ = 1; a higher entropy is able to decrease the cost of the solution in comparison with the MST, and the cost is c(MSTME, λ = 1) = 3.04. Notice that although the cost of the weighted edges is higher, the total cost was decreased due to the higher entropy obtained. Finally, we show in contrast that a Delaunay triangulation yields an entropy as low as that of the MST.

Figure 7.1: Three graphs of the same point-set: (a) an MST with cost c(MST) = 4 and H(MST) = 0.72; (b) the graph obtained using our MSTME, where H(MSTME) = 1.37 and the cost is c(MSTME, λ = 1) = 3.04, hence c(MST) > c(MSTME); (c) a Delaunay triangulation, which has the same entropy as the MST, H(Delaunay) = 0.72.

7.2 Optimization

Our optimization is based on a greedy approach evaluating the best cost which can be obtained when adding an edge. We have the constraint that our algorithm has to generate a tree; therefore, it cannot contain cycles. At each iteration, we are allowed to add one more edge until we reach a tree composed of all nodes. We add the edge whose cost is minimum regarding our cost function, which measures the weight of the edge and the entropy of the newly generated MST. The input is the complete graph X_N(X, W) composed of the point-set as the vertices, with the edges weighted according to the Euclidean distance (Section 2.2.2). For each edge e, we check whether the subgraph X[U] induced by our vector U would contain a cycle after the addition of such an edge, in which case we discard it (line 8). Otherwise, we calculate the new cost given by the weight of e and we estimate the entropy of the MST (line 13). In line 16, we remove this edge from our graph and continue searching within the same capacity (i.e. the number of edges allowed at the moment) for an edge which could yield a smaller cost. At the end, we add the edge with the best cost and increase the capacity of our MST by one until it reaches |X| − 1 (line 18). This algorithm follows a greedy strategy, which means it makes decisions locally, based on the best solution at the moment.
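Before the formal listing in Algorithm 7.1, the following is a minimal Python sketch of this greedy procedure under our own illustrative assumptions: the points are given as 2D coordinates, cycles are detected with a small union-find structure, and the entropy is computed as in Eq. 6.1; none of the helper names come from the published algorithm.

    import math
    from collections import Counter
    from itertools import combinations

    def mstme(points, lam=0.5):
        """Greedy sketch of the MSTME construction (cf. Algorithm 7.1): at each
        step add the non-cycle-forming edge minimising W_e - lam * H(X[U])."""
        n = len(points)
        dist = lambda a, b: math.dist(points[a], points[b])
        edges = list(combinations(range(n), 2))
        parent = list(range(n))                  # union-find for cycle checks

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        def entropy(degrees):
            counts = Counter(degrees).values()
            return -sum((c / n) * math.log2(c / n) for c in counts)

        degree, chosen = [0] * n, []
        for _ in range(n - 1):                   # a spanning tree has |X| - 1 edges
            best, best_cost = None, float("inf")
            for (u, v) in edges:
                if find(u) == find(v):
                    continue                     # adding (u, v) would close a cycle
                trial = degree[:]
                trial[u] += 1
                trial[v] += 1
                cost = dist(u, v) - lam * entropy(trial)
                if cost < best_cost:
                    best, best_cost = (u, v), cost
            u, v = best
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            chosen.append((u, v))
        return chosen

With lam = 0 the sketch reduces to a plain greedy minimum spanning tree; increasing lam trades edge weight for entropy of the degree distribution, mirroring the difference between graphs (a) and (b) in Fig. 7.1.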
Advantages of greedy approaches are simpler algorithms and an intuitive formulation, but they can only converge towards local minima and cannot guarantee the optimal solution.

Algorithm 7.1: The mstme algorithm.
Data: X_N(X, W), λ
Result: X[U]
 1 begin
 2     U ← {};
       // We can only choose |X| − 1 edges.
 3     for i = 0 to |X| − 1 do
 4         cost ← ∞;
 5         best ← 0;
 6         foreach e ∈ E \ U do
 7             if X[U ∪ {e}] has a cycle then
 8                 continue;
 9             else
10                 U ← U ∪ {e};
11             end
12             if W_e − λ H(X[U]) < cost then
13                 cost ← W_e − λ H(X[U]);
14                 best ← e;
15             end
16             U ← U − {e};
17         end
18         U ← U ∪ {best};
19     end
20     return X[U];
21 end

7.3 Experiments

Our experimental section focuses on evaluating the behavior of our algorithm on point-sets sampled from the silhouette of an object. Figure 7.2 shows some point-sets connected using the MSTME. It is noticeable that in thin areas the entropy term of the optimization function takes over, i.e. the degree values of the points which lie in such areas are higher than in other regions. We can exemplify this by looking at the top of the fish or the feet of the dog, while other regions follow the silhouette of the object and have lower degree values. It is too costly to make an edge cross the shape of the object, since we would have to add the weight of the edge to our cost. Therefore, we can to some degree predict how the degree of the nodes behaves based on their local neighborhood.

Figure 7.2: Data-graphs connected using the MSTME (λ = 0.5).

Our noise experiment is built upon the shortest pairwise distance in the point-set, denoted ε. We perturb each point x ∈ X by a length and an angle (l, α), where α ∈ [0°, 360°), the radius l ∈ [0, rε], and r varies in [1, 10]; e.g. 5ε means that each point of the dataset was perturbed within a radius of 5 times the shortest pairwise distance in the graph. Certainly, the higher the noise, the lower the stability of the algorithms will be.
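The perturbation scheme just described can be illustrated with a short sketch; the function name, the use of NumPy, and the uniform sampling of the radius in [0, r·ε] are our own reading of the description above rather than the exact experimental code.

    import numpy as np

    def perturb(points, r, seed=None):
        """Shift each 2D point by a random angle and a radius up to r * eps,
        where eps is the shortest pairwise distance in the point-set."""
        rng = np.random.default_rng(seed)
        pts = np.asarray(points, dtype=float)
        dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        eps = dists[dists > 0].min()                    # shortest pairwise distance
        alpha = rng.uniform(0.0, 2 * np.pi, len(pts))   # angle in [0, 2*pi), i.e. [0°, 360°)
        length = rng.uniform(0.0, r * eps, len(pts))    # radius in [0, r * eps]
        offset = np.stack([length * np.cos(alpha), length * np.sin(alpha)], axis=1)
        return pts + offset

One perturbed copy of a point-set is produced per call, so the 30 datasets per noise level used in Fig. 7.4 would correspond to 30 calls with different seeds.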
Figure 7.3: Noise added to the point-set of a cow silhouette, with perturbations up to 3ε, 5ε, 8ε, and 10ε.

Figure 7.3 shows the impact of noise on the point-sets, varying according to ε, for a cow silhouette. It would not be an easy task even for humans to match such points; notice how abrupt the change is in thin areas (e.g. the feet). It is worth mentioning that, as the algorithm searches for the optimum, it has to decide which edges to take; therefore, a single wrong edge will have a much higher impact on the results than in the Delaunay triangulation, which has many more edges. As we are not allowed to have cycles, the silhouette of the object will "break" at locations where the weights of the edges are high. This location will vary according to the noise. Notice that the split occurred in the face of the cow when the noise level was 10ε, but at level 8ε it happened at a different location. Figure 7.4 shows the quantitative analysis of stability between the MSTME and the Delaunay triangulation. We evaluate the stability by calculating the percentage of edges which remained stable in all graphs within the same noise level. The first plot shows the results for the fish dataset (obtained from the CPD [86]). The y-axis displays the percentage of stable edges, e.g. 0.8 means that 80% of the edges were found across all data-graphs within the same level. For this dataset, our MSTME algorithm was always more stable than the Delaunay triangulation, up to the highest noise level.
Figure 7.4: Boxplot comparison of the stability of MSTME vs. Delaunay on the fish and cow1 datasets (x-axis: amount of noise, y-axis: percentage of stable edges). The noise consists of shifting each individual point by ε × level, varying from 1 to 10. For each noise level, we generated 30 perturbed datasets and computed the number of stable edges across all results.

The boxplot displays several other types of information, such as the median, quartiles, etc. The instability observed was found in the thin parts of the fish, such as the dorsal and caudal fins. The second plot shows the results on the cow1 dataset (obtained from the kimia99 dataset [112]). We observed that the MSTME was, in general, more stable than the Delaunay algorithm. Small variations in the location of the nodes affected the triangulation more than they affected the MSTME. As the triangulation generates many more edges, a single wrong edge has a higher impact on the results of the MSTME. Nevertheless, it was still able to obtain higher stability than Delaunay up to a certain level of noise. Figure 7.5 shows examples of the dog1 point-set perturbed under noise level 3ε. It is easy to spot the differences between the data-graphs generated by the MSTME samples; although it is harder to visualize the differences in the triangulations, we notice that the edges between the head and the tail of the dog vary on all data-graphs. A distinction around the head and between the legs of the dog is also noticeable.
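The stability measure used in Fig. 7.4 can be made concrete with a short sketch. The text does not spell out the normalization, so dividing by the union of all observed edges below is only one plausible reading; dividing by the number of edges per graph would be another.

    def stable_edge_fraction(graphs):
        """Fraction of edges that appear in every data-graph of one noise level.

        graphs is a list of edge collections, one per perturbed copy of the
        point-set; an edge is 'stable' if it occurs in all of them.
        """
        edge_sets = [{tuple(sorted(e)) for e in g} for g in graphs]
        stable = set.intersection(*edge_sets)
        observed = set.union(*edge_sets)
        return len(stable) / len(observed)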
Figure 7.5: Examples of data-graphs with noise equal to 3ε.

7.4 Conclusions

In this chapter, we tackled the problem of the robustness of data-graph construction under local perturbations in the neighborhood of points.
We proposed a data-graph technique by searching for a tree whose cost is minimum regarding the total edge weight and maximum regarding the entropy of the degree distribution. We called this problem the Minimum Spanning Tree of Maximum Entropy (MSTME). The algorithm differs from the one of Chapter 6, since in the latter we computed the correspondence of two point-sets by building their data-graphs, which could cope only up to a similarity transformation. The proposed algorithm still tries to obtain a more unique data-graph (by maximizing the entropy) but it also needs to be stable against local variations (by approximating an MST). The trade-off is controlled by the parameter λ of the algorithm, which could be cross-validated in a specific scenario whose noise level is known. In point-sets sampled from the silhouette of an object, we observed that points lying in thin areas were often unstable when there was noise. The explanation for this is that, since those points are close to each other (their pairwise distance is significantly smaller than the average distance in the graph, e.g. 1.5 standard deviations from the mean), the entropy term of the objective function takes over, i.e. we are able to decrease the cost if we maximize the entropy at that location, whereas maximizing the entropy of points whose pairwise distances are higher would bring a bigger cost for the optimization. After analyzing and comparing the robustness of the generated graphs, we noticed that the data-graph generated by the MSTME was more stable than the one generated by the Delaunay triangulation. The proposed algorithm is based on a greedy strategy which only guarantees convergence towards a local minimum. Even though there was no guarantee of obtaining the optimum solution, the results were already more stable than the Delaunay triangulation. In the future, we would like to propose an approximation algorithm which could guarantee bounds on the MSTME problem; an algorithm which can find the optimum would only improve the results obtained in this chapter. Finally, the main motivation of this chapter started from the registration problem. Therefore, we will apply this data-graph in a framework similar to ICP and evaluate the accuracy of the results obtained, as well as compare the results with other data-graph techniques such as Delaunay [33], the Gabriel graph [44], alpha sets [36], etc.

CHAPTER 8
Conclusions

"I am here to change the future of science and eat cake and I just ran out of cake"
– Unknown

Computer science has evolved since its first arithmetical usages. We have come a long way in developing smart machines and algorithms capable of learning from data, especially from images, which have been our main subject of study. We have also seen how graph theory developed within computer science, becoming part of real-life problems: from social network analysis to computer vision. We have presented a graph centrality approach to computer vision and introduced this topological property for the first time in many vision problems. In this chapter, we present our conclusions and future directions.

8.1 Summary

We started by providing a literature review on graph theory and centrality, building up the intuition for the problems which we would tackle, and later we discussed each one of them individually. Chapter 4 provided the first application of centralities as descriptors in 2D shape matching.
One clear advantage of such a representation is the rotation invariance achieved by the use of graphs. We emphasize that scale invariance can also be achieved if one normalizes the histograms. We discussed, though, that the histogram representation is rather simple and loses important aspects of the centrality-based representation, namely the location of the central nodes. If we build our system solely on the histogram, important details of the shape might not be captured. In Section 4.3.3, we showed an alternative feature representation which copes with that problem by building random decision forests able to learn which locations in the shape are the most discriminative ones, maximizing the information gain. Nevertheless, our experiments using the histograms allowed us to gather knowledge about the behaviour of each individual centrality.

Chapter 5 tackled the point-set registration problem by introducing the graph centralities into the CPD framework. We performed an experimental evaluation on a number of different datasets and observed that, for some types of point-sets, the centralities acted as a good prior and made the algorithm converge in fewer iterations. We evaluated the algorithm in five different scenarios: (i) point-sets which are spatially organized as clusters, (ii) points whose locations are randomly distributed, (iii) points whose locations are Gaussian distributed, (iv) points sampled from the silhouette of an object, and (v) points organized as grids. Among all studied centrality measures, closeness obtained the fastest and most reliable results. Closeness was the fastest in three scenarios (i, iii, and iv) and the eigenvector centrality the fastest in two (ii and v); both were significantly faster than CPD. Unfortunately, not all centralities obtained good results. The reason is that each centrality is defined independently; there is no consensus on which properties a centrality measure should have. Some of them consider shortest paths to define importance, whereas other measures consider the number of connections in order to call a node important or central. For instance, the degree centrality showed high instability and poor results in the registration. When we analyzed the convergence of the algorithm, all centrality measures had problems with the grid-based sampling; not all of them were able to converge. This is due to the small variability in the number of connections between points: since they are sampled on a grid, points lying inside the shape always have the same number of connections, and only points on the border have a different number of connections.
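One way such a centrality prior can enter a soft-assignment registration is sketched below (Python with networkx, numpy, and scipy): a data-graph is built on each point-set via a Delaunay triangulation, and the soft correspondence between the two sets combines a Gaussian kernel on spatial distance with a penalty on the difference of closeness values. The Gaussian forms, the parameters sigma and beta, and the helper functions are illustrative assumptions; the sketch is a strong simplification, not the CPD formulation used in Chapter 5.

```python
# Simplified sketch of a centrality-weighted soft correspondence between two
# point-sets (an illustration only, not the CPD derivation of Chapter 5).
import networkx as nx
import numpy as np
from scipy.spatial import Delaunay, distance_matrix


def delaunay_graph(points):
    """Data-graph over a 2D point-set built from its Delaunay triangulation."""
    tri = Delaunay(points)
    G = nx.Graph()
    G.add_nodes_from(range(len(points)))
    for simplex in tri.simplices:
        for i in range(3):                            # connect the three corners of each triangle
            G.add_edge(int(simplex[i]), int(simplex[(i + 1) % 3]))
    return G


def centrality_weighted_correspondence(X, Y, sigma=0.1, beta=5.0):
    """Soft correspondences P[m, n] combining spatial proximity with closeness similarity."""
    cx = np.array(list(nx.closeness_centrality(delaunay_graph(X)).values()))
    cy = np.array(list(nx.closeness_centrality(delaunay_graph(Y)).values()))
    spatial = np.exp(-distance_matrix(Y, X) ** 2 / (2 * sigma ** 2))
    prior = np.exp(-beta * (cy[:, None] - cx[None, :]) ** 2)
    P = spatial * prior
    return P / P.sum(axis=1, keepdims=True)           # each row sums to one


if __name__ == "__main__":
    X = np.random.rand(40, 2)
    Y = X + 0.01 * np.random.randn(40, 2)             # noisy copy of the same point-set
    P = centrality_weighted_correspondence(X, Y)
    print(P.argmax(axis=1)[:10])                      # most likely match of the first ten points
```

In a full registration loop such a matrix would be re-estimated as the transformation parameters are updated; here it is shown only for a single, static comparison of the two sets.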
We also tackled the registration problem by approximating the degree distribution of a graph by a uniform distribution (Chapter 6), the Minimum-Weight Maximum-Entropy (MWME) problem. Our main motivation for approaching the problem in this fashion was to perform the matching by bringing the point-sets towards a canonical representation, so that they could be matched directly. The registration we proposed copes with transformations up to a similarity, i.e., a rotation, a translation, and a scale factor. We defined a special class of graphs which fulfill the NHDD: graphs with N − 1 different node degrees, where N is the number of nodes in the graph. We proved several properties of such graphs, such as the duality with the complement graph (which also fulfills the NHDD), the repeated degree value, and the entropy of such a graph.

We constrained the MWME problem to be a tree, leading to the MSTME problem in Chapter 7. In this formulation, the edge-induced subgraph is a tree, which allows the data-graph to be more robust to small variations in the neighborhood of each point. We provided a greedy algorithm which approximates the solution for this problem, although we can no longer guarantee a unique solution, as in the unconstrained version. By solving this problem, we obtain a data-graph which is more robust to local variations. We evaluated the stability of the approach against the Delaunay triangulation, and our algorithm proved to be statistically more stable. Our noise level was based on the shortest pairwise distance ε in the point-set. We varied the noise up to ±10ε for each point in the dataset and performed a significance test (Wilcoxon) comparing the stability of our approach against Delaunay’s.

The last aspect studied in this dissertation was the usage of graph centrality in the machine learning process. We discussed feature modeling and how different machine learning features can be modeled using graph centralities. We covered a histogram-based feature which groups the centrality values of the graph into a specific number of bins; those values can then be used for training. The other type of feature was the random binary comparison within the graph, which compares two randomly chosen nodes and builds a decision tree based on whether one node has a higher centrality value than the other. We trained classifiers for the different centrality measures; the training data was obtained by building graphs out of 2D shapes. The evaluation was performed using the histogram-based feature of centrality values, and we compared all centrality measures in the presence of noise. Our results indicate that the closeness centrality was the most robust one in the presence of noise and that the degree centrality obtained the worst results on those datasets. Obtaining similar conclusions in different problems reassures us about the power of the centrality representation in vision: closeness also outperformed the other measures, and degree also performed poorly, in the registration problem discussed previously.

8.2 Open Problems and Future Directions

We see potential to develop vision-oriented centrality measures, since the graphs we build in our applications often have spatial information associated with the points. On top of that, graphs generated from images can incorporate other types of information which could be relevant to the matching in addition to the spatial coordinates. For instance, we could design centralities that incorporate gradient information from the images, or how frequently the point appears in the image (by evaluating a descriptor of the point). Aggregating more information from the scene could lead to a better centrality value, without losing the original meaning of central points in a graph.
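A speculative sketch of such a vision-oriented measure is given below (Python with networkx and numpy): a k-nearest-neighbour data-graph is built over image points, edge weights mix spatial distance with the image gradient magnitude at the endpoints, and closeness centrality is computed on the weighted graph. The k-NN construction, the mixing parameter alpha, and the synthetic image are assumptions made purely for illustration; no such measure is defined in this thesis.

```python
# Speculative sketch of a gradient-aware closeness centrality (an assumption for
# illustration, not a measure defined in this dissertation).
import networkx as nx
import numpy as np


def gradient_weighted_closeness(points, image, alpha=0.5, k=4):
    """Closeness centrality on a k-NN graph whose edge weights also reflect image gradients."""
    gy, gx = np.gradient(image.astype(float))        # image gradient along rows, then columns
    gmag = np.hypot(gx, gy)
    gmag = gmag / (gmag.max() + 1e-12)                # normalise to [0, 1]

    G = nx.Graph()
    G.add_nodes_from(range(len(points)))
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        for j in np.argsort(d)[1:k + 1]:              # k nearest neighbours of point i
            if j == i:
                continue
            gi = gmag[int(p[1]), int(p[0])]           # gradient magnitude at both endpoints (x, y)
            gj = gmag[int(points[j][1]), int(points[j][0])]
            w = alpha * d[j] + (1 - alpha) * (2.0 - gi - gj)   # strong gradients make edges "shorter"
            G.add_edge(i, int(j), weight=w)
    return nx.closeness_centrality(G, distance="weight")


if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[16:48, 16:48] = 1.0                           # synthetic square "object"
    pts = np.column_stack([np.random.randint(0, 64, 50),
                           np.random.randint(0, 64, 50)])
    c = gradient_weighted_closeness(pts, img)
    print(sorted(c.items(), key=lambda kv: -kv[1])[:3])   # three most central points
```

Under such a measure, the most central points would be those that are both spatially well connected and supported by strong image structure.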
Another approach to the design of new centralities could be the combination of existing ones in a fuzzy scenario. These fuzzy-based centralities could incorporate the behaviour of standard centralities such as closeness and betweenness, with a function weighting the importance of each of them. Therefore, the importance of a node could combine contributions from different measures, coping with the weaknesses of each individual centrality. In fact, at the end of Chapter 2, we mentioned the existence of fuzzy centralities. It would be meaningful to evaluate how fuzzy centralities could be useful in the computer vision scenario.

In Chapter 4, we observed that the eigenvector centrality had a distinguishable gradient in the shape of the object (Figure 4.2). After adding noise, that gradient vanished and a smaller gradient appeared whose center was located in the biggest closed region without noise inside the shape. We observed this pattern, although we did not perform a deeper analysis of why it happens. In the future, we would like to prove such a statement.

In Chapter 5, we discussed registration. There are still many open questions which can be raised and which are essential to understand the remaining results of the registration:

1. What is the data-graph construction technique which can best cope with noise in point-sets?
2. Is the noise in the triangulation process the only reason for the poor performance of the centrality in the non-rigid registration?
3. If no noise is present, is the centrality always able to converge faster than CPD for any point-set?
4. What are the properties of point-sets which can be successfully addressed by the centralities?

Those questions are going to be addressed in our future work. When we discussed the MSTME, we provided a greedy algorithm for it. In the future, we would like to propose an approximation algorithm which could guarantee bounds on the MSTME problem. We would also like to integrate the MSTME into our CPD framework and into an ICP-like framework, in which we would not need to search over all points in order to find the closest one.

APPENDIX A
Licenses for Copyrighted Material

The results of this dissertation have been published in book chapters and journals. We have obtained permission from Springer and Elsevier to reprint the published material in this dissertation. We append the granted licenses in this appendix.

1. The first license refers to the contents of the paper de Sousa and Kropatsch [27], granted by Elsevier to be reused in this dissertation in both the printed and online versions.
2. The second license refers to the contents of the paper de Sousa and Kropatsch [28], granted by Springer Science and Business Media to be reused in this dissertation in both the printed and online versions.
3. The third license refers to the contents of the paper de Sousa et al. [31], granted by Springer Science and Business Media to be reused in this dissertation in both the printed and online versions.
4. The fourth license refers to the contents of the paper de Sousa et al. [32], granted by Springer Science and Business Media to be reused in this dissertation in both the printed and online versions.
103 ELSEVIER LICENSE TERMS AND CONDITIONS Apr 16, 2015 This is a License Agreement between Samuel de Sousa ("You") and Elsevier ("Elsevier") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by Elsevier, and the payment terms and conditions. All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form. Supplier Elsevier Limited The Boulevard,Langford Lane Kidlington,Oxford,OX5 1GB,UK Registered Company Number 1982084 Customer name Samuel de Sousa Customer address Favoritenstrasse 9/4, PRIP   Wien, None 1040 License number 3610750687786 License date Apr 16, 2015 Licensed content publisher Elsevier Licensed content publication Pattern Recognition Licensed content title Graph­based point drift: Graph centrality on the registration of point­sets Licensed content author None Licensed content date February 2015 Licensed content volume number 48 Licensed content issue number 2 Number of pages 12 Start Page 368 End Page 379 Type of Use reuse in a thesis/dissertation Intended publisher of new work other Portion full article Format both print and electronic Are you the author of this Elsevier article? Yes Will you be translating? No Title of your thesis/dissertation A Graph Centrality Approach to Computer Vision Expected completion date Oct 2015 Estimated size (number of pages) Elsevier VAT number GB 494 6272 12 Price 0.00 USD VAT/Local Sales Tax 0.00 USD / 0.00 GBP Total 0.00 USD Terms and Conditions INTRODUCTION 1. The publisher for this copyrighted material is Elsevier.  By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. ("CCC"), at the time that you opened your Rightslink account and that are available at any time at http://myaccount.copyright.com). GENERAL TERMS 2. Elsevier hereby grants you permission to reproduce the aforementioned material subject to the terms and conditions indicated. 3. Acknowledgement: If any part of the material to be used (for example, figures) has appeared in our publication with credit or acknowledgement to another source, permission must also be sought from that source.  If such permission is not obtained then that material may not be included in your publication/copies. Suitable acknowledgement to the source must be made, either as a footnote or in a reference list at the end of your publication, as follows: "Reprinted from Publication title, Vol /edition number, Author(s), Title of article / title of chapter, Pages No., Copyright (Year), with permission from Elsevier [OR APPLICABLE SOCIETY COPYRIGHT OWNER]." Also Lancet special credit ­ "Reprinted from The Lancet, Vol. number, Author(s), Title of article, Pages No., Copyright (Year), with permission from Elsevier." 4. Reproduction of this material is confined to the purpose and/or media for which permission is hereby given. 5. Altering/Modifying Material: Not Permitted. However figures and illustrations may be altered/adapted minimally to serve your work. Any other abbreviations, additions, deletions and/or any other alterations shall be made only with prior written authorization of Elsevier Ltd. (Please contact Elsevier at permissions@elsevier.com) 6. 
If the permission fee for the requested use of our material is waived in this instance, please be advised that your future requests for Elsevier materials may attract a fee. 7. Reservation of Rights: Publisher reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions. 8. License Contingent Upon Payment: While you may exercise the rights licensed immediately upon issuance of the license at the end of the licensing process for the transaction, provided that you have disclosed complete and accurate details of your proposed use, no license is finally effective unless and until full payment is received from you (either by publisher or by CCC) as provided in CCC's Billing and Payment terms and conditions.  If full payment is not received on a timely basis, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted.  Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted.  Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and publisher reserves the right to take any and all action to protect its copyright in the materials. 9. Warranties: Publisher makes no representations or warranties with respect to the licensed material. 10. Indemnity: You hereby indemnify and agree to hold harmless publisher and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license. 11. No Transfer of License: This license is personal to you and may not be sublicensed, assigned, or transferred by you to any other person without publisher's written permission. 12. No Amendment Except in Writing: This license may not be amended except in a writing signed by both parties (or, in the case of publisher, by CCC on publisher's behalf). 13. Objection to Contrary Terms: Publisher hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions.  These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and publisher (and CCC) concerning this licensing transaction.  In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall control. 14. Revocation: Elsevier or Copyright Clearance Center may deny the permissions described in this License at their sole discretion, for any reason or no reason, with a full refund payable to you.  Notice of such denial will be made using the contact information provided by you.  Failure to receive such notice will not alter or invalidate the denial.  
In no event will Elsevier or Copyright Clearance Center be responsible or liable for any costs, expenses or damage incurred by you as a result of a denial of your permission request, other than a refund of the amount(s) paid by you to Elsevier and/or Copyright Clearance Center for denied permissions. LIMITED LICENSE The following terms and conditions apply only to specific license types: 15. Translation: This permission is granted for non­exclusive world English rights only unless your license was granted for translation rights. If you licensed translation rights you may only translate this content into the languages you requested. A professional translator must perform all translations and reproduce the content word for word preserving the integrity of the article. If this license is to re­use 1 or 2 figures then permission is granted for non­exclusive world rights in all languages. 16. Posting licensed content on any Website: The following terms and conditions apply as follows: Licensing material from an Elsevier journal: All content posted to the web site must maintain the copyright information line on the bottom of each image; A hyper­text must be included to the Homepage of the journal from which you are licensing at http://www.sciencedirect.com/science/journal/xxxxx or the Elsevier homepage for books at http://www.elsevier.com; Central Storage: This license does not include permission for a scanned version of the material to be stored in a central repository such as that provided by Heron/XanEdu. Licensing material from an Elsevier book: A hyper­text link must be included to the Elsevier homepage at http://www.elsevier.com . All content posted to the web site must maintain the copyright information line on the bottom of each image. Posting licensed content on Electronic reserve: In addition to the above the following clauses are applicable: The web site must be password­protected and made available only to bona fide students registered on a relevant course. This permission is granted for 1 year only. You may obtain a new license for future website posting. 17. For journal authors: the following clauses are applicable in addition to the above: Preprints: A preprint is an author's own write­up of research results and analysis, it has not been peer­reviewed, nor has it had any other value added to it by a publisher (such as formatting, copyright, technical enhancement etc.). Authors can share their preprints anywhere at any time. Preprints should not be added to or enhanced in any way in order to appear more like, or to substitute for, the final versions of articles however authors can update their preprints on arXiv or RePEc with their Accepted Author Manuscript (see below). If accepted for publication, we encourage authors to link from the preprint to their formal publication via its DOI. Millions of researchers have access to the formal publications on ScienceDirect, and so links will help users to find, access, cite and use the best available version. Please note that Cell Press, The Lancet and some society­owned have different preprint policies. Information on these policies is available on the journal homepage. Accepted Author Manuscripts: An accepted author manuscript is the manuscript of an article that has been accepted for publication and which typically includes author­incorporated changes suggested during submission, peer review and editor­author communications. 
Authors can share their accepted author manuscript: -         immediately via their non­commercial person homepage or blog by updating a preprint in arXiv or RePEc with the accepted manuscript via their research institute or institutional repository for internal institutional uses or as part of an invitation­only research collaboration work­group directly by providing copies to their students or to research collaborators for their personal use for private scholarly sharing as part of an invitation­only work group on commercial sites with which Elsevier has an agreement -         after the embargo period via non­commercial hosting platforms such as their institutional repository via commercial sites with which Elsevier has an agreement In all cases accepted manuscripts should: -         link to the formal publication via its DOI -         bear a CC­BY­NC­ND license ­ this is easy to do -         if aggregated with other manuscripts, for example in a repository or other site, be shared in alignment with our hosting policy not be added to or enhanced in any way to appear more like, or to substitute for, the published journal article. Published journal article (JPA): A published journal article (PJA) is the definitive final record of published research that appears or will appear in the journal and embodies all value­adding publishing activities including peer review co­ ordination, copy­editing, formatting, (if relevant) pagination and online enrichment. Policies for sharing publishing journal articles differ for subscription and gold open access articles: Subscription Articles: If you are an author, please share a link to your article rather than the full­text. Millions of researchers have access to the formal publications on ScienceDirect, and so links will help your users to find, access, cite, and use the best available version. Theses and dissertations which contain embedded PJAs as part of the formal submission can be posted publicly by the awarding institution with DOI links back to the formal publications on ScienceDirect. If you are affiliated with a library that subscribes to ScienceDirect you have additional private sharing rights for others' research accessed under that agreement. This includes use for classroom teaching and internal training at the institution (including use in course packs and courseware programs), and inclusion of the article for grant funding purposes. Gold Open Access Articles: May be shared according to the author­selected end­user license and should contain a CrossMark logo, the end user license, and a DOI link to the formal publication on ScienceDirect. Please refer to Elsevier's posting policy for further information. 18. For book authors the following clauses are applicable in addition to the above:   Authors are permitted to place a brief summary of their work online only. You are not allowed to download and post the published electronic version of your chapter, nor may you scan the printed edition to create an electronic version. Posting to a repository: Authors are permitted to post a summary of their chapter only in their institution's repository. 19. Thesis/Dissertation: If your license is for use in a thesis/dissertation your thesis may be submitted to your institution in either print or electronic form. Should your thesis be published commercially, please reapply for permission. 
These requirements include permission for the Library and Archives of Canada to supply single copies, on demand, of the complete thesis and include permission for Proquest/UMI to supply single copies, on demand, of the complete thesis. Should your thesis be published commercially, please reapply for permission. Theses and dissertations which contain embedded PJAs as part of the formal submission can be posted publicly by the awarding institution with DOI links back to the formal publications on ScienceDirect.   Elsevier Open Access Terms and Conditions You can publish open access with Elsevier in hundreds of open access journals or in nearly 2000 established subscription journals that support open access publishing. Permitted third party re­use of these open access articles is defined by the author's choice of Creative Commons user license. See our open access license policy for more information. Terms & Conditions applicable to all Open Access articles published with Elsevier: Any reuse of the article must not represent the author as endorsing the adaptation of the article nor should the article be modified in such a way as to damage the author's honour or reputation. If any changes have been made, such changes must be clearly indicated. The author(s) must be appropriately credited and we ask that you include the end user license and a DOI link to the formal publication on ScienceDirect. If any part of the material to be used (for example, figures) has appeared in our publication with credit or acknowledgement to another source it is the responsibility of the user to ensure their reuse complies with the terms and conditions determined by the rights holder. Additional Terms & Conditions applicable to each Creative Commons user license: CC BY: The CC­BY license allows users to copy, to create extracts, abstracts and new works from the Article, to alter and revise the Article and to make commercial use of the Article (including reuse and/or resale of the Article by commercial entities), provided the user gives appropriate credit (with a link to the formal publication through the relevant DOI), provides a link to the license, indicates if changes were made and the licensor is not represented as endorsing the use made of the work. The full details of the license are available at http://creativecommons.org/licenses/by/4.0. CC BY NC SA: The CC BY­NC­SA license allows users to copy, to create extracts, abstracts and new works from the Article, to alter and revise the Article, provided this is not done for commercial purposes, and that the user gives appropriate credit (with a link to the formal publication through the relevant DOI), provides a link to the license, indicates if changes were made and the licensor is not represented as endorsing the use made of the work. Further, any new works must be made available on the same conditions. The full details of the license are available at http://creativecommons.org/licenses/by­nc­sa/4.0. CC BY NC ND: The CC BY­NC­ND license allows users to copy and distribute the Article, provided this is not done for commercial purposes and further does not permit distribution of the Article if it is changed or edited in any way, and provided the user gives appropriate credit (with a link to the formal publication through the relevant DOI), provides a link to the license, and that the licensor is not represented as endorsing the use made of the work. The full details of the license are available at http://creativecommons.org/licenses/by­nc­nd/4.0. 
Any commercial reuse of Open Access articles published with a CC BY NC SA or CC BY NC ND license requires permission from Elsevier and will be subject to a fee. Commercial reuse includes: -         Associating advertising with the full text of the Article -         Charging fees for document delivery or access -         Article aggregation -         Systematic distribution via e­mail lists or share buttons Posting or linking by commercial companies for use by customers of those companies.   20. Other Conditions:   v1.7 Questions? customercare@copyright.com or +1­855­239­3415 (toll free in the US) or +1­978­646­2777. Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required. SPRINGER LICENSE TERMS AND CONDITIONS May 10, 2015 This is a License Agreement between Samuel de Sousa ("You") and Springer ("Springer") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by Springer, and the payment terms and conditions. All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form. License Number 3625271294081 License date May 10, 2015 Licensed content publisher Springer Licensed content publication Springer eBook Licensed content title Data Graph Formulation as the Minimum-Weight Maximum-Entropy Problem Licensed content author Samuel de Sousa Licensed content date Jan 1, 2015 Type of Use Book/Textbook Requestor type Publisher Publisher Vienna University of Technology Portion Full text Format Print and Electronic Will you be translating? No Print run 6 Author of this Springer article Yes and you are the sole author of the new work Order reference number None Title of new book A Graph Centrality Approach to Computer Vision Author of new book Samuel de Sousa Expected publication date of new book Oct 2015 Estimated size of new book (pages) 130 Total 0.00 USD Terms and Conditions Introduction The publisher for this copyrighted material is Springer Science + Business Media. By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. ("CCC"), at the time that you opened your Rightslink account and that are available at any time at http://myaccount.copyright.com). Limited License Springer Science + Business Media hereby grants to you a non­exclusive license to use this material, for the use as indicated in your inquiry. Licenses are for one­time use only with a maximum distribution equal to the number that you identified in the licensing process. This License includes use in an electronic form, provided it's password protected, on intranet, or CD­Rom/E­book. For any other electronic use, please contact Springer at permissions.dordrecht@springer.com or permissions.heidelberg@springer.com Although Springer holds copyright to the material and is entitled to negotiate on rights, this license is only valid, provided permission is also obtained from the author (address is given with the article/chapter) and provided it concerns original material which does not carry references to other sources (if material in question appears with credit to another source, authorization from that source is required as well). Geographic Rights: Scope Licenses may be exercised anywhere in the world. 
Altering/Modifying Material: Not Permitted However figures and illustrations may be altered minimally to serve your work. Any other abbreviations, additions, deletions and/or any other alterations shall be made only with prior written authorization of the author(s) and/or Springer Science + Business Media. (Please contact Springer at permissions.dordrecht@springer.com or permissions.heidelberg@springer.com) Reservation of Rights Springer Science + Business Media reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions. License Contingent on Payment While you may exercise the rights licensed immediately upon issuance of the license at the end of the licensing process for the transaction, provided that you have disclosed complete and accurate details of your proposed use, no license is finally effective unless and until full payment is received from you (either by Springer Science + Business Media or by CCC) as provided in CCC's Billing and Payment terms and conditions. If full payment is not received by Due Date, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted. Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted. Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and Springer Science + Business Media reserves the right to take any and all action to protect its copyright in the materials. Copyright Notice: Please include the following copyright citation referencing the publication in which the material was originally published. Where wording is within brackets, please include verbatim. "With kind permission from Springer Science+Business Media: ." Warranties Springer Science + Business Media makes no representations or warranties with respect to the licensed material. Indemnity You hereby indemnify and agree to hold harmless Springer Science + Business Media and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license. No Transfer of License This license is personal to you and may not be sublicensed, assigned, or transferred by you to any other person without Springer Science + Business Media's written permission. No Amendment Except in Writing This license may not be amended except in a writing signed by both parties (or, in the case of Springer Science + Business Media, by CCC on Springer Science + Business Media's behalf). Objection to Contrary Terms Springer Science + Business Media hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions. These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and Springer Science + Business Media (and CCC) concerning this licensing transaction. 
In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall control. Jurisdiction All disputes that may arise in connection with this present License, or the breach thereof, shall be settled exclusively by the country's law in which the work was originally published. Other terms and conditions: v1.3 Questions? customercare@copyright.com or +1-855-239-3415 (toll free in the US) or +1-978-646-2777. Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required. SPRINGER LICENSE TERMS AND CONDITIONS Apr 13, 2015 This is a License Agreement between Samuel de Sousa ("You") and Springer ("Springer") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by Springer, and the payment terms and conditions. All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form. License Number 3607130889324 License date Apr 13, 2015 Licensed content publisher Springer Licensed content publication Springer eBook Licensed content title Estimation of Distribution Algorithm for the Max-Cut Problem Licensed content author Samuel de Sousa Licensed content date Jan 1, 2013 Type of Use Book/Textbook Requestor type Publisher Publisher Vienna University of Technology Portion Full text Format Print and Electronic Will you be translating? No Print run 10 Author of this Springer article Yes and you are the sole author of the new work Order reference number None Title of new book A Graph Centrality Approach to Computer Vision Author of new book Samuel de Sousa Expected publication date of new book Oct 2015 Estimated size of new book (pages) 130 Total 0.00 USD Terms and Conditions Introduction The publisher for this copyrighted material is Springer Science + Business Media. By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. ("CCC"), at the time that you opened your Rightslink account and that are available at any time at http://myaccount.copyright.com). Limited License Springer Science + Business Media hereby grants to you a non­exclusive license to use this material, for the use as indicated in your inquiry. Licenses are for one­time use only with a maximum distribution equal to the number that you identified in the licensing process. This License includes use in an electronic form, provided it's password protected, on intranet, or CD­Rom/E­book. For any other electronic use, please contact Springer at permissions.dordrecht@springer.com or permissions.heidelberg@springer.com Although Springer holds copyright to the material and is entitled to negotiate on rights, this license is only valid, provided permission is also obtained from the author (address is given with the article/chapter) and provided it concerns original material which does not carry references to other sources (if material in question appears with credit to another source, authorization from that source is required as well). Geographic Rights: Scope Licenses may be exercised anywhere in the world. 
Altering/Modifying Material: Not Permitted However figures and illustrations may be altered minimally to serve your work. Any other abbreviations, additions, deletions and/or any other alterations shall be made only with prior written authorization of the author(s) and/or Springer Science + Business Media. (Please contact Springer at permissions.dordrecht@springer.com or permissions.heidelberg@springer.com) Reservation of Rights Springer Science + Business Media reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions. License Contingent on Payment While you may exercise the rights licensed immediately upon issuance of the license at the end of the licensing process for the transaction, provided that you have disclosed complete and accurate details of your proposed use, no license is finally effective unless and until full payment is received from you (either by Springer Science + Business Media or by CCC) as provided in CCC's Billing and Payment terms and conditions. If full payment is not received by Due Date, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted. Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted. Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and Springer Science + Business Media reserves the right to take any and all action to protect its copyright in the materials. Copyright Notice: Please include the following copyright citation referencing the publication in which the material was originally published. Where wording is within brackets, please include verbatim. "With kind permission from Springer Science+Business Media: ." Warranties Springer Science + Business Media makes no representations or warranties with respect to the licensed material. Indemnity You hereby indemnify and agree to hold harmless Springer Science + Business Media and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license. No Transfer of License This license is personal to you and may not be sublicensed, assigned, or transferred by you to any other person without Springer Science + Business Media's written permission. No Amendment Except in Writing This license may not be amended except in a writing signed by both parties (or, in the case of Springer Science + Business Media, by CCC on Springer Science + Business Media's behalf). Objection to Contrary Terms Springer Science + Business Media hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions. These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and Springer Science + Business Media (and CCC) concerning this licensing transaction. 
In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall control. Jurisdiction All disputes that may arise in connection with this present License, or the breach thereof, shall be settled exclusively by the country's law in which the work was originally published. Other terms and conditions: v1.3 Questions? customercare@copyright.com or +1-855-239-3415 (toll free in the US) or +1-978-646-2777. Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required. SPRINGER LICENSE TERMS AND CONDITIONS Apr 13, 2015 This is a License Agreement between Samuel de Sousa ("You") and Springer ("Springer") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by Springer, and the payment terms and conditions. All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form. License Number 3607130760709 License date Apr 13, 2015 Licensed content publisher Springer Licensed content publication Springer eBook Licensed content title On the Evaluation of Graph Centrality for Shape Matching Licensed content author Samuel de Sousa Licensed content date Jan 1, 2013 Type of Use Book/Textbook Requestor type Publisher Publisher Vienna University of Technology Portion Full text Format Print and Electronic Will you be translating? No Print run 10 Author of this Springer article Yes and you are the sole author of the new work Order reference number None Title of new book A Graph Centrality Approach to Computer Vision Author of new book Samuel de Sousa Expected publication date of new book Oct 2015 Estimated size of new book (pages) 130 Total 0.00 USD Terms and Conditions Introduction The publisher for this copyrighted material is Springer Science + Business Media. By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. ("CCC"), at the time that you opened your Rightslink account and that are available at any time at http://myaccount.copyright.com). Limited License Springer Science + Business Media hereby grants to you a non­exclusive license to use this material, for the use as indicated in your inquiry. Licenses are for one­time use only with a maximum distribution equal to the number that you identified in the licensing process. This License includes use in an electronic form, provided it's password protected, on intranet, or CD­Rom/E­book. For any other electronic use, please contact Springer at permissions.dordrecht@springer.com or permissions.heidelberg@springer.com Although Springer holds copyright to the material and is entitled to negotiate on rights, this license is only valid, provided permission is also obtained from the author (address is given with the article/chapter) and provided it concerns original material which does not carry references to other sources (if material in question appears with credit to another source, authorization from that source is required as well). Geographic Rights: Scope Licenses may be exercised anywhere in the world. 
Altering/Modifying Material: Not Permitted However figures and illustrations may be altered minimally to serve your work. Any other abbreviations, additions, deletions and/or any other alterations shall be made only with prior written authorization of the author(s) and/or Springer Science + Business Media. (Please contact Springer at permissions.dordrecht@springer.com or permissions.heidelberg@springer.com) Reservation of Rights Springer Science + Business Media reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions. License Contingent on Payment While you may exercise the rights licensed immediately upon issuance of the license at the end of the licensing process for the transaction, provided that you have disclosed complete and accurate details of your proposed use, no license is finally effective unless and until full payment is received from you (either by Springer Science + Business Media or by CCC) as provided in CCC's Billing and Payment terms and conditions. If full payment is not received by Due Date, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted. Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted. Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and Springer Science + Business Media reserves the right to take any and all action to protect its copyright in the materials. Copyright Notice: Please include the following copyright citation referencing the publication in which the material was originally published. Where wording is within brackets, please include verbatim. "With kind permission from Springer Science+Business Media: ." Warranties Springer Science + Business Media makes no representations or warranties with respect to the licensed material. Indemnity You hereby indemnify and agree to hold harmless Springer Science + Business Media and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license. No Transfer of License This license is personal to you and may not be sublicensed, assigned, or transferred by you to any other person without Springer Science + Business Media's written permission. No Amendment Except in Writing This license may not be amended except in a writing signed by both parties (or, in the case of Springer Science + Business Media, by CCC on Springer Science + Business Media's behalf). Objection to Contrary Terms Springer Science + Business Media hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions. These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and Springer Science + Business Media (and CCC) concerning this licensing transaction. 
In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall control. Jurisdiction All disputes that may arise in connection with this present License, or the breach thereof, shall be settled exclusively by the country's law in which the work was originally published. Other terms and conditions: v1.3 Questions? customercare@copyright.com or +1-855-239-3415 (toll free in the US) or +1-978-646-2777. Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required. Bibliography [1] A. Abbasi and J. Altmann. On the correlation between research performance and social network analysis measures applied to research collaboration networks. In 44th Hawaii International Conference onSystem Sciences (HICSS), pages 1–10, 2011. [2] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Susstrunk. Slic superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (11):2274–2282, November 2012. ISSN 0162-8828. [3] V. Aggarwal, Y.P. Aneja, and K.P.K. Nair. Minimal spanning tree subject to a side constraint. Computers and Operations Research, 9(4):287 – 296, 1982. ISSN 0305-0548. [4] Kim Allan Andersen, Kurt Jőrnsten, and Mikael Lind. On bicriterion minimal spanning trees: An approximation. Computers and Operations Research, 23(12): 1171 – 1182, 1996. ISSN 0305-0548. [5] David L. Applegate, Robert E. Bixby, Vasek Chvatal, and William J. Cook. The Traveling Salesman Problem: A Computational Study (Princeton Series in Applied Mathematics). Princeton University Press, Princeton, NJ, USA, 2007. ISBN ISBN 0691129932. [6] M. A. Audette, F. P. Ferrie, and T. M. Peters. An algorithmic overview of surface registration techniques for medical imaging. Medical Image Analysis, 4(3):201–217, 2000. [7] Marco Antonio Balieiro, Samuel de Sousa, and Cleidson R. B. de Souza. Facilitating social network studies of floss using the ossnetwork environment. In Proceedings of International Conference on Open Source Systems, OSS., IFIP, pages 343–350. Springer, 2008. ISBN 978-0-387-09683-4. [8] Alex Bavelas. A mathematical model for group structures. Human Organization, 7, 1948. [9] Alex Bavelas. A mathematical model for small group structures. Human Organiza- tion, 7:16–30, 1948. 119 [10] Alex Bavelas. Communication patterns in task oriented groups. Journal of the Acoustical Society of America, 22:271–282, 1950. [11] MA Beauchamp. An improved index of centrality. Behavioral Science, 1965. [12] P.J. Besl and Neil D. McKay. A method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992. ISSN 0162-8828. [13] H. Blum. Biological shape and visual science (part I). Journal of Theoretical Biology, 38:205–287, 1973. [14] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1124–1137, Sept 2004. ISSN 0162-8828. [15] Y. Y. Boykov and M. P. Jolly. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. volume 1, pages 105–112, 2001. [16] Yuri Boykov and Gareth F. Lea. Graph Cuts and Efficient N-D Image Segmentation. International Journal of Computer Vision, 70(2):109–131, November 2006. 
ISSN 0920-5691. [17] Enrico Bozzo and Massimo Franceschet. Resistance distance, closeness, and be- tweenness. Social Networks, 35(3):460 – 469, 2013. ISSN 0378-8733. [18] T.S. Caetano, T. Caelli, D. Schuurmans, and D. A C Barone. Graphical models and point pattern matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1646–1663, 2006. ISSN 0162-8828. [19] Marco Carcassoni and Edwin R. Hancock. Spectral correspondence for point pattern matching. Pattern Recognition, 36(1):193 – 204, 2003. ISSN 0031-3203. [20] J Cheeger. A Lower Bound for the Smallest Eigenvalue of the Laplacian. Problems in Analysis, 1970. [21] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty Years Of Graph Matching In Pattern Recognition. International Journal of Pattern Recognition and Artificial Intelligence, 2004. [22] CarlosD. Correa and Kwan-Liu Ma. Visualizing social networks. In Charu C. Aggarwal, editor, Social Network Data Analytics, pages 307–326. Springer US, 2011. ISBN 978-1-4419-8461-6. [23] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995. ISSN 0885-6125. 120 [24] Xavier Cortés, Francesc Serratosa, and Carlos F. Moreno-García. On the influence of node centralities on graph edit distance for graph classification. In Cheng- Lin Liu, Bin Luo, Walter G. Kropatsch, and Jian Cheng, editors, Graph-Based Representations in Pattern Recognition, volume 9069 of Lecture Notes in Computer Science, pages 231–241. Springer International Publishing, 2015. ISBN 978-3-319- 18223-0. [25] A.D.J. Cross and E.R. Hancock. Graph matching with a dual-step em algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1236–1253, Nov 1998. ISSN 0162-8828. [26] W.J. Cukierski and D.J. Foran. Using betweenness centrality to identify manifold shortcuts. In IEEE International Conference on Data Mining Workshops, pages 949 –958, dec. 2008. [27] Samuel de Sousa and Walter G. Kropatsch. Graph-based point drift: Graph centrality on the registration of point-sets. Pattern Recognition, 48(2):368 – 379, 2015. ISSN 0031-3203. [28] Samuel de Sousa and Walter G. Kropatsch. Data Graph Formulation as the Minimum-Weight Maximum-Entropy Problem. In Luo B. Kropatsch W.G. Cheng J Liu, C.-L., editor, Graph-Based Representations in Pattern Recognition, volume 9069 of Lecture Notes in Computer Science, pages 244–253. Springer International Publishing, 2015. [29] Samuel de Sousa and Walter G. Kropatsch. The Minimum Spanning Tree of Maximum Entropy. In Proceedings of The 39th Annual Workshop of the Austrian Association for Pattern Recognition (ÖAGM). 2015. [30] Samuel de Sousa, M.A. Balieiro, J.M. dos R. Costa, and C.R.B. de Souza. Multiple social networks analysis of floss projects using sargas. In HICSS ’09. 42nd Hawaii International Conference on System Sciences., pages 1–10, 2009. [31] Samuel de Sousa, Nicole M. Artner, and Walter G. Kropatsch. On the evaluation of graph centrality for shape matching. In GbRPR, volume 7877 of Lecture Notes in Computer Science, pages 204–213. Springer, 2013. ISBN 978-3-642-38220-8. [32] Samuel de Sousa, Yll Haxhimusa, and Walter G. Kropatsch. Estimation of Dis- tribution Algorithm for the Max-Cut Problem. In WalterG. Kropatsch, NicoleM. Artner, Yll Haxhimusa, and Xiaoyi Jiang, editors, Graph-Based Representations in Pattern Recognition, volume 7877 of Lecture Notes in Computer Science, pages 244–253. Springer Berlin Heidelberg, 2013. [33] B. N. Delaunay. Sur la sphère vide. 
Bulletin of Academy of Sciences of the USSR, pages 793–800, 1934. 121 [34] PeterJ. Dickinson, Horst Bunke, Arek Dadej, and Miro Kraetzl. Matching graphs with unique node labels. Pattern Analysis and Applications, 7(3):243–254, 2004. ISSN 1433-7541. [35] W.E. Donath and A.J. Hoffman. Lower bounds for the partitioning of graphs. IBM Research and Developments, 1973. [36] H. Edelsbrunner, D. Kirkpatrick, and R. Seidel. On the shape of a set of points in the plane. IEEE Transactions on Information Theory, 29(4):551–559, Jul 1983. ISSN 0018-9448. [37] T. Erdős, P.; Gallai. Gráfok előírt fokszámú pontokkal. Matematikai Lapok, 11: 264–274, 1960. [38] Leonard Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii academiae scientiarum Petropolitanae, 8:128–140, 1741. [39] Esther Ezra, Micha Sharir, and Alon Efrat. On the performance of the ICP algorithm. Computational Geometry, 41(1-2):77 – 93, 2008. ISSN 0925-7721. Special Issue on the 22nd European Workshop on Computational Geometry (EuroCG) 22nd European Workshop on Computational Geometry. [40] P.F. Felzenszwalb and J.D. Schwartz. Hierarchical matching of deformable shapes. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1 –8, june 2007. [41] Andrew W Fitzgibbon. Robust registration of 2d and 3d point sets. Image and Vision Computing, 21(13-14):1145 – 1153, 2003. ISSN 0262-8856. [42] Linton C. Freeman. A Set of Measures of Centrality Based on Betweenness. Sociometry, 40(1):35–41, March 1977. [43] Linton C. Freeman. Centrality in social networks conceptual clarification. Social Networks, page 215, 1978. [44] K. Ruben Gabriel and Robert R. Sokal. A new statistical approach to geographic variation analysis. Systematic Biology, 18(3):259–278, 1969. [45] Silvia Garcia-Diez, François Fouss, Masashi Shimbo, and Marco Saerens. A sum- over-paths extension of edit distances accounting for all sequence alignments. Pattern Recognition, 44(6):1172 – 1182, 2011. ISSN 0031-3203. [46] Steven Gold, Anand Rangarajan, and Eric Mjolsness. Learning with preknowledge: Clustering with point and graph matching distance measures. Neural Computation, 8:787–804, 1996. 122 [47] Janaína Gomide, Adriano Veloso, Wagner Meira, Jr., Virgílio Almeida, Fabrício Benevenuto, Fernanda Ferraz, and Mauro Teixeira. Dengue surveillance based on a computational model of spatio-temporal locality of twitter. In Proceedings of the 3rd International Web Science Conference, WebSci ’11, pages 3:1–3:8, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0855-7. [48] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing (3rd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2006. ISBN 013168728X. [49] Ian J. Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, and Vinay Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. CoRR, abs/1312.6082, 2013. [50] L. Grady and C.V. Alvino. The piecewise smooth mumford shah functional on an arbitrary graph. IEEE Transactions on Image Processing, 18(11):2547–2561, Nov 2009. ISSN 1057-7149. [51] D. M. Greig, B. T. Porteous, and A. H. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society. Series B (Methodological), 51(2):271–279, 1989. ISSN 00359246. [52] Séverine Habert, Parmeshwar Khurd, and Christophe Chefd’hotel. Registration of multiple temporally related point sets using a novel variant of the coherent point drift algorithm: application to coronary tree matching. 
[53] S. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph I. Journal of the Society for Industrial and Applied Mathematics, 10(3):496–506, 1962.
[54] Awni Y. Hannun, Carl Case, Jared Casper, Bryan C. Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, and Andrew Y. Ng. Deep Speech: Scaling up end-to-end speech recognition. CoRR, abs/1412.5567, 2014.
[55] F. Harary. Status and contrastatus. Sociometry, 22:23–43, 1959.
[56] Václav Havel. Poznámka o existenci konečných grafů. Časopis pro pěstování matematiky, 80(4):477–480, 1955.
[57] Ren-Jie Hu, Guang-Yu Zhang, and Li-Ping Liao. The closeness centrality analysis of fuzzy social network based on inversely attenuation factor. In Bing-Yuan Cao and Hadi Nasseri, editors, Fuzzy Information and Engineering and Operations Research and Management, volume 211 of Advances in Intelligent Systems and Computing, pages 457–465. Springer Berlin Heidelberg, 2014.
[58] Ren-Jie Hu, Qing Li, Guang-Yu Zhang, and Wen-Cong Ma. Centrality measures in directed fuzzy social networks. Fuzzy Information and Engineering, 7(1):115–128, 2015. ISSN 1616-8658.
[59] Varun Jain and Hao Zhang. Robust 3D shape correspondence in the spectral domain. In IEEE International Conference on Shape Modeling and Applications, pages 19–19, 2006.
[60] H. Jiang, S. X. Yu, and D. R. Martin. Linear scale and rotation invariant matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.
[61] George Judge, Douglas Miller, and Amos Golan. Maximum Entropy Econometrics: Robust Estimation With Limited Data. John Wiley & Sons, March 1996. ISBN 0471953113.
[62] D.J. Klein and M. Randić. Resistance distance. Journal of Mathematical Chemistry, 12(1):81–95, 1993. ISSN 0259-9791.
[63] W. Kocay and D.L. Kreher. Graphs, Algorithms, and Optimization. Discrete Mathematics and Its Applications. Taylor & Francis, 2004. ISBN 9780203489055.
[64] Walter G. Kropatsch, Adrian Ion, Yll Haxhimusa, and Thomas Flanitzer. The eccentricity transform (of a digital shape). In Discrete Geometry for Computer Imagery, volume 4245 of Lecture Notes in Computer Science, pages 437–448. Springer Berlin Heidelberg, 2006. ISBN 978-3-540-47651-1.
[65] Joseph B. Kruskal. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proceedings of the American Mathematical Society, 7(1):48–50, February 1956.
[66] Christoph H. Lampert. Kernel methods in computer vision. Foundations and Trends in Computer Graphics and Vision, 4(3):193–285, March 2009. ISSN 1572-2740.
[67] Pat Langley. The changing science of machine learning. Machine Learning, 82(3):275–279, 2011. ISSN 0885-6125.
[68] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998. ISSN 0018-9219.
[69] Marius Leordeanu and Martial Hebert. A spectral technique for correspondence problems using pairwise constraints. In Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2, ICCV '05, pages 1482–1489, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2334-X-02.
[70] Vincent Lepetit and Pascal Fua. Keypoint recognition using randomized trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9):1465–1479, September 2006. ISSN 0162-8828.
[71] Qun Li, Zhen Qin, Lunshao Chai, Honggang Zhang, Jun Guo, and Bir Bhanu. Representative reference-set and betweenness centrality for scene image categorization. In 20th IEEE International Conference on Image Processing (ICIP), 2013.
[72] Wei Lian and Lei Zhang. Rotation invariant non-rigid shape matching in cluttered scenes. In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, European Conference on Computer Vision (ECCV), volume 6315 of Lecture Notes in Computer Science, pages 506–518. Springer Berlin Heidelberg, 2010. ISBN 978-3-642-15554-3.
[73] Wei Lian, D. Zhang, and D. Zhang. Rotation-invariant nonrigid point set matching in cluttered scenes. IEEE Transactions on Image Processing, 21(5):2786–2797, 2012. ISSN 1057-7149.
[74] L. Liao, R. Hu, and G. Zhang. The centrality analysis of fuzzy social networks. Fuzzy Systems and Mathematics, 27(2):169–173, 2012.
[75] B. Luo and E. R. Hancock. Iterative Procrustes alignment with the EM algorithm. Image and Vision Computing, 2002.
[76] B. Luo and E.R. Hancock. Iterative Procrustes alignment with the EM algorithm. Image and Vision Computing, 20(5-6):377–396, 2002. ISSN 0262-8856.
[77] Bin Luo and E. R. Hancock. A unified framework for alignment and correspondence. Computer Vision and Image Understanding, 92(1):26–55, October 2003. ISSN 1077-3142.
[78] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297, 1967.
[79] A. Mantrach, Luh Yen, J. Callut, K. Francoisse, M. Shimbo, and M. Saerens. The sum-over-paths covariance kernel: A novel covariance measure between nodes of a directed graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6):1112–1126, 2010. ISSN 0162-8828.
[80] D. Mateus, R. Horaud, D. Knossow, F. Cuzzolin, and E. Boyer. Articulated shape matching using Laplacian eigenfunctions and unsupervised point registration. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.
[81] M.D. McFarlane. Digital pictures fifty years ago. Proceedings of the IEEE, 60(7):768–770, July 1972. ISSN 0018-9219.
[82] Bojan Mohar. Laplace eigenvalues of graphs - a survey. Discrete Mathematics, 109(1-3):171–183, 1992. ISSN 0012-365X.
[83] S. Mukherjee, S.K. Biswas, and D.P. Mukherjee. Recognizing human action at a distance in video by key poses. IEEE Transactions on Circuits and Systems for Video Technology, 21(9):1228–1241, Sept 2011. ISSN 1051-8215.
[84] Andreas C. Müller, Sebastian Nowozin, and Christoph H. Lampert. Information theoretic clustering using minimum spanning trees. In Axel Pinz, Thomas Pock, Horst Bischof, and Franz Leberl, editors, Pattern Recognition, volume 7476 of Lecture Notes in Computer Science, pages 205–215. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-32716-2.
[85] David Mumford and Jayant Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42(5):577–685, 1989. ISSN 1097-0312.
[86] Andriy Myronenko and Xubo Song. Point Set Registration: Coherent Point Drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262–2275, December 2010. ISSN 0162-8828.
[87] A.B.M. Nasiruzzaman, H.R. Pota, and M.A. Mahmud. Application of centrality measures of complex network framework in power grid. In IECON 2011 - 37th Annual Conference of the IEEE Industrial Electronics Society, pages 4660–4665, 2011.
[88] M. Newman. A measure of betweenness centrality based on random walks. Social Networks, 27(1):39–54, 2005.
[89] S. Ohrhallinger and S. Mudur. An efficient algorithm for determining an aesthetic shape connecting unorganized 2D points. Computer Graphics Forum, 32(8):72–88, 2013. ISSN 1467-8659.
[90] Kazuya Okamoto, Wei Chen, and Xiang-Yang Li. Ranking of closeness centrality for large-scale social networks. In Proceedings of the 2nd Annual International Workshop on Frontiers in Algorithmics, pages 186–195, Berlin, Heidelberg, 2008. Springer. ISBN 978-3-540-69310-9.
[91] R. Pal, A. Mukherjee, P. Mitra, and J. Mukherjee. Modelling visual saliency using degree centrality. IET Computer Vision, 4(3):218–229, 2010. ISSN 1751-9632.
[92] T. Pock, D. Cremers, H. Bischof, and A. Chambolle. An algorithm for minimizing the Mumford-Shah functional. In IEEE 12th International Conference on Computer Vision, pages 1133–1140, Sept 2009.
[93] Helmut Pottmann, Qi-Xing Huang, Yong-Liang Yang, and Shi-Min Hu. Geometry and convergence analysis of algorithms for registration of 3D shapes. International Journal of Computer Vision, 67(3):277–296, 2006.
[94] R. C. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36:1389–1401, 1957.
[95] Huaijun Qiu and Edwin R. Hancock. Image segmentation using commute times. In British Machine Vision Conference, pages 929–938, 2005.
[96] Huaijun Qiu and Edwin R. Hancock. Clustering and embedding using commute times. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1873–1890, 2007. ISSN 0162-8828.
[97] R. Ravi and M.X. Goemans. The constrained minimum spanning tree problem. In Rolf Karlsson and Andrzej Lingas, editors, Algorithm Theory – SWAT '96, volume 1097 of Lecture Notes in Computer Science, pages 66–75. Springer Berlin Heidelberg, 1996. ISBN 978-3-540-61422-7.
[98] G. Reeb. Sur les points singuliers d'une forme de Pfaff complètement intégrable ou d'une fonction numérique. Comptes Rendus de l'Académie des Sciences Paris, 222:847–849, 1946.
[99] Marcos Rodrigues, Robert Fisher, and Yonghuai Liu. Special issue on registration and fusion of range images. Computer Vision and Image Understanding, 87(1-3):1–7, July 2002. ISSN 1077-3142.
[100] Frank Rosenblatt. The perceptron, a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory, 1957.
[101] J. N. Rosenquist, J. Murabito, J. H. Fowler, and N. A. Christakis. The spread of alcohol consumption behavior in a large social network. Annals of Internal Medicine, 152(7):426–W141, 2010.
[102] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In Third International Conference on 3-D Digital Imaging and Modeling, pages 145–152, 2001.
[103] G. Sabidussi. The centrality index of a graph. Psychometrika, 1966.
[104] Marco Saerens, François Fouss, Luh Yen, and Pierre Dupont. The principal components analysis of a graph, and its relationships to spectral clustering. In Proceedings of the 15th European Conference on Machine Learning, pages 371–383. Springer-Verlag, 2004.
[105] Yusuf Sahillioglu and Yücel Yemez. Minimum-distortion isometric shape correspondence using EM algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2203–2215, 2012.
[106] Joaquim Salvi, Carles Matabosch, David Fofi, and Josep Forest. A review of recent range image registration methods with accuracy evaluation. Image and Vision Computing, 25(5):578–596, May 2007. ISSN 0262-8856.
[107] Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3):7–42, April 2002. ISSN 0920-5691.
[108] G. Scott and H.C. Longuet-Higgins. An algorithm for associating the features of two patterns. In Proc. Royal Society London, volume B244, pages 21–26, 1991.
[109] T.B. Sebastian, P.N. Klein, and B.B. Kimia. Recognition of shapes by editing their shock graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):550–571, May 2004. ISSN 0162-8828.
[110] Mathieu Senelle, Silvia García-Díez, Amin Mantrach, Masashi Shimbo, Marco Saerens, and François Fouss. The sum-over-forests density index: identifying dense regions in a graph. CoRR, abs/1301.0725, 2013.
[111] Larry S. Shapiro and J. Michael Brady. Feature-based correspondence: an eigenvector approach. Image and Vision Computing, 10(5):283–288, 1992. ISSN 0262-8856.
[112] Daniel Sharvit, Jacky Chan, Hüseyin Tek, and Benjamin B. Kimia. Symmetry-based indexing of image databases. Journal of Visual Communication and Image Representation, 9(4):366–380, December 1998.
[113] Jianbo Shi and Jitendra Malik. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[114] Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake. Real-time human pose recognition in parts from a single depth image. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, June 2011.
[115] K. Siddiqi, A. Shokoufandeh, S.J. Dickinson, and S.W. Zucker. Shock graphs and shape matching. In Sixth International Conference on Computer Vision, pages 222–229, Jan 1998.
[116] Y. Taigman, Ming Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1708, June 2014.
[117] G.K.L. Tam, Zhi-Quan Cheng, Yu-Kun Lai, F.C. Langbein, Yonghuai Liu, D. Marshall, R.R. Martin, Xian-Fang Sun, and P.L. Rosin. Registration of 3D point clouds and meshes: A survey from rigid to nonrigid. IEEE Transactions on Visualization and Computer Graphics, 19(7):1199–1217, 2013.
[118] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[119] L. Torresani, V. Kolmogorov, and C. Rother. A dual decomposition approach to feature correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):259–271, 2013. ISSN 0162-8828.
[120] Andrea Torsello and Edwin R. Hancock. A skeletal measure of 2D shape similarity. Computer Vision and Image Understanding, 95(1):1–29, 2004. ISSN 1077-3142.
[121] A. M. Turing. Computing machinery and intelligence. Mind, LIX(236):433–460, 1950.
[122] S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5):695–703, 1988. ISSN 0162-8828.
[123] J. Utans. Mixture models and the EM algorithm for object recognition within compositional hierarchies. Technical report, ICSI Berkeley, 1993.
[124] Oliver van Kaick, Hao Zhang, Ghassan Hamarneh, and Daniel Cohen-Or. A survey on shape correspondence. Computer Graphics Forum, 30(6):1681–1707, 2011.
[125] Zihou Wang, Yanni Han, Tao Lin, Hui Tang, and Song Ci. Virtual network embedding by exploiting topological information. In IEEE Global Communications Conference (GLOBECOM), pages 2603–2608, 2012.
[126] Stanley Wasserman and Katherine Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, New York, USA, 1994.
[127] Paul J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, 1974.
[128] Jiaolong Yang, Hongdong Li, and Yunde Jia. Go-ICP: Solving 3D registration efficiently and globally optimally. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, pages 1457–1464. IEEE, 2013.
[129] Arnold Zellner and Richard A. Highfield. Calculation of maximum entropy distributions and approximation of marginal posterior distributions. Journal of Econometrics, 37(2):195–209, 1988. ISSN 0304-4076.
[130] Feng Zhou and Fernando De la Torre. Factorized graph matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[131] Feng Zhou and Fernando De la Torre. Deformable graph matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[132] Long Zhu, Yuanhao Chen, and A. Yuille. Learning a hierarchical deformable template for rapid deformable object parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6):1029–1043, June 2010.

Index

adjacency matrix, 8, 9, 17, 19, 27, 29–31, 59
affine, 34, 36, 38
algorithm, 1, 2, 4–7, 9, 11, 34–40, 44, 65, 67, 77
alignment, 39, 65, 66
anisotropic, 4, 36
artificial, 1, 2
Bayesian Networks, 8
behavior, ix, 1, 49, 64, 94
betweenness, 11, 26, 28, 29, 31, 46
centrality, 4, 5, 7–11, 13–15, 21, 23, 25–31, 34, 40–42, 46, 47, 49–54, 57–68, 71–73, 100–102
Chebyshev, 24
classification, 6, 7, 9, 11
closeness, 11, 26, 29–31, 47, 53, 54, 57, 59, 65–68, 71–73, 101
cluster, 5, 13
clustering, 31
commute time, 31
convergence, 37
correspondence, 4, 8, 9, 11, 30, 33–35, 37–40
data-graph, 7–9, 11, 57, 67, 68, 70, 75, 76, 91, 92, 95, 102
data-set, 11
degree, 11, 26, 27, 30, 31, 76
Delaunay, 7, 8, 11, 39, 67–70, 75, 77, 80, 91–93, 95
distance, 10, 21–26, 28, 29, 31, 44, 92
distribution, 10, 11, 31, 37, 46, 47, 49, 53, 54, 57–59, 73, 76, 79, 80, 92, 100
eigenvector, 9, 11, 26, 29–31
embedding, 39
entropy, 11, 76, 92, 93
Euclidean, 21, 23, 24, 34, 38, 39, 93
feature, 4, 6, 7, 40, 42, 44, 46, 47, 50–54
Gaussian, 39, 58–60, 63, 65, 70, 100
geodesic, 25, 26, 28, 29, 39, 47
graph, 4, 5, 7–9, 11, 13–23, 25–31, 42, 44, 46, 47, 52–54, 57, 59, 62, 63, 67, 70, 72, 75–80, 82, 83, 86, 88, 89, 91–93, 95, 99–101
graph-cuts, 6, 8, 29
histogram, 29, 46
image, 2–9, 30, 31
isotropic, 4, 35
Lagrangian, 92
Laplacian, 29, 31, 39
learning, 1–3, 6, 7, 9, 11
length, 18, 25
likelihood, 38, 39
machine, 1, 6, 9
Manhattan, 24
matching, 4, 11, 26, 29, 30, 39, 42
maximizing, 11, 39
maximum flow, 6, 16, 20
medial axis, 42
medical, 4
metric, 10, 21–24, 37, 76, 77
minimization, 20, 37
minimum cut, 6
network, 13, 26–29, 31, 99
neural network, 1
non-rigid, 4, 38, 67
optimization, 6, 11, 23, 38–40, 76, 92, 93
path, 18, 25, 26
point-set, 4, 7, 9, 30, 57, 62–64, 66, 68, 70, 72, 91–95, 102
registration, 4, 5, 8, 9, 11, 30, 33, 34, 36–40, 57, 64–68, 70, 71, 73, 75, 76, 89, 91, 100–102
rigid, 4, 33–38, 65, 67
rotation, 4, 30, 33–35
scale, 5, 35, 36
segmentation, 5, 6, 8, 16, 27, 29, 31
shape, 9, 29, 42, 44
shock graph, 42, 44
similarity, 4, 16, 28, 34–36, 39, 44, 64–66, 88, 89
skeleton, 44
topology, 9–11, 29, 44, 46, 53, 65, 91
transformation, 4, 8, 11, 33–39, 63–67
translation, 4, 30, 33–35
uniform, 37
vision, 2–6, 8, 9, 11, 13, 21, 31
Voronoi, 7, 92
Acronyms

AI Artificial Intelligence. 1
CG Computer Graphics. 2
CPD Coherent Point Drift. 5, 9, 11, 34, 37, 38, 40, 60–67, 71, 72, 95, 100, 102
CV Computer Vision. 2, 3, 5, 20, 27–29, 31, 42, 75
DGM Deformable Graph Matching. 39
DIP Digital Image Processing. 2
DP Dynamic Programming. 11, 23, 76, 77
EM Expectation Maximization. 37–40
EMST Euclidean Minimum Spanning Tree. 75
FGM Factorized Graph Matching. 39
FSNA Fuzzy Social Network Analysis. 31
GM Graph Matching. 8, 39, 91
GMM Gaussian Mixture Models. 37, 38, 62
GPD Graph-based Point Drift. 60–63, 65
GPS Global Positioning System. 22
ICP Iterative Closest Point. 4, 5, 11, 37, 40, 91, 97, 102
ML Machine Learning. 1, 2, 52
MRF Markov Random Field. 8
MST Minimum Spanning Tree. 8, 10, 16, 17, 70, 92–94
MSTME Minimum Spanning Tree of Maximum Entropy. 11, 91, 93–97, 100, 102
MWME Minimum-Weight Maximum-Entropy. 11, 76, 77, 88, 89, 100
NHDD Near Homogeneous Degree Distribution. xvii, xxi, 77–85, 88, 100
PGM Probabilistic Graphical Models. 8
RGB Red, Green, and Blue. 5
SNA Social Network Analysis. 9
SoF Sum-over-Forest. 28
SoP Sum-over-Path. 28
SVD Singular Value Decomposition. 29
SVM Support Vector Machines. 1, 7, 51
TSP Traveling Salesman Problem. 15, 70

Notation

Class of Problems

• P is the class of problems which can be solved by a deterministic Turing machine in polynomial time.
• NP is the class of problems which can be solved by a non-deterministic Turing machine in polynomial time.
• O(·) refers to an upper bound on the growth of a function, i.e. the worst-case computational complexity of an algorithm according to Big-O notation.

Transformations and Geometry

• R is a rotation matrix.
• t is a translation vector.
• A is an affine matrix.
• d(xi, xj) is the distance between two points xi and xj according to a metric function.
• de is the Euclidean distance.
• dm is the Manhattan distance.
• dc is the Chebyshev distance (de, dm, and dc are illustrated in the sketch at the end of this section).
• |·| is the length of a vector or of a graph, i.e. |X| = N.
• ||·||p is a p-norm.

Graphs and Point-sets

• X, Y are point-sets whose sizes are N and M.
• X(V, E) is a graph with vertices V and edges E. If the vertices correspond to the point-set X, then X is a data-graph.
• xn is the nth point of X, for 1 ≤ n ≤ N.
• ym is the mth point of Y, for 1 ≤ m ≤ M.
• Xt is a complete graph with t nodes and t(t−1)/2 edges.
• X̄ is the complement graph of a graph X, i.e. X̄ = Xt − X.
• X⟨S⟩ is a node-induced subgraph of X(V, E), induced by a vector S, where |S| = |V|.
• X[U] is an edge-induced subgraph of X(V, E), induced by a vector U, where |U| = |E|.
• X•N is a graph fulfilling the NHDD condition which is connected.
• X◦N is a graph fulfilling the NHDD condition but having one disconnected node.

Centrality

• v(xn) is a function v : X → R which stands for any centrality value of a node xn.
• Betweenness
  – b(xn) stands for the betweenness centrality of a node xn.
  – B{X} = {b(xn) | xn ∈ X : 1 ≤ n ≤ N}.
  – B{X}≠ is the set of distinct values of B{X}.
• Closeness
  – c(xn) stands for the closeness centrality of a node xn.
  – C{X} = {c(xn) | xn ∈ X : 1 ≤ n ≤ N}.
  – C{X}≠ is the set of distinct values of C{X}.
• Degree
  – d(xn) stands for the degree centrality of a node xn.
  – D{X} = {d(xn) | xn ∈ X : 1 ≤ n ≤ N}.
  – D{X}≠ is the set of distinct values of D{X}.
• Eigenvector
  – e(xn) stands for the eigenvector centrality of a node xn.
  – E{X} = {e(xn) | xn ∈ X : 1 ≤ n ≤ N}.
  – E{X}≠ is the set of distinct values of E{X}.
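To make the centrality notation above concrete, the following is a minimal sketch, not taken from the thesis, that evaluates the four centrality functions on a small toy graph and collects their sets of distinct values. It assumes the NetworkX library; the toy graph, variable names, and rounding are illustrative only.

```python
# Illustrative only: computes b(xn), c(xn), d(xn), e(xn) for a toy graph
# and the corresponding distinct-value sets B{X}≠, C{X}≠, D{X}≠, E{X}≠.
# Assumes the NetworkX library; none of this code appears in the thesis.
import networkx as nx

# A small undirected data-graph X(V, E) with N = 5 nodes.
X = nx.Graph()
X.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5), (2, 4)])

centralities = {
    "betweenness": nx.betweenness_centrality(X),  # b(xn)
    "closeness":   nx.closeness_centrality(X),    # c(xn)
    "degree":      nx.degree_centrality(X),       # d(xn)
    "eigenvector": nx.eigenvector_centrality(X),  # e(xn)
}

for name, values in centralities.items():
    all_values = [round(v, 3) for v in values.values()]  # {v(xn) | xn in X}
    distinct = sorted(set(all_values))                   # distinct values
    print(f"{name:>11}: values={all_values} distinct={distinct}")
```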
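Similarly, the three distances de, dm, and dc listed under Transformations and Geometry are the p-norms of the difference vector for p = 2, p = 1, and p = ∞. The following short NumPy sketch is purely illustrative and not part of the thesis; the example points are arbitrary.

```python
# Illustrative only: Euclidean, Manhattan, and Chebyshev distances as
# p-norms (p = 2, 1, inf) of the difference vector xi - xj.
import numpy as np

xi = np.array([1.0, 2.0])
xj = np.array([4.0, 6.0])

d_e = np.linalg.norm(xi - xj, ord=2)       # Euclidean:  sqrt(3^2 + 4^2) = 5.0
d_m = np.linalg.norm(xi - xj, ord=1)       # Manhattan:  |3| + |4|       = 7.0
d_c = np.linalg.norm(xi - xj, ord=np.inf)  # Chebyshev:  max(|3|, |4|)   = 4.0

print(d_e, d_m, d_c)
```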