Computational Analysis of Petroglyphs

DISSERTATION

submitted in partial fulfillment of the requirements for the degree of

Doktor der Technischen Wissenschaften

by

Markus Seidl
Registration Number 9255480

to the Faculty of Informatics at the TU Wien

Advisor: Univ. Prof. Christian Breiteneder

The dissertation has been reviewed by:
Christian Breiteneder
Andreas Uhl

Vienna, 18th August, 2016
Markus Seidl

Declaration of Authorship

Markus Seidl
1020 Wien, Volkertplatz 4/20

I hereby declare that I have written this Doctoral Thesis independently, that I have completely specified the utilized sources and resources, and that I have clearly marked all parts of the work - including tables, maps and figures - which belong to other works or to the internet, literally or extracted, by referencing the source as borrowed.

Vienna, 18th August, 2016
Markus Seidl

Acknowledgements

First of all I would like to thank my wife Barbara and my son Lenz, both of whom supported me throughout my studies. Next, I want to thank my parents, my sister Simone and my brother Raimund for their support.

I want to thank my supervisor Professor Christian Breiteneder and my second supervisor Professor Axel Pinz for their continuous support, patience, motivation, insightful comments and encouragement. I want to express my gratitude to the archaeologists Dr Frederick Baker and Dr Christopher Chippindale, who let me participate in Cambridge University's Prehistoric Picture Project and introduced me to Valcamonica's cultural treasures in a rainy week in autumn 2010.

My sincere thanks go to all collaborators in the 3D-Pitoti project. First of all, thanks to our fabulous coordinators Professor Sue Cobb, Dr Mirabelle d'Cruz and Sally Shalloe from the University of Nottingham. Tragically, Sally passed away unexpectedly and could not see the great success of the 3D-Pitoti project she had been working on. Sally, the memories of you are with me.
Special thanks go to Dr Alberto Marretta and Giovanna Bellandi from Archeocamuni, Tiziana Cittadini from the Centro Camuno di Studi Preistorici, as well as Martin Schaich and Dr Oliver Reuß from ArcTron 3D for providing materials for this thesis. Furthermore, I want to thank Dr Christian Reinbacher and Georg Poier from Graz University of Technology as well as Craig Alexander from Cambridge University for the fruitful collaboration and joint publications. In the 3D-Pitoti project I had the privilege to have Dr Matthias Zeppelzauer and Ewald Wieser in my team at the St. Pölten University of Applied Sciences. Our weekly discussions fostered the progress of the project and of my thesis. It was a pleasure to supervise Ewald's master's thesis, which contributed to the results presented in Chapter 8.

The research reported in this thesis was partially carried out in the project 3D-Pitoti (http://www.3d-pitoti.eu/), which was funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 600545; 2013-2016.

Abstract

Numerous petroglyphs have been pecked, scratched and carved into rock surfaces in the northern Italian valley Valcamonica. The classic documentation work carried out by archaeologists is a massively time-consuming process. The rising availability of digital images and 3D scans of petroglyphs facilitates digital workflows which can improve the documentation process.

In this thesis, we aim at supporting the classic documentation pipeline for petroglyphs. The first step of the pipeline is the determination of the boundaries and spatial locations of petroglyphs on a rock surface. This is usually done by time-consuming manual contact tracing. Then, the found figures are classified according to their shapes and pecking styles. The large number of petroglyphs (Valcamonica contains up to 300,000 figures) demands large efforts for manual classification. The investigation of pecking styles is often impossible based on the contact tracings alone and thus requires researchers to return to the rocks frequently.

Following the classic documentation pipeline for petroglyphs, we propose and evaluate novel methods. To determine the positions and shapes of petroglyphs on a rock panel, we approach the segmentation of 2D and 3D petroglyph images into pecked regions and natural rock surface. Furthermore, we use 3D scans to investigate the similarity of pecking styles, i.e. the shape, size, depth and spatial distribution of the peck marks a figure consists of. Finally, we develop a petroglyph shape descriptor which allows the classification of petroglyphs.

Our tasks are challenging. The figures have been pecked over thousands of years. The rocks are subject to weathering and abrasion. Therefore, the visual and tactile appearance of the petroglyphs varies greatly. Figures have often been superimposed over existing figures. Consequently, many merged and partial figures exist.

Contrary to previous work by others, we show that the segmentation of 2D images of rock surfaces is feasible. The employment of illumination-independent high-resolution 3D data of the surfaces' micro-topographies clearly improves results. We facilitate the investigation of pecking styles by modeling their similarity with 3D surface descriptors. The shape classification of a dataset containing more than a thousand petroglyphs yields very good results with a combination of skeleton-, boundary- and region-based shape descriptors. Our results can be useful for rock art researchers.
Furthermore, we suggest how they can be applied in other domains.

Kurzfassung

In the northern Italian valley Valcamonica, numerous rock art figures exist that have been pecked into exposed rock surfaces. The archaeological documentation of these rock images (petroglyphs) is very time-consuming. In recent years, digital photography and 3D recording techniques have increasingly been used to document the petroglyphs. The resulting image data enables digital workflows, which in turn can improve the archaeological documentation process.

This dissertation aims at laying the foundations for a digital documentation process for petroglyphs. The first step of the documentation is the recording of the positions of the individual petroglyphs on the rock surfaces. For this purpose, the petroglyphs are usually traced with a felt pen onto a transparent sheet placed on the rock. Subsequently, the recorded figures are classified by their shapes. Since up to 300,000 petroglyphs exist in Valcamonica alone, classification is a very labor-intensive process. Besides the shape, the making of the engravings (shape, size, depth and distribution of the individual peck marks) is an important characteristic of the petroglyphs. The pecking style often cannot be judged from the tracings alone and thus requires repeated trips to the rock surfaces.

We develop and evaluate automated methods for the documentation process of pecked rock art. To replace time-consuming contact tracing, we employ novel methods for the segmentation of petroglyphs from digital images and 3D scans. We also use the 3D scans to measure the similarity of different pecking styles based on 3D descriptors of the surface. Furthermore, we develop a shape descriptor for petroglyphs which can be used for classification. The tasks are very challenging, as the rock images have been pecked into the rock over several millennia and the rock surfaces are weathered to varying degrees. Many petroglyphs have also been engraved over already existing ones, so there are many figures which exist only in parts.

In contrast to earlier work by others, we can show that the automatic segmentation of rock images based on digital photographs is possible. The use of 3D scans improves the segmentation quality, as they capture the surface texture independently of the illumination. The use of 3D surface descriptors enables the automatic distinction of different pecking styles. We evaluate our shape descriptor on a dataset with more than a thousand petroglyphs and achieve very good classification results. The benefit of our results for archaeologists has become apparent in the course of the evaluations. In addition, we describe how our results can be usefully applied in other application domains.

Contents

1 Introduction
  1.1 Contribution
  1.2 Organisation

2 Material
  2.1 Rock Art
  2.2 Rock Art Documentation
    2.2.1 Documentation methods for Petroglyphs
  2.3 Valcamonica's Little Puppets - The Pitoti
    2.3.1 (Re-)Discovery
    2.3.2 Chronology
  2.4 2D images of Petroglyphs
  2.5 3D scans of Petroglyphs
  2.6 Ground Truth
    2.6.1 Segmentation Annotation
    2.6.2 Shape Annotation
    2.6.3 Shape Annotation Tool
      2.6.3.1 Main Features
      2.6.3.2 Handling and Annotation Workflow
      2.6.3.3 Statistical Reporting
  2.7 Conclusion

3 Petroglyph-related Work
  3.1 3D Scanning of Petroglyphs
  3.2 Segmentation of Petroglyph Images
  3.3 Pecking Style Similarity
  3.4 Shape Classification of Petroglyphs
  3.5 Discussion

4 Segmentation of Petroglyph Images
  4.1 Related Work
  4.2 Approach
    4.2.1 Pixel Classification
    4.2.2 Fusion Strategies
  4.3 Experiments
    4.3.1 Material
    4.3.2 Visual Features
    4.3.3 Training and Classification
  4.4 Results
    4.4.1 Single Features
    4.4.2 Early Fusion
    4.4.3 Late Fusion
    4.4.4 Late Fusion with Thresholded Majority Voting
    4.4.5 Interactive Late Fusion
  4.5 Conclusion

5 Segmentation of Petroglyph 3D Point Clouds
  5.1 3D Descriptor-based Point Cloud Segmentation
    5.1.1 Related Work
    5.1.2 Approach
    5.1.3 Results
  5.2 Image-space-based Point Cloud Segmentation
    5.2.1 Related work
    5.2.2 Descriptors
    5.2.3 Approach
    5.2.4 Experimental setup
    5.2.5 Results
      5.2.5.1 Small dataset
      5.2.5.2 Full dataset
  5.3 Conclusion

6 Similarity of Pecking Styles
  6.1 Related Work
  6.2 Approach
    6.2.1 Dense sampling
    6.2.2 Topology-based sampling
  6.3 Results
    6.3.1 Dense sampling
    6.3.2 Topology-based sampling
  6.4 Conclusion

7 Description of Petroglyph Shapes
  7.1 Related Work
    7.1.1 Region-based Descriptors
    7.1.2 Contour-based Descriptors
    7.1.3 Skeleton-based Descriptors
    7.1.4 Discussion of Shape Descriptors
    7.1.5 Graph Matching
  7.2 Proposed Approach
    7.2.1 Benchmark Dataset
    7.2.2 Overview of our Approach
    7.2.3 Descriptors
    7.2.4 Descriptor Fusion and Classification
  7.3 Experimental Results
    7.3.1 Graph Matching
    7.3.2 Comparison with Baseline Descriptors and Descriptor Fusion
  7.4 Conclusion

8 Skeletonisation of Petroglyph Shapes
  8.1 Related work
    8.1.1 Point-based pruning approaches
    8.1.2 Branch-based pruning approaches
    8.1.3 Comparison of algorithms
  8.2 Approach
    8.2.1 Investigated material
    8.2.2 Adaptive shape pre-processing
    8.2.3 Improvements of existing skeletonisation methods
  8.3 Experimental setup
    8.3.1 Selection of parameters
    8.3.2 Evaluation
  8.4 Results
  8.5 Conclusion

9 Automated Classification of Petroglyph Shapes
  9.1 Approach and Related Work
  9.2 Experiments and Results
    9.2.1 Descriptor Combination Previously Proposed
      9.2.1.1 Modified Sansoni Typology
      9.2.1.2 3D-Pitoti Project Typology
    9.2.2 Details on Single Shape Descriptors and CNNs
  9.3 Use Scenario
  9.4 Conclusion

10 Conclusion and Future Work
  10.1 Conclusion
  10.2 Future Work

List of Figures
List of Tables
References
Appendices
A 3D dataset
B Peer-reviewed Publications

Introduction

Figure 1.1: A petroglyph depicting a warrior in Valcamonica, Italy.

Petroglyphs have been pecked, scratched and carved into rock panels all over the world for thousands or even tens of thousands of years. They are among the oldest cultural traces of mankind and allow us a glimpse as far back as the early stages of human civilisation. Although subject to weathering and abrasion of the rock surfaces, they are more stable and longer lasting than painted rock art (pictographs).

The petroglyphs are not made of material but exist because of material that has been pecked or scratched away from a rock surface. One consequence is that they cannot be dated objectively by radiocarbon dating. The second consequence is that, as they exist on the surface of a rock, they are flat (or slightly curved with the curvature of the rock). This flatness led to a graphical style that combines different perspectives. If we look at the warrior in Figure 1.1, we notice that some body parts have been rotated in order to be depicted and recognised. The feet are pointing to the right, while the whole figure, with equally long left and right arms and legs, suggests a depiction in frontal perspective.

Petroglyphs stem from the past and are evidence of mankind. The archaeologists researching the different aspects of petroglyphs are rock art researchers. To be able to infer knowledge from evidence stemming from the past, the evidence must be documented, sorted, archived and preserved for future researchers. The classic documentation method for petroglyphs is time-consuming manual contact tracing on transparent sheets (see Figure 1.2). The result of such a tracing is depicted in Figure 1.3. Digital photography and image stitching methods have already moved the documentation of rock art forward. With the rising availability of affordable 3D scanning techniques, more and more rock surfaces containing petroglyphs will be scanned and subsequently be available digitally as highly detailed 3D scans.

The availability of 3D scans of rock art is not necessarily an immediate advantage for the rock art researcher and the interested public. In order to exploit the potential, the 3D data must be acquired with geo-referencing, must be stored safely, must include tools for navigation and must allow a viewing experience as good as the viewing experience directly on site. Moreover, the data should also improve the work of rock art researchers and allow new insights which have not been possible before.
The main aim of this thesis is to investigate whether digital 2D and 3D depictions of petroglyphs and their (semi-)automated semantic enrichment can enable new methods for rock art documentation and analysis. With the vision of large archives full of rock art images and 3D scans, we aim at robust automated methods to (a) determine the exact shapes and spatial positions of petroglyphs in images or 3D scans of full rock panels (i.e. segmentation of the image or 3D scan into pecked and non-pecked regions); (b) classify the petroglyphs regarding their shapes and pecking styles; and (c) retrieve similar petroglyphs from different archives of petroglyph images and 3D scans.

We have access to high-resolution photographs and high-resolution 3D scans of rock surfaces. Furthermore, we have many classical petroglyph tracings of full rock panels. The material stems from the UNESCO world heritage site Valcamonica in northern Italy.

The motivation to bring computer vision methods to the domain of rock art research is twofold: first, to address problems that are solvable by humans but occur in high numbers and are therefore repetitive; second, to solve problems that are hard to solve for humans, e.g. because of visibility and/or scale. We clearly have the first type of problem in our case: our test-bed Valcamonica contains 300,000 petroglyphs. The documentation of the shape and the spatial location of all figures is a massively time-consuming process that is currently far from being completed. The subsequent step of classifying the figures into archaeological typologies is still open for most of the rocks. Hence, the automated segmentation of rock surfaces into pecked and natural areas, as well as the automated classification of the segmented figures, has the potential to clearly enhance the documentation and classification work of rock art researchers.

Figure 1.2: Manual contact tracing in Valcamonica, Italy. © Alberto Marretta, used with permission.

Figure 1.3: Manual contact tracing from Seradina Rock II in Valcamonica, Italy.²

The second type of problem occurs in connection with the pecking styles, i.e. the shape, depth and spatial distribution of the single peck marks a petroglyph consists of. The high-resolution 3D scans with a precision of 0.1 mm enable a very close look at single peck marks and may allow an automated similarity estimation of differently pecked regions. The knowledge about the similarity of pecking styles in different regions can help in relative chronology. In the case of two merged figures, usually one figure is superimposed over the other. If the pecking styles of the two figures can be distinguished, the rock art researcher can infer that the figure superimposed over the other has been pecked later.

The research for this thesis was partly carried out in the 3D-Pitoti project³. The project was realised by an international consortium with seven partners. It dealt with a completely 3D-based digital rock art pipeline from scanning through intelligent processing to interactive visualisation (see Figure 1.4). The test-bed for the tools developed in the project is the UNESCO world heritage site Valcamonica.

1.1 Contribution

This thesis aims to lay a foundation for a digital toolset to analyse petroglyphs based on high-resolution 3D scans. However, our contributions have impact outside the rock art domain, too. In the following paragraphs we outline our main contributions and their impact beyond the scope of the thesis.
² © Alberto Marretta, used with permission.
³ This project was funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 600545; 2013-2016. For more information about the project please visit http://3d-pitoti.eu.

Figure 1.4: Simplified system workflow of the complete 3D-Pitoti pipeline (scanning, database, intelligent processing and interactive visualisation). The scanning was carried out as a multi-scale approach based on the 3D-reconstruction method Structure-from-Motion (SFM). The massive amount of multi-scale 3D data required Level-of-Detail (LOD) algorithms for interactive visualisation. The contributions of this thesis (see Section 1.1) are subsumable under "Intelligent Processing". We thank Alex Kulik for the preparation of the first version of this figure.

The scientific publications based on the materials of this thesis are listed in Appendix B. We can provide all data used in this thesis upon request.

Segmentation: For the time-consuming first step of rock art documentation - tracing - we propose digital tracing. For a task that has even been considered infeasible by other researchers ([ZWKL11]), we propose an approach that segments a point cloud (or image) into foreground (pecked rock surface) and background (natural rock surface). First, we demonstrate that segmentation of petroglyphs can be performed with high-quality digital photos as input. The main disadvantage of using photos for segmentation is the dependency on the lighting situation during acquisition. To be independent of the lighting situation, we use high-resolution 3D scans of the rock surfaces in the next step. We demonstrate that segmentation based on 3D scans outperforms segmentation of petroglyph photos. However, the dense extraction of 3D point cloud descriptors is computationally demanding. Thus, in a next step we move the problem to image space by using the depth map of the rock surface as input. We show that the depth map-based method yields the best results. Our approach and its evaluation could have impact outside the rock art domain, as the availability of high-resolution 3D surface scans will probably increase strongly, and texture classification based solely on the 2D visual appearance of a surface might be outperformed or even replaced by texture classification based on 3D data, i.e. the lighting-independent tactile appearance of a surface.

Classification: Subsequent to the tracing of the petroglyphs on the rock surface, the single petroglyph figures are classified into pre-defined typologies. We provide a petroglyph shape similarity measure based on the combination of a) a novel descriptor based on the skeletal graph of a petroglyph, b) shape descriptors used by other researchers for petroglyph shape classification and c) state-of-the-art image features based on convolutional neural networks. We evaluate our approach on a large shape database containing more than a thousand petroglyphs. Our approach outperforms all other approaches. We provide the evaluation data to the community for further investigations.
Our approach and evaluations have impact outside the rock art domain: on the one hand, we investigate the question "Who needs shape descriptors in the age of convolutional neural networks, where features are simply learned?"; on the other hand, existing datasets for shape classification are highly artificial compared to our shape data created by humans.

3D-Data: Most of the tasks described in this thesis are based on high-resolution 3D data. However, the reader will recognise that we transferred tasks to 2D image space. Thus, this thesis implicitly investigates the general question "How to handle the computational analysis of large amounts of high-resolution 3D data?".

1.2 Organisation

The organisation of the thesis is as follows. Subsequent to the introduction, we extensively describe our material in Chapter 2. We define rock art in Section 2.1 and introduce classic rock art documentation in Section 2.2. In Sections 2.3 to 2.6 we describe our material and the ground truth acquisition process. The related work is split into related work dealing with petroglyphs, which is presented in Chapter 3, and technical related work, which is - with the aim of better readability - summarized in each technical chapter. Chapters 4, 5 and 6 deal with the characteristics of the micro-topography of the rock surface. Chapter 4 presents an approach for the segmentation of rock surfaces into pecked and non-pecked regions based on digital photos. Chapter 5 deals with the same problem but uses 3D point clouds as input. Chapter 6 describes investigations on the similarity of different pecking styles. Chapters 7, 8 and 9 deal with the shape of petroglyphs. Chapter 7 presents an approach to describe and match petroglyph shapes based on their skeletal graph. The quality of the skeletal graph depends on the quality of the skeletonisation. Hence, in Chapter 8 we investigate the skeletonisation and other necessary pre-processing steps for shape description of petroglyphs. In Chapter 9 we evaluate the findings of the previous chapters on a large petroglyph shape dataset and compare and combine our descriptor with other shape descriptors and an additional state-of-the-art image matching method. Finally, in Chapter 10 we reflect, draw conclusions and outline future work.

Material

Figure 2.1: A petroglyph scene in Valcamonica, Italy.

In this chapter we describe the origin of our material and our ground truth data. Section 2.1 contains the definition of rock art, its worldwide occurrence and some prominent examples. Section 2.2 describes documentation issues of rock art, focusing on the documentation of petroglyphs. Section 2.3 presents the rock art of Valcamonica. Sections 2.4 to 2.6 describe our ground truth data and the annotation process.

2.1 Rock Art

Whitley defines rock art as follows [Whi05]:

Rock art is landscape art. It consists of pictures, motifs, and designs placed on natural surfaces, such as cliff and boulder faces, cave walls and ceilings, and the ground surface. Rock art is also sometimes referred to as cave art or parietal (wall) art. Regardless of designation, the defining characteristic of rock art is its placement on natural rock surfaces.

Rock art is a global phenomenon. It can be found all over the world. Many rock art sites are part of the UNESCO world heritage. The UNESCO ICOMOS documentation centre lists rock art sites in Africa, the Arab States, Asia and the Pacific, Europe, as well as Latin America and the Caribbean [Sil09].
In total, 31 rock art sites are part of the UNESCO world heritage.

Rock art belongs to the oldest traces of human culture. Pike et al. dated rock art from 11 caves in Spain based on uranium-series disequilibrium dates of calcite deposits overlying or underlying the art [PHGD+12]. The oldest piece of rock art they dated is at least 40,800 years old. Old rock art has also been found and dated outside of Europe. Aubert et al. dated cave art in Sulawesi, Indonesia, using uranium-series dating of coralloid speleothems [ABR+14]. They found a hand stencil with a minimum age of 39,900 years.

The Lascaux cave in France contains a well-known example of European rock art. The paintings in the cave are dated to around 15,000 BC. Hence, they are at least 17,000 years old. They show animals that the Paleolithic humans had in their surroundings, both animals that were hunted and eaten, such as deer and bison, as well as animals that were feared, such as bears and wolves. The animals are shown in a twisted perspective, a combination of front view and profile which allows the depiction of the imposing profile of a bison with both horns (see Figure 2.2). The images have been painted with pigments of readily available minerals. However, no brushes have been found. Most probably the black outlines have been painted with moss or hair or even the raw pigments. The surfaces could have been covered with pigments blown onto the surface, as hollowed-out bones internally covered with pigments have been found in the caves [Ted00].

Generally, rock art can be subsumed under three basic types [Whi05]:

• Petroglyphs are pecked or scratched into rock surfaces (see Figures 2.1 and 2.3). The peckings are sometimes called engravings or carvings. They are made with fist-sized stones that are used as hammerstones. A stone of a certain type (e.g. basalt) can be used as a hammerstone on the same type of rock. Another way of making petroglyphs is to use a fist-sized stone or a metal tool as a chisel, on which one hammers either with another stone or with a piece of wood. The rarely occurring scratched petroglyphs are made with a lithic flake or blade.

Figure 2.2: Cave painting in the Lascaux cave in France showing superimpositions of two bison bulls, horses, deer and other animals.²

• Pictographs are drawn or painted onto rock surfaces (see Figure 2.2). They are made of materials common in the whole world. The materials are minerals and other natural substances. The often used red is made of ground ocher; black is
Thus, a central point, the step zero of rock art research, is documentation of rock art. 2 cProf Saxx, CC-BY-SA-3.0, source: https://en.wikipedia.org/wiki/Lascaux#/media/File: Lascaux_painting.jpg. 11 2. MATERIAL Figure 2.3: A petroglyph depicting a Bubalus in the Sahara in southern Algeria. cLinus Wolf, CC-BY-SA-3.0. 12 2.2 Rock Art Documentation Figure 2.4: Intaglio showing a hummingbird in Nazca, Peru. The figure is almost 100 meters long.3 2.2 Rock Art Documentation Basically, rock art is documented with pictures, e.g. with (multispectral) images of pictographs or aerial images of geoglyphs. Petroglyphs have a third dimension which is of interest. How are the pecks or scratches formed? How deep are they? Which tool could have been used? Different techniques to document petroglyphs have been developed to answer these questions. 2.2.1 Documentation methods for Petroglyphs Rock art researchers spend a considerable amount of time to document the rock engrav- ings. The first technique developed for the documentation of petroglyphs was frottage (engl. rubbing). A sheet of paper is placed on the petroglyphs and a dyeing material (e.g. grass, charcoal or graphite) is rubbed onto the surface of the paper. The edges on the rock surface create different intensities of color. See Figure 2.6 for the process of rubbing and Figure 2.5 for a resulting image of a whole panel. Another technique is casting with plaster which results in a negative mask of the rock surface and therefore delivers a very precise depiction of the three dimensional surface. But, it is obviously a very time-consuming technique. The most frequently used technique since the 1950ies is manual tracing. A transparent foil is put on the rock surface and each peck mark is traced with a marker pen. Tracing delivers - as long as the tracing person is concentrated - a very precise depiction of the structure of the rock art and the characteristic of the pecking - the pecking style. Figures 2.7 and 2.8 show 3 cBjarte Sorenson, CC-BY-SA-3.0, source: http://commons.wikimedia.org/wiki/File:Nazca_ colibri.jpg. 13 2. MATERIAL Figure 2.5: Result of a rubbing documentation. The rubbing shows a rock panel in Tanum, Sweden. cGerhard Milstreu / Tanums Hällristningsmuseum, used with permis- sion. 14 2.3 Valcamonica’s Little Puppets - The Pitoti Figure 2.6: The process of rubbing on a rock panel in Tanum, Sweden.4 the tracing process while Figure 1.3 on page 4 shows the result of such a tracing. Besides photography, which has always been used to document petroglyphs and digital photography, which extended the possibilities about 20 years ago, 3D scanning of petroglyphs has been performed since around 10 years ago. The results that can be obtained with 3D scanning can be seen as a combination of classic photography with the documentation techniques mentioned above to record the pecks and scratches in their three-dimensionality. We describe the 3D scanning of petroglyphs in more detail in Chapter 3. 2.3 Valcamonica’s Little Puppets - The Pitoti More than 300.000 “prehistoric” figures have the immediacy and the his- torical reality of documents created by those who participated in the very events which led to the formation of European civilization, and which were left, not as chronicles for posterity, but as an organic function of their lives. In this way they act as a medium through which we can revive the past. 
Emanuel Anati [AC94] The Valle Camonica or, more frequently used, Valcamonica is an alpine valley in Lom- bardy (a northern Italian province) stretching north from the Lago d’Iseo and termi- nating at Passo Tonale. It is a UNESCO world heritage site with numerous rock panels containing more than 300.000 petroglyphs stemming from the last 8000 years. Most of the petroglyphs originate from prehistoric times. They have been out of focus outside the valley until the 1930ies, when first (re-)discoveries of pecked surfaces where made. In 1979 the Valcamonica rock art was included in the UNESCO world heritage list. In 4 cAttie Logger / Tanums Hällristningsmuseum, used with permission. 15 2. MATERIAL Figure 2.7: Tracing on a rock in Valcamonica in Italy.5 1994 Anati published the book “Valcamonica Rock Art - A New History of Europe” [AC94] on which the following sections are based unless otherwise noted. 2.3.1 (Re-)Discovery The Romans conquered Valcamonica in the first century AD. Subsequently, most of the pre-Roman settlements and with them the petroglyphs were abandoned and forgotten. Over 2000 years, moss and earth grew over most of the rock panels. Only a few rock panels were still at least partly visible in the 19th century. One of these was known as Preda die Pitoti, the puppets’ rock. In 1930, two scholars from the University of Florence and the University of Turin visited the puppets’ rock in Capo di Ponte separately. Both published on this one rock including photographs and descriptions of the engravings. Following these pub- lications, in the 1930ies, several more rock panels with petroglyphs have been found and documented, among these inscriptions in an unknown language using Northern Etruscan characters. Franz Altheim, a Professor of history at the University of Berlin and supporter of Hitler’s racist theories came to the valley to find traces of the Aryan civilizations. He published a work “Vom Ursprung der Runen” (1933) which has a pref- 5 cAlberto Marretta, used with permission. 16 2.3 Valcamonica’s Little Puppets - The Pitoti ace by Heinrich Himmler. In WW II and the subsequent years, only little research was carried out in Valcamonica. In the 1950ies, Gualtiero Laeng and Emanuele Suss set out to search for rock panels with engravings. Suss made the first map of rock panels of a central area (the Naquane area) containing around 100 rock panels with engravings. This site was then almost completely documented - only six more rocks have been found in this area since then. In the rest of the valley more than 2500 rock panels have been found from 1956 to 1994. The “Anati Expedition” discovered 30.000 Pitoti from 1956 to 1964, when the Centro Camuni Studi Preistorici (CCSP) has been founded by Em- manuel Anati. The CCSP carried on the excavation and documentation of Valcamonica petroglyphs, which are called “Pitoti” (plural, singular is “Pitoto”) a local dialect word for “little puppet”. 2.3.2 Chronology The variation of figures found in Valcamonica is high (see Figure 2.15 on page 28). Some figures try to depict reality as close as possible, some are highly stylized, while others are abstract. The work techniques used to make the petroglyphs vary, too. Most petroglyphs have been pecked rather than scratched or carved. The process of pecking was performed directly with a stone in the fist or indirectly with a chisel and a hammer. The chisel was made of stone or metal while another stone or a piece of wood was used as hammer. 
A basic problem of all attempts to rock art chronology is the fact, that no dating method like the investigation of the employed painting materials or C14 analysis can be used. Hence, each chronology has the risk of being questioned. Marretta [Mar11] describes the dating approach as a two step process. First, a relative chronology between two figures is established by the analysis of superimposition which exploits the fact that a figure pecked over another figure must have been produced later. Second, an absolute chronology is derived by the identification of similarities between the figures and corresponding real objects, e.g. weapons, plows, wagons, etc. that have been found and dated absolutely with stratigraphic methods and C14 analysis. Additionally, the consideration of extinct animals and the knowledge about the coming to existence of certain objects (e.g. the plow in the third millenium BC) can support the dating process. In [AC94] Anati argues, that seven properties that are unique in Valcamonica com- pared to other rock art sites enabled the chronology. 1. The high number of petroglyphs that have been found in Valcamonica delivers sufficient data. 2. The high amount of superimposition, i.e. figures that have been pecked over each other, help to establish relative chronologies between the superimposed figures as it can be seen in most of the cases which figure was pecked over the other, i.e. pecked later. 3. The large number of depicted weapons and other objects with a shape distinct enough to be matched with real and hence C14 datable objects found in excava- tions enables absolute chronological cues. 17 2. MATERIAL Figure 2.8: Tracing on a rock in Valcamonica in Italy. cAlberto Marretta, used with permission. 18 2.3 Valcamonica’s Little Puppets - The Pitoti 4. Variations in style allow to group figures. 5. Excavated human settlements containing rocks with engravings allow to date the rocks through stratigraphy. 6. Inscripted Etruscan letters indicate time frames in which the letters must have been pecked. 7. Animal figures of extinct species give an absolute cue of the last possible date when the figure must have been pecked. The chronological analysis based on the seven criteria led to the first chronology by Anati [Ana76]. The chronology consists of six principle periods. First, a Proto- Camunian period contains a hunter-gatherer society. The four subsequent periods cover the actual Camunian civilization, a food production lifestyle. The sixth period is the Post-Camunian period containing Roman and Post-Roman time. 19 2. MATERIAL 2.4 2D images of Petroglyphs In the previous sections, we have presented general information about our material. In the remaining sections of this chapter we will focus on the data we use for our tasks segmentation of rock surfaces containing petroglyphs and classification of the resulting figures. The first step in documentation of petroglyphs is the recording of the spatial locations of the petroglyphs on a rock surface. The classic documentation methods (rubbing, casting and tracing, see Section 2.2) rely on labor intensive work directly on the rock surface. For our initial investigations on segmentation of petroglyph depictions in natural rock surface and pecked rock surface we worked with high quality digital photography. The pictures have been taken with a Canon EOS 5 MK II and different high quality original Canon lenses. Figures 2.9(a) and 2.10 show parts of a pecked surface. 
The pictures have sufficient resolution to show detailed depictions of the local surface (see the enlarged crop in Figure 2.9(b)). However, in practice there is one central obstacle to working with 2D photographs that strongly influences the quality of the results: the dependence on the lighting situation. Figure 2.9(a) shows an example with shadows from trees. In addition to the problem of other objects casting shadows on the surface, the incoming light direction strongly influences how the three-dimensional micro-structure of a surface is depicted in a photograph. On an overcast day with ambient lighting, the surface structure is less visible than on a sunny day in the morning or in the afternoon, when the sun rays are close to parallel to the surface. Near-parallel illumination is beneficial for image-based 2D approaches, as it generates stronger shadows and thus makes the surface geometry better visible. This problem can only be overcome by the use of a flash light. A flash device mounted on top of the camera is, however, not suitable, as the light rays of the flash would be close to orthogonal to the surface and, hence, reduce the visibility of the geometric micro-structure of the surface. The application of a mobile studio flash light improves the situation, as the flash can be located at an adequate angle to the surface. The problem that remains is the high power consumption of the flash, which requires carrying large and heavy battery packs to the capture site. This is often not feasible in remote areas and on steep surfaces. Therefore, we captured our 2D material without a flash light.

Figure 2.9: High-quality 2D images as used in 2D segmentation. (a) Full image: shadows of the trees surrounding the rock are visible. (b) Detail: we observe that the peck marks can be seen well.

Figure 2.10: A high-quality 2D image as used in 2D segmentation.

2.5 3D scans of Petroglyphs

In the previous section we described our 2D photographic material and its main disadvantage - the lighting dependency during acquisition. An alternative way to capture the micro-topography of the surface is 3D scanning, which is frequently used for archaeological documentation.

We work with a dataset acquired by the German 3D surveying specialist ArcTron¹ for the 3D-Pitoti project. The dataset contains 26 surface reconstructions with a resolution of 0.1 mm. The number of points per surface ranges from 810k to 10.7M. An example of a point cloud with 4.4M points, together with a close-up zoom into the cloud, is shown in Figure 2.11. See Appendix A for details on the 3D dataset.

¹ http://www.arctron.de

Figure 2.11: A 3D input point cloud (a) and a close-up of the head of the Pitoto (b). The point cloud has 4.4M points. The close-up clearly shows the surface morphology of the pecked regions (e.g. in the head and torso of the Pitoto).

From the input point clouds, we computed a number of derived representations suitable for different purposes. The derived representations include:

• exports of the point clouds in the XYZRGB format¹,
• orthophotos of the point clouds with oblique lighting,
• depth maps registered with the orthophotos,
• parameters of the orthographic transform.

The XYZRGB format has been selected due to its simple and easy-to-process layout. Each point is stored together with its color values in one row of a plain text file.
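To make the format concrete, the following minimal Python sketch shows how such a file could be read. The function name is illustrative, and whitespace-separated values are an assumption, as the exact separator of the export is not specified here:

```python
import numpy as np

def load_xyzrgb(path):
    """Read a plain-text XYZRGB point cloud: one "x y z r g b" row per point."""
    data = np.loadtxt(path)                 # shape: (n_points, 6)
    xyz = data[:, :3]                       # metric spatial coordinates
    rgb = data[:, 3:6].astype(np.uint8)     # color values, assumed to be in [0, 255]
    return xyz, rgb
```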
The orthographic projection (together with the corresponding depth map) of the surface gives us an easily processable representation of the 3D surface. Since the surface scans do not contain self-occlusions or overhanging structures, there are no occluded areas in the orthographic projection. Thus, no information available in the 3D scan gets occluded in the projection. Consequently, the orthographic projection is an adequate representation of the 3D scan. For the orthographic projection, we take an input cloud and fit a median plane through the cloud. Once the median plane is found, the projection direction is set to the normal direction of the median plane. The result of the orthographic projection is a projection of the point cloud to a 2D image (orthophoto) that is parallel to the median plane. The process for depth maps is in general the same as for the generation of orthophotos. The only difference in computation is that instead of mapping the color of the point at the projected pixel position in the orthophoto, the distance of the point from the median plane is mapped to the pixel position. The resulting depth maps are stored as 32-bit TIFF images to preserve the full depth precision provided by the point cloud.

The point clouds have a resolution of 0.1 mm. They are stored with metric coordinates. The orthographic projection is performed using the software Aspect3D from ArcTron². The image resolution is set to 300 dpi (equivalent to about 0.085 mm per pixel), which has shown to be sufficient for the generation of accurate orthophotos and depth maps.

¹ The XYZRGB format stores a sextuple for each 3D point containing the spatial coordinates (x, y, z) and RGB color values (r, g, b).
² http://aspect.arctron.de/

Figure 2.12: An input point cloud (a) with its orthophoto (b) and depth map (c). Brighter values in the depth map indicate higher depth values.
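The projection itself was computed with Aspect3D. Purely as an illustration of the idea, the following sketch approximates the procedure in Python: it substitutes a least-squares plane (obtained via SVD) for the median plane and rasterises colors and plane distances onto a grid of roughly 0.085 mm pixels. All names and the plane-fitting choice are illustrative, not the actual Aspect3D implementation:

```python
import numpy as np

def orthographic_maps(xyz, rgb, pixel_size=0.085):
    """Project a point cloud onto a best-fit plane; return orthophoto and depth map."""
    centred = xyz - xyz.mean(axis=0)
    # Principal axes of the centred cloud: the right singular vector belonging to
    # the smallest singular value approximates the plane normal (projection direction).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    u_axis, v_axis, normal = vt
    px, py = centred @ u_axis, centred @ v_axis   # in-plane coordinates (mm)
    depth = centred @ normal                      # signed distance from the plane (mm)

    # Rasterise onto a regular grid: one pixel covers pixel_size mm.
    ix = ((px - px.min()) / pixel_size).astype(int)
    iy = ((py - py.min()) / pixel_size).astype(int)
    ortho = np.zeros((iy.max() + 1, ix.max() + 1, 3), dtype=np.uint8)
    dmap = np.full(ortho.shape[:2], np.nan, dtype=np.float32)
    ortho[iy, ix] = rgb     # orthophoto: map point colors to pixel positions
    dmap[iy, ix] = depth    # depth map: map plane distance instead of color
    return ortho, dmap
```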
2.6 Ground Truth

In order to be able to quantitatively evaluate our methods, we need to enrich our data with expert knowledge about it. This gold standard is called the ground truth. We compare the output of our algorithms with the ground truth to evaluate the algorithms' performance. For the methods that aim at segmentation into natural rock and pecked surface areas, we need annotations of the surface containing the information whether a specific pixel or 3D point is considered foreground or background (see Subsection 2.6.1). For the ground truth on the types of the petroglyphs, we need a much larger number of examples than the few figures depicted in detail in the 26 3D scans. We use classic tracings of Valcamonica petroglyphs for this purpose and have a database containing more than 1000 expert-annotated figures in two different typologies (see Subsection 2.6.2). To establish this database, we developed a user-friendly web-based tool (see Subsection 2.6.3).

2.6.1 Segmentation Annotation

For the evaluation of the segmentation algorithms, we asked domain experts to create manual segmentations that provide a ground truth for the evaluation of the performance of the developed algorithms. All pixels in an orthophoto being part of a pecked region were annotated. The result is a binary mask which is true for pecked regions and false for natural rock surface. The annotation was performed by archaeology experts with image processing tools like Gimp and Adobe Photoshop. For the 2D images, the annotation was done directly on the photos in Photoshop (see Figure 2.13).

Figure 2.13: Segmentation ground truth annotation on the 2D image in Photoshop.

Manual annotation of 3D surfaces requires special tools. As our surfaces do not contain self-occlusions, we used orthophotos of the 3D surfaces for the annotation. The process was the same as for the 2D images. For ambiguous areas of the orthophotos, we additionally provided a 3D viewer for the annotating experts. All 26 orthophotos have been precisely annotated. Examples of 3D ground truth annotations are shown in Figure 2.14.

Figure 2.14: Orthophotos (a and c) and their corresponding ground truth images (b and d). White areas mark pecked regions in the orthophoto and black areas unpecked regions. The red area surrounding the scan is the area where no information is available.
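The binary masks make a pixel-wise comparison of algorithm output and ground truth straightforward. As a sketch of how such a comparison can be computed (the concrete evaluation measures actually used are defined in the respective chapters, and the function name is illustrative):

```python
import numpy as np

def mask_scores(predicted, ground_truth):
    """Pixel-wise precision, recall and F1 for binary pecked-region masks."""
    pred = predicted.astype(bool)
    gt = ground_truth.astype(bool)
    tp = np.sum(pred & gt)                        # correctly detected pecked pixels
    precision = tp / max(np.sum(pred), 1)
    recall = tp / max(np.sum(gt), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1
```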
2.6.2 Shape Annotation

The purpose of the shape annotation is the quantitative evaluation of our shape recognition methods. We aim at the automatic classification of petroglyph shapes into predefined classes (see Figure 2.15). Typologies for petroglyphs are strongly related to the site. Hence, each site or local group of sites has its own typology. For our ground truth, we used two typologies. One typology was developed by a small group of 3D-Pitoti project members (amongst them the author). The second typology is based on the typology used in the most recent large-scale site publication about the Campanine site in Valcamonica by Umberto Sansoni [SG09]. The Campanine site contains - in contrast to the rest of Valcamonica - many medieval figures. Hence, it was necessary to extend the typology in order to be appropriate for all rock art sites in our dataset. This extension was done by Craig Alexander. We use the annotated shapes in three chapters. In Chapter 7 we use a small set of 100 Pitoti to evaluate a shape descriptor based on the skeletal graph. In Chapter 8 we refine the skeletonisation on a set of more than 1000 Pitoti. In Chapter 9 we evaluate our combination of shape descriptors on a set containing more than 1000 Pitoti. These 1000 Pitoti are classified into the two typologies we developed for the project. In the following paragraph we describe the properties of the full dataset used for shape classification.

Figure 2.15: Petroglyph shapes from our database. The underlying tracings are © CCSP and Alberto Marretta, used with permission.

The dataset consists of 1388 distinct petroglyphs (see Figure 2.15 for a part of the material). For the experimental investigation of the classification capabilities in Subsection 9.2.1, we need to form classes that serve as ground truth for the evaluation of our classifiers. We aim at using the whole dataset in our experimental evaluation. Hence, the straightforward approach is to use each endpoint of a typology-tree branch as a class. However, to provide a minimum amount of training data per class, we only use branch endpoints for which at least 10 examples exist in our dataset. The two typologies provide different views on the dataset. Thus, they have different branch endpoints with different numbers of examples. Hence, the total numbers of petroglyphs investigated for the two typologies are slightly different. See Tables 2.1 and 2.2 for the classes used.

2.6.3 Shape Annotation Tool

We developed an interactive tool that allows the precise annotation of digitised classic petroglyph tracings. The tool supports the user in the manual classification of figures and provides all necessary annotation fields specified by the two typologies introduced in Subsection 2.6.2. The figures are annotated according to both typologies in parallel. The exact shape of a figure and the textual annotation data are stored in a database on a central server to facilitate further processing. The annotation tool is realised as a web application that is accessible via the internet with any modern web browser. Thus it can be used from any computer, independent of the operating system.

2.6.3.1 Main Features

The tool makes use of several features. We developed some of the features ourselves and adopted others from open source projects to meet our needs.

CanvasZoom: For a good user experience with acceptable performance when navigating within the high-resolution tracings, we use a tile-zooming algorithm with different zoom levels, comparable to web maps like Google Maps.

Different selection tools: To meet the different requirements for selecting the various types of traced figures, we implemented a polygon selection tool, a rectangle selection tool and a magic wand selection tool.

Special filiform selector: For selecting filiforms (scratched line-like figures, see the arrow and the bow in Figure 2.14(d) for an example), we provide an alternative selection mode that uses a different underlying colour model (LAB instead of RGB); a sketch of the underlying idea follows this feature list.

Classification: Drop-down menus allow the classification of a selected figure into two different typologies. If the membership in a particular class is not certain, a confidence value between zero and one can be added to the classification.

Multi-user functionality: To improve the level of confidence in the annotation of figures, multiple users can individually annotate each figure. An administrative user can see and compare all annotations.

Statistical reports: To allow a progress overview, various statistical reports can be generated from the database.
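The source code of the tool is not reproduced here. As a rough sketch of the idea behind the LAB-based magic wand, the following Python snippet (assuming scikit-image; the function name and tolerance value are illustrative) selects the connected region of pixels whose colour lies close to the clicked seed colour in LAB space:

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.measure import label

def magic_wand_lab(image_rgb, seed_row, seed_col, tolerance=12.0):
    """Select the connected region whose LAB colour is close to the seed pixel."""
    lab = rgb2lab(image_rgb)                       # CIELAB is perceptually more uniform than RGB
    dist = np.linalg.norm(lab - lab[seed_row, seed_col], axis=-1)
    similar = dist <= tolerance                    # colour-similar pixels anywhere in the image
    regions = label(similar, connectivity=2)       # 8-connected components
    return regions == regions[seed_row, seed_col]  # keep only the component under the click
```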
Table 2.1: Classes based on branch endpoints of the modified Sansoni typology.

ID  Type  N
Prehistoric / Protohistoric - Natural world - Anthropomorph: armed
1   Armed figure with shield and spear (no helmet)  61
2   Armed figure with shield and sword (no helmet)  105
3   Armed figure with sword (no shield or helmet)  10
4   Armed figure with shield only  10
5   Armed horseman  39
Prehistoric / Protohistoric - Natural world - Anthropomorph: unarmed
6   Unarmed horseman  9
7   Oranti (praying figure)  77
8   Other anthropomorph not elsewhere classified  21
Prehistoric / Protohistoric - Natural world - Animal
9   Bird  57
10  Deer  60
11  Dog  60
12  Horse  56
13  Other animal (natural)  77
Prehistoric / Protohistoric - Material culture - Structure
14  Hut  143
Prehistoric / Protohistoric - Material culture - Other material culture
15  Palette  35
Prehistoric / Protohistoric - Material culture - Weapon (not wielded)
16  Axe  43
Prehistoric / Protohistoric - Symbolism - Writing
17  North Etruscan inscription  11
Prehistoric / Protohistoric - Symbolism - Topographic form
18  Rectilinear filled form with line parallel to one long side  15
19  Rectilinear filled form without line parallel to one long side  14
Prehistoric / Protohistoric - Symbolism - Specific symbol
20  Rosa camuna  10
21  Prehistoric / Protohistoric - Other unclassified  9
Medieval and post-medieval - Material culture - Other material culture
22  Key  39
Medieval and post-medieval - Symbolism - Cross
23  Cross potent (cross with blocks at the terminals)  30
24  Greek cross (equal width/height)  19
25  Latin cross (taller than wide)  36
Sansoni not securely datable - Geometrical form - Planar symbol
26  Star  20
Total  1066

Table 2.2: Classes based on branch endpoints of the 3D-Pitoti project typology.

ID  Type  N
Inscription
1   Local variation of the Etruscan alphabet  11
Abstract
2   Camunian rose  10
3   Map  56
4   Star  20
5   Circle  14
Object
6   Cross  106
7   Key  39
8   Palette  35
9   Shoe/footprint  98
10  House  143
Object - Weapon
11  Axe  43
Animal
12  Dog  60
13  Deer  60
14  Horse  55
15  Bird  57
16  Other  85
17  Anthropomorph  335
Total  1227

Please refer to Figure 9.6 for the morphological subdivisions of the Anthropomorph class.

2.6.3.2 Handling and Annotation Workflow

After logging in, the user first has to select a site and then a tracing of a rock panel within which he wants to classify figures. Subsequently, the user is presented with an overview of the whole tracing (Fig. 2.16). Already annotated figures on a tracing are indicated with a grey bounding box. With a click on that bounding box, the exact shape of that figure and its classification data are loaded from the database (Fig. 2.17). The annotation process of a new figure is divided into several steps.

Zoom: The user can use the mouse wheel or the zoom controls (buttons/slider) in the user interface to zoom to the maximum zoom level. Only at the maximum zoom level can a detailed shape selection be made.

Selection of a Figure: To navigate within the image, the user can use the hand tool to click and drag the tracing. To select the shape of a figure, the user has a choice amongst three tools. With the rectangle selection tool, the user first defines a rectangular area of the tracing and subsequently indicates a colour by clicking on a tracing part within the previously selected rectangular area. All pixels within the rectangle which have the same colour as the clicked tracing part are selected and highlighted in yellow. The polygon selection tool has the same functionality except that the area selected in the first step can be a polygon (Fig. 2.18).
Figure 2.16: Overview of the ground truth annotation web tool interface.
Figure 2.17: View of a figure in its rock panel tracing.
Figure 2.18: Use of the polygon selection tool: (a) start selection, (b) choose colour, (c) selected figure.
Figure 2.19: Use of the magic wand selection tool: (a) choose colour, (b) selected figure.
Figure 2.20: Comparison of selection without (a) and with (b) filiform selection mode.

The magic wand selection tool allows the user to select connected pixels with the chosen colour. Thus, large figures with low fragmentation can be selected easily (Fig. 2.19). A special filiform selection mode enables the user to select line-like figures. This mode utilizes another colour model for all selection tools (LAB instead of RGB), which facilitates the correct selection of all pixels of the line-like figure (Fig. 2.20). Existing selections can be extended or reduced through the addition or subtraction of selections. If the user makes a mistake, the tool allows up to three steps of undo and redo.

Classification of a Figure: As soon as the user has finished the selection of the shape of a figure, he can start the classification process. At the beginning, he is presented with one drop-down menu for the main classes of each of the two typologies. After selecting a figure class in the drop-down menu, another drop-down menu with the classes from the next level of the typology is revealed (Fig. 2.21). An item from each level of drop-down boxes must be chosen in order to complete the classification process. If the user is unsure about the classification of a figure, he can reduce the confidence value with a slider and add a second classification set for the figure within the same typology. The sum of the individual confidence values has to be 1 in order to complete the classification process. Special cases (superimposition, etc.) that might occur when selecting a figure (Fig. 2.22) are marked with flags which can be set in the classification process (Fig. 2.21).

Figure 2.21: Classification process.
Figure 2.22: Special cases when selecting a figure: (a) figures with superimposition, (b) figure not completed by the artist, (c) figure damaged, (d) tracing incomplete.

2.6.3.3 Statistical Reporting

The user can access a list of all figure classes that contains the number of figures of each class that he has already annotated (Fig. 2.23). Besides that, he can see how many figures of each class are annotated in total, including the figures annotated by all other users. The user can then concentrate on the annotation of certain figures to populate the ground truth dataset according to pre-established targets. With a click on the number of figures in each class, the user gets an overview of thumbnails of all images in that class (Fig. 2.24). With a further click on one of the thumbnails, the user is presented with a detail view of that figure including its location (site, rock and section), the full image and the classification data (Fig. 2.25). He can edit the classification data or delete the figure from the database. A click on the image or on the "Show figure in tracing" link opens the view of the figure in the context of the rock panel tracing to which it belongs (Fig. 2.17). If necessary, the selection of the shape of a figure can be modified using the selection tools.

2.7 Conclusion

In this chapter, we have introduced our material and its origin. We described rock art in general, Valcamonica (the site our material stems from) and several archaeological aspects in Sections 2.1 to 2.3.
We observe that the methods we will develop in this thesis may support archaeology in two respects, namely documentation and analysis. The automated segmentation of rock surfaces into pecked and natural regions as well as the subsequent classification of figures have the potential to substitute classic documentation methods. The high-resolution 3D scans allow analyses of pecking style that are not possible with the naked eye.

To evaluate our methods, we need data for each task and respective ground truth sets that are annotated by experts. In Sections 2.4 and 2.5 we described the 2D images and the 3D scans we will use for segmentation and pecking style investigation. Section 2.6 deals with our ground truth data. In Subsection 2.6.1 we described the ground truth annotation for 2D and 3D segmentation, which resulted in binary masks encoding pecked rock surface and natural rock surface. In Subsection 2.6.2 we presented the ground truth annotation for the shape of petroglyph figures. Due to the limited number of 3D scans and the consequently limited number of petroglyph shapes, we need a proxy dataset; we use 2D images of classical petroglyph tracings. In Subsection 2.6.3 we presented a web tool that allows the large-scale annotation of petroglyph shapes in classic tracings. More than a thousand petroglyph shapes have been annotated by experts.

We observe that significant efforts were made to establish the datasets required for this thesis. A large share has been achieved within the 3D-Pitoti project (see Chapter 1).

Figure 2.23: Statistics of the numbers of figures in each class. Please note that the screenshot was taken at a later date than our experiments; as annotation is an ongoing process, the numbers are higher.
Figure 2.24: All figures for the class key.
Figure 2.25: Details for one key figure.

Petroglyph-related Work

Figure 3.1: A petroglyph depicting a deer in Valcamonica, Italy.

This chapter contains work related directly to rock art and petroglyphs which is relevant for our tasks. Please note that for better readability, the technical work related to the different analysis steps is located in the respective chapters of this thesis. First, we look into 3D scanning of rock art, as an important part of our material stems from 3D scans. Second, we present related work on the segmentation of petroglyphs. Third, we present approaches that investigate pecking styles. Finally, we present work related to the shape classification of petroglyphs.

3.1 3D Scanning of Petroglyphs

3D scanning has been used in archaeological documentation for more than a decade. A well-known early example is Stanford's approach to scanning several Michelangelo statues in Florence, Italy by Levoy et al. [LPC+00]. In the “Digital Michelangelo” project, they built a system comprising hardware (laser range finders, digital cameras) and software to acquire high-precision 3D data outside the laboratory. The largest model they acquired is Michelangelo's David with (at the time of scanning, an expected) 2 billion polygons and 7000 colour images. It took years to reconstruct 3D models with high accuracy.
The project's website states in a news item from 2009, nine years after the acquisition of the scans: “We now have a full-resolution (1/4 mm) 3D model of Michelangelo's 5-meter statue of David. The model contains about 1 billion polygons.” (see http://graphics.stanford.edu/projects/mich/).

Scopigno et al. summarize the possible uses of 3D scans of cultural heritage artifacts “beyond plain visualisation” [SCC+11]. They state that 3D is used in cultural heritage for two major tasks: a) studying artwork and b) serving as a support medium for indexed archival knowledge. The examples include the detailed study of artworks, documentation for on-site archaeology, restoration of fragmented artwork and virtual restoration of painted decorations.

An early example in the rock art domain is presented by Landon and Seales, who propose a system for the 3D acquisition and presentation of Puerto Rican petroglyphs [LS06]. They consider a full pipeline from acquisition through scholarly use of the scans to presentation for the general public. They use a laser-based structured light system (SLS) to acquire the rock surfaces. They describe software which allows interactive surface manipulation (e.g. false lighting), surface mark-up and large-scale display. Díaz-Andreu et al. use laser scanning and photogrammetry to scan a rock surface in Cumbria on which a spiral had been documented two decades earlier [DABRR06]. Although the spiral could not be found again, the authors state that the quality of the 3D documentation is far beyond traditional documentation regarding precision and accuracy. Lerma et al. describe the use of terrestrial laser scanning (TLS) for the documentation of the complete cave of Parpalló (Spain) with peckings dating back to the Upper Paleolithic era [LNCV10]. In addition to the TLS, they use close-range photogrammetry for the pecked parts of the surface. González-Aguilera et al. follow a comparable approach in Pindal cave in Spain [GAMNRGM11]. Sanz et al. use a standard digital camera and photogrammetry to record petroglyphs at several sites in Galicia (Spain) [SDR+10].

A recent approach that considers multiple scales to acquire 3D data of petroglyphs together with the surrounding rock panel has been proposed by Alexander et al. [APR15] in the 3D-Pitoti project. The authors describe the necessity of different scales of acquisition for a holistic approach to acquiring 3D data of archaeological sites. They capture the sites together with the surrounding landscape at three scales - macro (meters), mid-range (centimeters) and micro (sub-millimeter, see Figure 3.4) - and describe methods for the mid-range and the micro scale. They use unmanned aerial vehicles (UAVs) for the mid-range scale and a specially built “rock art scanner” based on photogrammetry for the micro scale (see Figure 3.2). In this thesis, we use the ground truth data of the micro-scale method (see Appendix A).

Figure 3.2: The 3D-Pitoti project scanner. © Axel Pinz, used with permission.
Figure 3.3: An early example of high-resolution 3D acquisition of a rock surface: laser scanning of Michelangelo's David. © Digital Michelangelo Project, Stanford University, used with permission.

3.2 Segmentation of Petroglyph Images

The next step after acquisition is the segmentation of the rock surface into pecked and natural surface areas using digital images or 3D scans of the surface. The automatic segmentation of rock surfaces containing petroglyphs has rarely been addressed so far.
Zhu et al. consider automated segmentation based on digital images infeasible; hence, they propose a collaborative manual segmentation approach that utilizes CAPTCHAs for rock art image segmentation [ZWKL10]. Deufemia and Paolino apply segmentation to digitized manual tracings [DP14]. This is a rather straightforward task, since manual tracings are binary images that already represent a manual pre-segmentation of the figures.

Figure 3.4: Scales of acquisition for the holistic approach of capturing a complete valley in the 3D-Pitoti project.

3.3 Pecking Style Similarity

Zotkina et al. conduct an experimental study of percussion and tool types in the Minusinsk basin in Siberia [ZTKP14]. They replicate different peck marks with a diverse set of tools. Moitinho et al. approach a numeric analysis of pecking styles using terminology and software from geomorphology [dARBP13]. They aim to estimate the tools and gestures used for creating rock art. The approach includes experimental archaeology to replicate ancient peck marks. They scan the surfaces with 0.28 mm resolution. From the scans, they use patches of 50x50 mm² containing homogeneous peck marks to extract surface metrology parameters with proprietary software. They cluster the numeric descriptions of the peck marks and observe three clusters which contain surfaces with the following properties: 1) fine, 2) coarse and regular, 3) coarse and irregular. They do not observe clusters linked to the manufacturing process of the samples.

3.4 Shape Classification of Petroglyphs

Only a small number of papers have been published in the field of automated petroglyph classification. Other areas of archaeology in which automatic image classification has been investigated range from finding archaeological sites in aerial photographs [TLS09, Kva13] to identifying individual characters in monumental Maya inscriptions [RRPOGP11]. The latter application is perhaps closest to what our approach and the related work discussed below aim to achieve: Maya glyphs exhibit considerable variation in the way a given character is carved, much like the variation one finds in, say, the petroglyph depictions of a deer in Valcamonica.

Numerous surveys about shape analysis have been published; comprehensive and important in this field are the surveys by Pavlidis [Pav78], Loncaric [Lon98], Zhang and Lu [ZL04] and Yang et al. [YKR08]. Classifications and taxonomies of shape descriptors have been proposed in these surveys in different variants. A widely used common denominator is the distinction between contour-based and region-based descriptors. We observe that the petroglyph shapes in our dataset are perceptually close to skeletons. Hence, we add a third category of skeleton-based approaches and summarize region-based, contour-based and skeleton-based petroglyph-related approaches.

Region-based descriptors: Zhu et al. propose the use of a slight modification of the Generalised Hough Transform (GHT) for the mining of large petroglyph datasets [ZWKL11]. The main arguments for GHT and against other shape similarity measures are the existence of petroglyph images where a single petroglyph consists of several parts and the possibility of merged parts of petroglyphs that drastically change the topology of the petroglyphs. They extensively evaluate their approach and achieve good results.
However, they mostly evaluate their approach on proxy datasets (Farsi digits, sketches) and synthetic petroglyph shapes (see http://alumni.cs.ucr.edu/~qzhu/petro.html for the evaluation data). Deufemia et al. use the Radon transform of petroglyph images as the shape descriptor for unsupervised recognition via self-organizing maps (SOM) [DPdL12]. In a second step, they use a fuzzy visual language parser to resolve ambiguous interpretations by incorporating archaeological knowledge. They evaluate the approach on a large dataset and achieve good results. Deufemia and Paolino use an enhanced version of Generic Fourier Descriptors (EGFD) to detect and classify petroglyphs in full scenes [DP14]. In their experiments on 1215 petroglyphs in 53 scenes, EGFD outperforms the descriptors GHT and Shape Context. They do not combine the descriptors.

Contour-based descriptors: Deufemia et al. propose a two-stage classification of petroglyphs [DP13]. They use shape context descriptors [BMP02] to provide an initial raw clustering with self-organizing maps. In the second step, they use an image deformation model to classify the petroglyph shapes. They evaluate their approach on a relatively small dataset (17 classes with 3 examples each) that they enlarge by using 30 affine transformations of each image.

Skeletal descriptors: To our knowledge, we are the first to utilize skeletal descriptors for petroglyph classification.

3.5 Discussion

We discuss related work for segmentation, pecking style similarity and shape classification, as we follow these topics throughout the thesis.

Segmentation: Prior to our work, automatic segmentation has been considered infeasible. The existing approaches are either manual [ZWKL10] or work on material that is already a binary image [DP14]. To our knowledge, we are the first to approach automatic segmentation based on digital photographs as well as on 3D point clouds.

Pecking Style Similarity: In contrast to Moitinho et al., we approach peck mark similarity estimation with 3D surface descriptors which have been developed for registration and object detection. To our knowledge, there is no work on automated peck mark similarity estimation comparable to ours.

Shape Classification: All existing approaches dealing with petroglyphs have drawbacks. Zhu et al. evaluate at large scale on publicly available data, but the data are different from the material we are using [ZWKL11]: they use, inter alia, sketch data and synthetic petroglyph shapes. In contrast, our data is the product of the standard archaeological documentation process, i.e. the manual tracing of individual peck marks on transparent sheets (see Subsection 2.6.1). Deufemia et al. evaluate on small datasets [DPdL12][DP13]. In another approach, they use a large dataset and compare three different descriptors in their evaluation, but they do not evaluate the combination of descriptors [DP14]; moreover, they only compare region-based descriptors. With our approach, we aim to overcome these shortcomings.

Segmentation of Petroglyph Images

Figure 4.1: A human figure and its automatic segmentation result.

Different time-consuming manual methods to determine and document the exact shapes and spatial locations of petroglyphs on a rock panel have been employed over the past decades [AC94]. In this chapter, we address our goal of supporting the documentation work with (semi-)automatic tools by an approach that uses high-resolution photographs of rock surfaces.
In Chapter 5 we will investigate segmentation based on 3D point clouds. In both chapters, we approach the segmentation of petroglyphs as pixel- or point-wise classification.

Segmentation of petroglyphs based on photographs of the rock surface (see the example in Figure 4.1) is a challenging task. Figure 4.2 shows a detail of our rock art material (for more details on our material, please refer to Section 2.4). The depiction is highly dependent on the direction of the light. Glacial polish causes noise in the non-pecked regions. The exact boundaries of the pecked regions are hard to determine, as the pecks are of varying depth and quality. Abrasion and alteration change the appearance of the rock and the artwork.

Figure 4.2: Detail of rock art material.

We approach segmentation as pixel classification. We propose novel feature variants and evaluate visual features, feature variants and feature combinations. We show that with an appropriate combination of different visual features, good segmentation results are possible. Furthermore, we demonstrate that by varying a simple threshold for classifier fusion, the user can interactively change the position of the result on the recall-precision curve.

In Section 4.1 we present related work. Section 4.2 describes our approach. In Section 4.3 we describe the evaluation of our approach, and in Section 4.4 we present the results.

4.1 Related Work

We summarize related work in fields with pixel-wise classification approaches comparable to our task and in texture classification. Yin et al. [YLH+09] use color and edge features in a k-NN classifier for rock structure classification in Formation MicroImager (FMI) images (the Formation MicroImager is a proprietary micro-resistivity imaging device used for borehole imaging; see http://www.slb.com/~/media/Files/evaluation/brochures/wireline_open_hole/geology/fmi-hd_br.pdf). Partio et al. [PCGV02] use grey level co-occurrence matrices (GLCM, see [HSD73]) and Gabor filters to model textures of rock images. They perform classification with a k-NN classifier. The results of the evaluation on a test database containing 168 rock images in 7 classes are good for rock images that have almost homogeneous textures; GLCM performs better than Gabor filters. The authors suggest that prior texture segmentation should be performed for rock images with multiple textures. Khoo et al. [KOW08] model textures as GLCMs and use a support vector machine (SVM) to classify textures. They evaluate their segmentation approach on a few synthetic texture mosaics and two satellite images with good results. Kim et al. [KJPK02] use a support vector machine (SVM) for texture classification. They use the raw pixel intensities as input for the SVM, i.e. no prior feature extraction is performed. The evaluation of their approach on synthetic texture mosaics delivers good results. Varma and Zisserman [VZ05][VZ03] use textons (see [LM99]) as texture models. They evaluate their approach on the Columbia-Utrecht reflectance and texture database (CURet [DvGNK99]) and achieve very good classification results with and without the usage of filter banks.

4.2 Approach

4.2.1 Pixel Classification

We define any pixel inside a petroglyph as a foreground pixel and any other pixel as a background pixel. First, for each pixel to classify, we obtain a local window of the input image with this pixel at its center. Second, we extract visual features from each of these local windows. Third, we train an SVM. Fourth, we optionally fuse several classifiers (see Section 4.2.2). Fifth, we classify new data with the models obtained in step three or step four.
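The following Python sketch outlines steps one to three under simplifying assumptions: a 32-bin grey-value histogram stands in for the colour, edge and texture features evaluated in this chapter, and scikit-learn's SVC is used as the SVM.

import numpy as np
from sklearn.svm import SVC

def window_features(gray, r=64, step=16):
    # Slide a (2r x 2r) window over the image (cf. the 128x128 px windows
    # and 16 px step size in Section 4.3) and describe each window with a
    # normalised intensity histogram as a placeholder feature.
    feats, centres = [], []
    for y in range(r, gray.shape[0] - r, step):
        for x in range(r, gray.shape[1] - r, step):
            win = gray[y - r:y + r, x - r:x + r]
            hist, _ = np.histogram(win, bins=32, range=(0, 256))
            feats.append(hist / hist.sum())
            centres.append((y, x))
    return np.array(feats), centres

# gray: 2D uint8 image; mask: boolean ground truth of pecked regions.
# X, centres = window_features(gray)
# y = np.array([mask[c] for c in centres])   # label of the centre pixel
# model = SVC(kernel='rbf').fit(X, y)        # step three: train the SVM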
Features for object classification include colour, edge and texture based features. Datta et al. [DJLW08] state that the major types of features in image retrieval are color, texture, shape and salient points. Shape features are not suitable for our task, as shape is an attribute of an image segment, i.e. shape features are extracted after segmentation. Since our material contains too many salient points (i.e. interest points or corner points) due to the structured surface (see Subsection 4.3.1), we rule out salient points as well. The three feature categories we consider for our task are therefore color, edge and texture.

4.2.2 Fusion Strategies

Figure 4.3: Schema of the different fusion strategies applied.

The different features incorporated in our pixel classification approach may contain complementary information. Therefore, we apply different combinations of these features with different fusion strategies. The strategies we employ are early fusion and late fusion, or classifier fusion (see Figure 4.3).

Early Fusion: We combine the feature vectors of different features by concatenation. In the next step, we use the combined feature vectors to train the classifier.

Classifier Fusion: We utilize the output labels from single feature classifiers and early fusion classifiers and fuse the output labels. Numerous methods to fuse classifier output labels have been proposed and evaluated. A frequently used strategy is majority voting. Consensus can be achieved by unanimity (all agree), simple majority (50% + 1 agree) or plurality (most votes) [Kun04]. In our two-class case, majority is equivalent to plurality.

To fuse our classifier outputs, we propose an extended version of majority voting. We utilize the number of votes for a foreground pixel in two ways. First, we assume that the number of votes is equivalent to the confidence in the fusion result for this pixel. In the output image of the classification results, we color foreground pixels with high, medium and low confidence differently. Second, we assume that the best segmentation output of our classifier in terms of the measures used is not necessarily the output that suits our needs best.

The sizes of our two classes (foreground and background pixels) differ by one order of magnitude. Hence, we measure the quality of our results with precision, recall and F1, where tp is the number of true positives, fp the number of false positives and fn the number of false negatives:

precision = tp / (tp + fp)    (4.1)

recall = tp / (tp + fn)    (4.2)

F1 = 2 · (precision · recall) / (precision + recall)    (4.3)

A use case for segmentation results is the highlighting of pecked regions on an image of a rock panel. An optimization for precision could make the results more useful for such applications. Thus, we use a modified majority voting: the user sets a threshold between 1 and the number of classifiers to fuse. The value of this threshold corresponds to the number of votes needed to classify a pixel as a foreground pixel. If the threshold is 1, a pixel is classified as a foreground pixel if any of the fused classifiers classified it as a foreground pixel. The number of votes needed for the majority increases with the threshold. If the threshold is at its maximum, a pixel is classified as a foreground pixel only if all fused classifiers labelled it as a foreground pixel.
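A minimal numpy sketch of this thresholded majority voting and of the measures (4.1)-(4.3); it assumes boolean label maps and non-degenerate predictions (no division-by-zero handling).

import numpy as np

def fuse_by_votes(label_maps, threshold):
    # label_maps: stack of boolean per-pixel foreground decisions,
    # one map per fused classifier.
    votes = np.sum(label_maps, axis=0)
    # threshold=1 behaves like a logical OR over the classifiers,
    # threshold=len(label_maps) like a logical AND.
    return votes >= threshold

def precision_recall_f1(pred, truth):
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)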
4.3 Experiments

We extract the visual features from local windows with a size of 128x128 px. We chose the window size based on preliminary experiments with window side lengths of 32, 64, 128 and 256 px; 128 px was the best trade-off between discriminative performance, computational demand and blurring. We extract color, edge and texture features (see Subsection 4.3.2) from the windows with a horizontal and vertical step size of 16 px. We use a support vector machine (SVM) for classification. We train the classifier with single features and employ several fusion strategies (see Subsection 4.2.2). For validation, we utilize repeated random sub-sampling with 10 sub-samples [Rao05].

4.3.1 Material

We use two carefully selected 21-megapixel images of rock surfaces containing petroglyphs. They are differently lit, and the typical rock structure appears visually different in the two images. Furthermore, they contain petroglyphs with different pecking styles. The test image composed of these two images has a size of more than 40 million pixels. We use the images in full resolution.

The material poses several challenges. Grass is growing in cracks in the rocks. The surfaces of the rocks have scratches and other traces of abrasion and alteration (see Figure 4.4). The petroglyphs are pecked in different styles, i.e. with different tools by different authors. For details about the employed material, please refer to Section 2.4.

Figure 4.4: Small details of our test images (see Figures 2.9(a) and 2.10). Problems of the material include grass (top left), shadows (top right), scratches (bottom left) and cracks (bottom right).

4.3.2 Visual Features

Table 4.1 shows an overview of the extracted features.

Table 4.1: Features extracted from each local window.

Feature abbreviation   Feature description
RGBHistM               RGB histogram with M bins
LumHistN               Luminance histogram with N bins
DSIFT_gauss            Dense SIFT with prior Gaussian blurring
DSIFT_nogauss          Dense SIFT without prior Gaussian blurring
GLCMO                  Gray level co-occurrence matrix of order 2 calculated from O grey levels
GLCMStatO              21 statistical measures calculated from GLCMO
GLCMPOrderQ            Gray level co-occurrence matrix of order Q calculated from P grey levels
GLCMStatPOrderQ        21 statistical measures calculated from GLCMPOrderQ
LBP_Hist_u2_nU_rV      Local binary pattern histogram with U sampling points on a radius of V

We describe some of the extracted features in more detail.

Dense SIFT: Lowe [Low04] proposed the widely used SIFT features. We do not need the prior salient point detector, and we do not need rotation or scale invariance; thus, we extract dense SIFT features with and without prior Gaussian blurring [VF08].

Gray Level Co-Occurrence Matrix: Haralick et al. proposed this classic feature to describe texture in 1973 and evaluated it for crop classification [HSD73]. The image is scaled to N grey levels. The grey levels of neighboring pixel pairs are compared, and each occurrence of a distinct combination is counted. This results in a matrix of size N x N containing these counts. Haralick proposed several statistical measures for this matrix that are used as the feature vector.
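For illustration, an order-2 GLCM and a few of the derived statistics can be computed with scikit-image (graycoprops covers only a handful of the 21 measures we use; in releases before 0.19 these functions are spelled greycomatrix/greycoprops; the random window is a placeholder for a real 128x128 px window).

import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
window = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
quantised = (window // 32).astype(np.uint8)      # 256 -> N=8 grey levels

# Count co-occurrences of neighbouring grey-level pairs (order 2).
glcm = graycomatrix(quantised, distances=[1], angles=[0, np.pi / 2],
                    levels=8, symmetric=True, normed=True)
contrast = graycoprops(glcm, 'contrast')         # Haralick-style statistics
energy = graycoprops(glcm, 'energy')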
GLCM and GLCMStat of order 2, i.e. based on the comparison of two pixels, are widely used standard visual features, but there are hardly any evaluations of higher-order GLCMs (an exception is the work by Anys and He [AH95]). The memory requirement of higher-order GLCMs grows exponentially: a GLCM of order 3 (comparison of three pixels) results in a matrix of size N x N x N. We include GLCMs of order 3 in our evaluation; N=8 results in 512 feature dimensions, N=16 in 4096 feature dimensions. The matrices are sparse; to reduce dimensionality, we discarded the sparsest feature dimensions in the training data set and subsequently the same dimensions in the complete data set. Inspired by the measures Haralick proposed, we calculate 21 statistical measures from each feature vector.

Local Binary Patterns: Ojala et al. proposed local binary patterns (LBP) as multiresolution grey-scale and rotation invariant texture features in 2002 [OPM02]. LBPs are widely used in several application fields, e.g. crop classification and face recognition.
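A short scikit-image sketch of the LBP histogram feature, here for the variant that performs best in our experiments (8 sampling points on a radius of 3, uniform patterns); the random window again stands in for a real local window, and the exact normalisation in our pipeline may differ.

import numpy as np
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
window = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)

P, R = 8, 3                                   # sampling points, radius
codes = local_binary_pattern(window, P, R, method='uniform')
# 'uniform' yields codes 0..P+1, so the histogram has P+2 bins.
hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
# `hist` (10 dimensions for P=8) corresponds to an LBP_Hist_u2_n8_r3 vector.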
Figure 4.5: F1 results of single feature classifiers.

4.3.3 Training and Classification

Single Features: We train a classifier with each single feature. We evaluate the performance of the models to learn about the isolated distinctiveness of each feature.

Early Fusion: Based on the results of the single feature classifiers, we combine different features by concatenation. We assume that different single feature categories contain complementary information. We use the concatenated feature vectors to train the classifier and evaluate the performance of the early fusion models.

Late Fusion: We combine the output labels of different classifiers with the strategies explained in Subsection 4.2.2. We fuse the classifiers with the best results employing majority voting. We utilize single feature classifiers as well as early fusion classifiers. We assume that even early fusion classifiers with nearly identical features and results contain complementary information. We want to clarify whether late fusion improves the results compared to early fusion.

Late Fusion with Thresholded Majority Voting: We use the best result from the previous paragraph and vary the voting threshold. We visualize the results to answer the following question: does a combination of the different late fusions improve the segmentation?

Interactive Late Fusion: The user can change the threshold for the majority voting interactively (see Subsection 4.2.2). We use the same classifier combinations as in the previous paragraph.

Figure 4.6: F1 results of feature combinations with early fusion. The red line marks the result of the best single feature. The corresponding feature combinations are denoted in Table 4.2.

Table 4.2: Feature combinations used for early and late fusion.

ID  Features
A   LBP_Hist_u2_n16_r3, GLCMStat32, RGBHist32, EdgeHist, DSIFT_gauss
B   LBP_Hist_u2_n16_r3, GLCMStat16Order3, RGBHist32, EdgeHist, DSIFT_gauss
C   LBP_Hist_u2_n16_r3, GLCM16Order3Mean10, RGBHist32, EdgeHist, DSIFT_gauss
D   GLCM16Order3Mean10, RGBHist32, EdgeHist, DSIFT_gauss
E   GLCMStat32, RGBHist32, EdgeHist, DSIFT_gauss
F   GLCMStat16, RGBHist32, EdgeHist, DSIFT_gauss
G   GLCM16Order3Mean10, RGBHist32, DSIFT_gauss
H   GLCM16Order3Mean10, RGBHist32, EdgeHist
I   GLCMStat32, RGBHist32, DSIFT_gauss
J   GLCMStat32, RGBHist32, EdgeHist
K   GLCM16Order3Mean10, RGBHist32
L   GLCM16Order3Mean10, DSIFT_gauss
M   GLCM16Order3Mean10, EdgeHist
N   GLCMStat32, RGBHist32
O   GLCMStat32, DSIFT_gauss
P   GLCMStat32, EdgeHist

Figure 4.7: F1 results of feature combinations with late fusion. The red line marks the result of the best single feature. The corresponding classifier combinations are denoted in Table 4.3.

4.4 Results

4.4.1 Single Features

Figure 4.5 shows selected F1 results of the single feature classifiers. We observe that the LBP variants and the third-order GLCM variants perform best. The RGB histogram with 32 bins also performs well. The luminance histogram and the DSIFT variants perform poorly. The edge histogram performs moderately.

The second-order GLCM variants perform comparably to the edge histogram. The statistical measures perform better than the raw GLCMs, and a higher number of grey values leads to better results. We make the opposite observations for the third-order GLCM variants: the raw GLCMs, reduced in dimensionality by discarding the sparsest dimensions, perform better than the statistical measures, and a higher number of grey values results in worse performance. The statistical measures of the third-order GLCMs can only be calculated from the complete (i.e. prior to dimensionality reduction) GLCMs. Hence, we assume that a high number of grey values results in over-fitting and therefore in lower-quality results.

The LBP variants deliver the best results. We observe that a bigger radius and a lower number of sample points improve the results. The best result was obtained with the LBP histogram calculated from 8 sample points on a radius of 3. We conclude that texture features generally outperform the other feature categories.

4.4.2 Early Fusion

Figure 4.6 contains selected results of early fusion classifiers. The corresponding feature combinations are denoted in Table 4.2. The number of features we combined decreases from A to P.

Bars A, B and C represent the results of combinations of five features. They contain well-performing features of each feature category; the only variation among the combinations A, B and C is the GLCM variant used.
Table 4.3: Fused classifiers. Please refer to Table 4.2 for the features used in the feature combinations.

ID  Fused classifiers
R   Five single feature classifiers trained with the features of combination A
S   The classifiers from R and the early fusion classifier A
T   Five single feature classifiers trained with the features of combination B
U   The classifiers from T and the early fusion classifier B
V   Six single feature classifiers trained with the features of combinations A and B, and the early fusion classifiers A and B
W   Seven single feature classifiers trained with the features of combinations A, B and C, and the early fusion classifiers A, B and D
X   The classifiers from W and the early fusion classifiers E and F
Y   The classifiers from X and the early fusion classifiers G, H, I and J
Z   All single feature and early fusion classifiers

A and B contain second- and third-order statistical measures, while C contains a third-order GLCM. We observe that A and B perform nearly identically, while C performs significantly worse. This is noteworthy, as the single feature performance is the opposite. The GLCMStat features have 21 dimensions; GLCM16Order3 has 4096 dimensions, and the reduced version we use has 255 dimensions. We assume that the performance drop that occurs with the utilization of GLCM16Order3Mean10 is a consequence of over-fitting.

Bars D-F show the results of combinations of four features. The combinations contain features of all feature categories except the LBPs. Bar D contains the feature combination of bar C without the LBP. We observe that D is better than C, despite the fact that it contains less information. This is further evidence for our assumption that C performs poorly due to over-fitting. Generally, the four-feature combinations perform comparably well.

Bars G-J display the results of combinations of three features. The combinations contain a GLCM variant, an RGB histogram and either an edge histogram or a dense SIFT descriptor. We observe that the combinations with GLCM16Order3Mean10 (G and H) are generally better than the combinations that utilize GLCMStat32 (I and J).

Bars K-P show the results of combinations of two features. Bars K-M contain GLCM16Order3Mean10 and a feature of another feature category; bars N-P contain GLCMStat32 and a feature of another feature category. We observe that the combinations with GLCM16Order3Mean10 outperform the combinations with GLCMStat32.

We conclude that early fusion by simple concatenation of feature vectors significantly improves the results in comparison to single feature classifiers (see Figure 4.6). The more features we combine, the better the results. The best results occur for combinations of features from each feature category. Over-fitting as a result of high feature dimensionality limits the number of features that can be combined (see classifier C in Figure 4.6).

4.4.3 Late Fusion

Figure 4.7 contains selected results of late fusion classifiers. Bar R represents the late fusion result of the classifiers trained with the features of early fusion run A. We observe that in this case early fusion performs better than late fusion. Bar S shows the result of the classifiers from R combined with the early fusion classifier A. We observe that the result of S is slightly better than the result of A (see Figure 4.6). This is noteworthy, as it involves the same features as A, i.e. the information primarily extracted from the image is the same. Bars T and U represent the same idea as R and S, but with the features of early fusion run B instead of A.
We observe that in this case, too, the combination of the single feature classifiers with the early fusion classifier trained on the same features performs better than the combination of the single feature classifiers alone. In contrast to S, U performs slightly worse than the corresponding early fusion run B.

Bar V represents the late fusion result of six single feature classifiers trained with all features in A and B, combined with the early fusion classifiers A and B. We observe that the combination of more features improves the results. Bars W, X, Y and Z extend the number of classifiers used for late fusion further. Bar W includes a classifier for each single feature in early fusion runs A-D as well as the three best-performing early fusion classifiers A, B and D (we skip early fusion classifier C due to the assumption that it is over-fitted). We observe that late fusion result W significantly outperforms the best early fusion run. Bar X includes all the classifiers from W and adds the early fusion classifiers E and F. We observe that X performs slightly better than W. We assume that the early fusion classifiers D, E and F contain complementary information, although they perform almost identically. Bar Y includes all classifiers from X and adds the early fusion classifiers G, H, I and J. We observe that Y performs slightly worse than X. Bar Z represents the result of late fusion including all single feature classifiers from X as well as all early fusion classifiers A-P (except C). We observe that the result is significantly better than any early fusion result, but slightly poorer than X. We assume that the poorly performing early fusion classifiers I-P do not contain any information that improves the result. Nevertheless, we observe that late fusion efficiently profits from the information in the different classifiers. The result of the best classifier combination, X, is visualized in Figure 4.11.

We conclude that late fusion significantly improves the results compared to early fusion. We assume that the combination of labels - in contrast to early fusion, where feature vectors are combined - allows the training of more precise classifiers and consequently preserves the complementary information of the different features more efficiently. Above all, late fusion can be utilized to combine as many classifiers as needed because, in contrast to early fusion, there is no danger of over-fitting.

4.4.4 Late Fusion with Thresholded Majority Voting

Figure 4.8 shows the visualization of late fusion with thresholded majority voting. We observe that we achieve smooth segmentation results by also considering pixels with lower confidence.

Figure 4.8: Two details of our test material with a visualization of late fusion run Y with thresholded majority voting. The grey value of a pixel corresponds to its classification confidence; darker means more confident. We observe that some blurring occurs, e.g. at the boundaries of the shield in the upper right. We assume that the blurring is due to the size of the local window we use for feature extraction.
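Rendering the vote counts as grey values, as in Figure 4.8, amounts to the following small sketch; it is a simplification of the visualization actually used.

import numpy as np

def confidence_image(label_maps):
    # label_maps: stack of boolean foreground maps, one per classifier.
    votes = np.sum(np.asarray(label_maps, dtype=int), axis=0)
    confidence = votes / len(label_maps)       # fraction of votes in [0, 1]
    # Darker pixels mean more classifiers agree on 'pecked'.
    return (255 * (1.0 - confidence)).astype(np.uint8)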
Figure 4.9: Precision and recall for late fusion with thresholded majority voting. The threshold ranges from two to the number of fused classifiers minus one. The corresponding classifier combinations are denoted in Table 4.3.

Figure 4.10: Visualization of two details of our test material classified with different thresholds in late fusion run Y. Red dots are classified as background. Each column shows the result for a specific threshold t. The leftmost column corresponds to t=12, i.e. all fused classifiers must agree; the rightmost column corresponds to t=1, i.e. at least one classifier must classify a pixel as a foreground pixel. We observe that the ideal threshold for segmentation varies.

Figure 4.9 shows the precision-recall curves of the five late fusion runs R, S, W, X and Z for different thresholds. We observe that a change of the majority voting threshold allows us to vary our results along the precision-recall curve. This can be useful for different applications, where balanced precision and recall are not always preferable. Moreover, we observe that with certain classifier combinations we can move the precision-recall curve to the upper right, i.e. our good late fusion results are not caused by an optimization of the F1 value.

4.4.5 Interactive Late Fusion

Figure 4.10 shows the visualization of two details of our test material overlaid with the classification results for different late fusion thresholds. The adjustment of the threshold allows the user to interactively increase the subjective quality of the result. Technically, the varying threshold corresponds to a varying position of the result along the precision-recall curve (see Figure 4.9). Furthermore, we observe that different parts of our test material demand different thresholds for ideal segmentation: the ideal threshold for the detail in the top row is 10, while the ideal threshold for the detail in the bottom row is 4. We assume that this is caused by different contrast in our test material. We conclude that the possibility to vary the threshold interactively improves the subjective quality of the results.

4.5 Conclusion

We presented an experimental study on petroglyph segmentation by pixel classification. We showed that with appropriate features, feature combinations and fusion strategies, good segmentation results are achievable. The most important lessons learned are:

• Texture features generally outperform features of other categories. The best performing single features are third-order grey level co-occurrence matrices and local binary patterns.

• Early fusion significantly improves results compared to single features. The more features we fuse, the better. High dimensionality of feature vectors and subsequent over-fitting limit the number of single features that can be fused.

• Late fusion of classifier output labels clearly outperforms early fusion. Complementary feature information is preserved better, and there is less danger of over-fitting.

Varying the number of votes needed for the majority voting used in late fusion allows us to determine a confidence for each classified pixel as well as to change the position of the classification result on the recall-precision curve. We assume that this variable threshold improves the usefulness of the results. The results are good, and even better when the classifier threshold is chosen manually. What we cannot solve is the dependence on the lighting during the photographic acquisition of the images. The images we used for our experiments are well-lit because we awaited the appropriate time of day for capturing. We suppose that pixel classification based on 2D RGB images performs worse under different illumination. To be more independent of lighting, we approach the segmentation task on 3D point clouds in the following chapter.
Figure 4.11: Our test material overlaid with the best classification result (late fusion run X).

Segmentation of Petroglyph 3D Point Clouds

Figure 5.1: A petroglyph scene with overlaid segmentation ground truth.

In this chapter we address the segmentation of petroglyphs based on 3D scans. We assume that a distinctive description of the micro-topography of a surface is easier without the influence of lighting, which poses a major challenge for the image-based segmentation described in the previous chapter.

The appearance of a 3D surface can be seen as the composition of visual appearance (visual texture, image texture) and tactile appearance (surface texture) [TJ98, BMdA12]. Visual appearance refers to color variations, brightness and reflectivity. Surface texture refers to the geometric micro-structure of a surface in terms of roughness, waviness, lay, coarseness, smoothness, polish, burnish and bumpiness [BJ03, BMdA12]. It is defined as the repetitive random deviation from the “ideal” surface; this deviation forms the three-dimensional topography of a surface [ASM96]. Compared to the visual appearance of a surface, the surface topography is invariant to lighting and viewing conditions and thus a robust basis for the description of a 3D surface. Research has mainly focused on the description and classification of surfaces in terms of their visually apparent texture, fostered by the broad availability of image texture datasets [VZ05, ZMLS07]. The analysis of surface texture directly from 3D data has been neglected so far due to the absence of suitable high-resolution 3D data.

Our petroglyph scans are highly detailed, with a resolution of 0.1 mm (see Figure 5.1 for an orthophoto and Appendix A for details on the 3D dataset). Hence, they present an opportunity to approach the analysis of surface texture based on 3D data. We follow two different strategies: in Section 5.1 we use the high-resolution point clouds directly and employ 3D descriptors; in Section 5.2 we transfer the problem to image space by using a depth map of the surface. The depth map contains all necessary information because the rock surfaces containing petroglyphs usually do not contain self-occlusions. In image space, we use well-established 2D image features to distinguish between natural rock surface and pecked rock surface.

5.1 3D Descriptor-based Point Cloud Segmentation

5.1.1 Related Work

3D reconstructions are usually represented as point clouds, meshes and range images. For the representation of surface topography, descriptors are required that capture the local geometry around a given point. A large number of local 3D descriptors has been developed in recent years. Johnson and Hebert propose spin images as local descriptors for the dense description of meshes for object recognition in 3D scenes [JH99]. Darom and Keller introduce a scale detection scheme for mesh points and propose scale-invariant spin images [DK12]. Furthermore, they extract SIFT features from local depth images of a point's neighborhood to model the surface topography around it.
Zaharescu et al. introduce a local surface descriptor for meshes (MeshHOG) that can be computed from an arbitrary scalar function (e.g. curvature) defined over the surface [ZBVH09]. Steder et al. propose the normal-aligned radial feature (NARF), which captures local depth variations in range data [SRKB11]. Frome et al. extend the well-known 2D shape context descriptor [BMP02] to 3D point clouds (3DSC) and show that it outperforms spin images [FHK+04]. Rusu et al. propose two local 3D point cloud descriptors, namely the persistent point feature histogram (PFH) [RMBB08] and an accelerated version, the fast PFH (FPFH) [RBB09]. Both build upon the relations between surflets, i.e. combinations of a surface point and its normal [WHH03]. Tombari et al. propose a 3D descriptor (SHOT) based on the point normals that is defined in a robust local reference frame [TSDS10].

5.1.2 Approach

We extract 3D descriptors based on the neighborhood of each point to describe the surface properties of this point. Subsequently, we use the descriptors and the ground truth from Subsection 2.6.1 in a machine learning setup to train a model which is then used to classify points represented by their neighborhood.

The descriptors introduced in Subsection 5.1.1 are all suitable for the description of attributes related to surface topography, although some of them were originally developed for different purposes. Out of these, we use four state-of-the-art local descriptors that exclusively describe the geometry of the surface without considering color. We select the descriptors following three criteria: (i) Is the descriptor local? (ii) Does the descriptor work on the point cloud only, without internally rendering a depth map of the considered region? (iii) Is it rotation invariant and robust to clutter and noise? All four descriptors meet the selection criteria. In the subsequent paragraphs we describe the four selected descriptors.

Persistent point feature histograms - PFH: The persistent point feature histograms (PFH) have been proposed by Rusu et al. [RMBB08]. PFH is a local descriptor for the geometry of the neighborhood of a given point. For a given neighborhood around a point, PFH encodes the pairwise geometric relations between all surflets (a concept previously introduced by Wahl et al. [WHH03]) in the neighborhood. A surflet is defined as the combination of the position vector and the normal vector of a point. The original implementation uses angular differences between the normals based on a Darboux frame and the Euclidean distance between the points, which results in a description of each point pair consisting of four values (see Figure 5.2). In the implementation used by us (see http://pointclouds.org/documentation/tutorials/pfh_estimation.php), the distance between the points is discarded because it has been shown to be undiscriminative. The three angles which describe the geometric relation of two points (i.e. their surface normals) are placed in a histogram which uses five divisions for each of the dimensions. This results in a feature vector with 5³ = 125 dimensions. The PFH is invariant to rotation, translation and (when discarding the Euclidean distance between the points) scale. As the comparison values are stored in a histogram, it has a certain robustness to noise. Furthermore, the authors claim that the PFH is robust to the acquired scale (i.e. 3D resolution) of a scene. That might be true for the kitchen scene used for evaluation in the paper, as it does not matter whether 10 or 100 points are sampled from a flat surface in the scene and subsequently populate the histogram as described above. However, in our case, where the finest detail is necessary to model the difference between pecked surface and natural surface, we do not expect the PFH to be robust to changes in the 3D resolution of the acquired surfaces.

Figure 5.2: Angular features computed for a pair of points for the PFH descriptor. Vectors u, v and w form the fixed coordinate frame. α, φ and θ denote the calculated angular features between the normals n_s and n_t. (Image source: http://pointclouds.org)
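The three angles of Figure 5.2 can be written down compactly. The following numpy sketch follows the construction in [RMBB08] but omits the source/target disambiguation performed by the actual PCL implementation.

import numpy as np

def pfh_angles(p_s, n_s, p_t, n_t):
    # Darboux frame fixed at the source surflet (p_s, n_s);
    # assumes n_s is not parallel to the direction between the points.
    d = p_t - p_s
    d = d / np.linalg.norm(d)
    u = n_s                                  # first frame axis
    v = np.cross(u, d)
    v = v / np.linalg.norm(v)                # second frame axis
    w = np.cross(u, v)                       # third frame axis
    alpha = np.dot(v, n_t)
    phi = np.dot(u, d)
    theta = np.arctan2(np.dot(w, n_t), np.dot(u, n_t))
    return alpha, phi, theta

Quantising each of the three angles into five bins and counting all surflet pairs of a neighborhood yields the 5³ = 125-dimensional PFH.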
Fast persistent point feature histograms - FPFH: The FPFH descriptor is a simplified version of the PFH [RBB09]. The simplification aims at a faster computation of the descriptor. While the PFH considers the pairwise geometric differences between all points in the neighborhood of the point for which the descriptor is estimated (the query point), the FPFH descriptor in a first step only considers the pairwise differences between the query point and each neighbor. This descriptor is called SPFH (simplified point feature histogram). This reduces the computational complexity from O(nk²) to O(nk), where n is the number of points for which the descriptor is calculated and k the number of considered neighbors. The loss (compared to PFH) of the information about the pairwise relations between the neighboring points is partly compensated by adding to the query point's SPFH the weighted SPFHs of its neighbors, where the weight of each neighbor is its Euclidean distance to the query point: FPFH(p) = SPFH(p) + (1/k) Σ_i (1/ω_i) · SPFH(p_i), with ω_i the distance between p and its neighbor p_i. The three angular values are used in the histogram, which has eleven bins. In contrast to the PFH, the features are de-correlated, which results in a histogram with 3 * 11 = 33 feature dimensions.

Signature of Histograms of Orientations - SHOT: SHOT captures topologic information around a 3D point by using a spherical support structure and a local reference frame [TSDS10]. The sphere around the key point is divided into 32 equally sized volumes. For each volume, a local 11-bin histogram is computed from the cosines of the angles between the key point normal and the normal of each point in the current volume. Finally, all histograms are concatenated, resulting in a 32 * 11 = 352-dimensional feature vector. In [TSDS10], SHOT is evaluated on point clouds of synthetic 3D models. It is invariant to rotation and robust to clutter and noise. Salti et al. evaluate SHOT and other descriptors like FPFH, considering the influence of the reference frame and the descriptor separately [STDS14]. They show that their reference frame performs well with other descriptors.

3D Shape Context - 3DSC: Frome et al. propose the 3D Shape Context descriptor as a local point descriptor enabling object recognition in range data [FHK+04]. The motivation for the descriptor are cluttered and noisy scenes acquired by range scanners. The evaluation of 3DSC is performed on synthetic 3D car models with added noise. The descriptor is a straightforward 3D generalization of the well-known Shape Context descriptor proposed by Belongie et al. [BMP02]. A spherical support volume is divided radially and around the azimuth into bins. The occurrences of points in the divisions are accumulated in a histogram.
Signature of Histograms of OrienTations - SHOT: SHOT captures topologic information around a 3D point by using a spherical support structure and a local reference frame [TSDS10]. The sphere around the key point is divided into 32 equally sized volumes. For each volume, a local 11-bin histogram is computed from the cosine of the angles between the key point normal and the normal of each point in the current volume. Finally, all histograms are concatenated, resulting in a 32 × 11 = 352 dimensional feature vector. In [TSDS10], SHOT is evaluated on point clouds of synthetic 3D models. It is invariant to rotation and robust to clutter and noise. Salti et al. evaluate SHOT and other descriptors like FPFH considering the influence of the reference frame and the descriptor separately [STDS14]. They show that their reference frame performs well with other descriptors.

3D Shape Context - 3DSC: Frome et al. propose the 3D shape context descriptor as a local point descriptor enabling object recognition in range data [FHK+04]. The descriptor is motivated by cluttered and noisy scenes acquired by range scanners; the evaluation of 3DSC is performed on synthetic 3D car models with added noise. The descriptor is a straightforward 3D generalization of the well-known shape context descriptor proposed by Belongie et al. [BMP02]. A spherical support volume is divided radially and around the azimuth into bins. The occurrences of points in the divisions are accumulated in a histogram. To provide rotation invariance around the azimuth, the histograms are rotated so that each division serves once as the start division, and the resulting histograms are concatenated. We use a 1980-dimensional feature vector in our experiments.

The dense computation of 3D descriptors is computationally demanding. Hence, we select a subset of our 3D data for the evaluation. We perform experiments on four scans of our 3D point cloud dataset, namely the scans with the IDs 1, 14, 21 and 22 (see Appendix A for details on the 3D dataset).

Qualitative evaluation of the descriptors: The descriptors we investigate for our task were originally designed to work on scenes like a kitchen [RMBB08] or for similarity estimation of synthetic 3D models [FHK+04]. Our task (surface classification) differs from these typical tasks, where large surface parts of the point clouds are close to volumetric primitives. In our case, we have small structures in the stone (peck marks) that we want to use for the distinction of natural rock surface and pecked rock surface. Hence, our first look into the descriptors is qualitative. In this first step, we aim at investigating the discriminative capability of each descriptor without taking generalization ability into account. We train a classifier with a randomly selected training set from one image and classify the remaining feature vectors with the model obtained.

Cross-validated evaluation of single descriptors and descriptor combinations: We evaluate the performance of single features and concatenated combinations. Randomly selected feature vectors of point clouds 14 and 21 (see above) form the training set; the two other point clouds as well as the remaining points from clouds 14 and 21 form the test set. As we are operating on the full-resolution point clouds (the small dataset consisting of four of the 26 scans has more than 12 million points), we utilize only 2.5% of point clouds 14 and 21 as training data. For validation we use repeated random sub-sampling with 10 sub-samples [Rao05]; a sketch of this protocol follows.
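The following is a minimal sketch of this validation protocol, assuming precomputed descriptor matrices and binary point labels; the linear SVM stands in for the classifiers discussed below, and all names are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

def subsampled_f1(X_pool, y_pool, X_test, y_test, frac=0.025, runs=10):
    """Repeated random sub-sampling: train on 2.5% of the pool, test on held-out clouds."""
    rng = np.random.default_rng(0)
    scores = []
    for _ in range(runs):
        idx = rng.choice(len(X_pool), int(frac * len(X_pool)), replace=False)
        clf = LinearSVC().fit(X_pool[idx], y_pool[idx])
        scores.append(f1_score(y_test, clf.predict(X_test)))
    return np.median(scores), scores   # distribution of 10 F1-values
```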
5.1.3 Results

Figure 5.3 shows the qualitative results of the experiments. Generally, we observe that the classifier trained on each of the local point cloud descriptors produces many false-positive regions, i.e. regions that are natural rock surface but are classified as pecked areas. A closer look at the single descriptors' results shows that PFH detects most of the pecked areas but produces a large amount of false positives as well. FPFH performs comparably in terms of false positives, but misses large pecked regions. SHOT and 3DSC produce noisy results over the whole 3D scan, with 3DSC showing more high-frequency noise than SHOT.

Figure 5.3: Qualitative evaluation of the discriminative abilities of the employed 3D point cloud geometry-based descriptors. The rows from top to bottom show the ground truth, PFH, FPFH, SHOT and 3DSC. The right hand side shows details. We observe that each point cloud descriptor produces a large amount of false positives compared to the ground truth in row 1.

The boxplots in Figure 5.4 reflect the 10 F1-values obtained in the quantitative evaluation. We observe that the point feature histogram (PFH) clearly outperforms all other descriptors with an F1-value around 0.45. The simplified descriptor FPFH is more than 0.1 behind with an F1-value around 0.31. SHOT and 3DSC are both around an F1-value of 0.25. The combination of descriptors with early fusion does not improve results at all. Combinations containing PFH (the first three combinations) perform comparably to PFH alone, and the fourth combination (FPFH+SHOT) performs comparably to FPFH alone. We assume that this results from the fact that all descriptors exploit similar information, namely the geometric relations of points and the respective surface normals. We avoided combinations with 3DSC due to its large dimensionality (1980 dimensions).

Figure 5.4: Segmentation results with different local point cloud surface descriptors. The left side shows the performance of single features, the right side shows the performance of descriptor combinations. We observe that the combinations yield only slight improvements.

5.2 Image-space-based Point Cloud Segmentation

In this section, we investigate the problem of describing and classifying 3D surfaces according to their topographic structure in high-resolution surface reconstructions. This task is challenging for the following reasons: First, methods such as SfM produce densely sampled reconstructions which may contain several millions of points. The dense extraction of a surface's topography in 3D is a computationally demanding task that may lead to unacceptable runtimes for many applications. Second, surfaces may be shaped differently and exhibit a complex nonlinear geometry (curvature) at the global level that has to be compensated prior to the analysis of the local topography.

The dense 3D descriptor extraction for the experiments of the previous section took several days to complete and is therefore not feasible for larger datasets. As a consequence, we transfer the problem to image space. The highly detailed scans of surface areas we use do not include self-occlusions (see Appendix A). Hence, it is possible to reduce the data from 3D to 2.5D (a plane with height information, i.e. a depth map) without any information loss. The depth map and a derived gradient map can form the input to standard state-of-the-art 2D pixel-wise segmentation methods such as the one described in Chapter 4 and can therefore build on long-developed and highly sophisticated 2D features. A sketch of this projection follows.
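As a minimal sketch of this reduction, assuming a point cloud already oriented so that z is the projection axis; the 0.1 mm grid size follows the rasterization of our point clouds, the rest is illustrative:

```python
import numpy as np

def depth_map(points, grid=0.1):
    """Orthographic projection of an (N, 3) point cloud to a 2.5D depth map."""
    ij = np.floor((points[:, :2] - points[:, :2].min(axis=0)) / grid).astype(int)
    dmap = np.full(ij.max(axis=0) + 1, -np.inf)
    np.maximum.at(dmap, (ij[:, 0], ij[:, 1]), points[:, 2])  # top surface point per cell
    dmap[np.isinf(dmap)] = np.nan                            # cells without points
    return dmap
```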
5.2.1 Related work

The analysis of surface topography is necessary in different application domains. Applications include surface matching [JH99], automated terrain classification for autonomous vehicles, inspection of material quality in industrial applications (an important foundation for this line of work are the "Birmingham 14" parameters for characterizing three-dimensional surface topography [DSS92, DSS93, DSS94a, DSS94b]), the analysis of digital elevation models in geology (geomorphometric analysis; a recent overview can be found in [HR08]), as well as the analysis of archaeological artifacts [BMdA12, Sco12]. Surface topography (sometimes referred to as morphology) may be investigated at different scales. While inspection of material quality may be performed at a millimeter scale or even below, geologic morphology analysis is usually performed at a scale of meters or kilometers. While the data live at different scales, morphological parameters are in general invariant to scale. Morphological parameters have been defined at several places in the literature and depend on the application domain. A broad range of parameters has been developed for geomorphometry [Lea11]. A categorization of parameters related to archaeology is presented in [BMdA12]. Independent of the application domain, the parameters are comparable to a high degree. Further parameters are defined in [BJ03].

5.2.2 Descriptors

There are numerous 2D descriptors that can be utilized for this analysis. In addition to the 2D descriptors described in Subsection 4.3.2, we use the histograms of oriented gradients (HOG, [DT05]) and three simple and easy to compute features proposed for the description of enhanced deviation maps in [ZS15]: (i) global histogram shape (GHS), (ii) tiny image (TI), and (iii) spatial frequencies (SF). All features are defined for a local neighborhood. The size of the local window is 128x128 pixels, which is equivalent to 10.8x10.8 mm² (see Section 2.5 for details on orthophoto generation). GHS is computed from a global histogram of all deviation values in a given neighborhood. To capture solely the shape of the histogram, we extract the first 30 discrete cosine transform (DCT) coefficients from the histogram and use them as a feature vector describing the global value distribution in the local neighborhood. TI is simply a down-scaled 8x8 pixel version of the current neighborhood and is inspired by [KJPK02], who have shown that raw pixel values yield surprisingly good performance for 2D texture analysis when used as features. The third feature, SF, contains the first 8x8 low-frequency 2D DCT coefficients of a given image block. SF actually captures the same information as TI in a different mathematical domain (the frequency domain); we employ both features to evaluate which representation suits topography classification better. While GHS represents spatially invariant information, TI and SF capture mostly spatial information. Thus the two types of features are complementary, which makes them good candidates for combination. A sketch of the three features follows.
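A minimal sketch of the three features, assuming a 128x128 block from a depth or deviation map; the histogram bin count for GHS is our assumption, while the DCT sizes follow the text:

```python
import numpy as np
from scipy.fftpack import dct, dctn
from skimage.transform import resize

def ghs(block, n_bins=64, n_coeffs=30):
    """Global histogram shape: DCT of the value histogram."""
    hist, _ = np.histogram(block, bins=n_bins, density=True)
    return dct(hist, norm='ortho')[:n_coeffs]

def ti(block):
    """Tiny image: the block down-scaled to 8x8 pixels."""
    return resize(block, (8, 8), anti_aliasing=True).ravel()

def sf(block):
    """Spatial frequencies: the first 8x8 low-frequency 2D DCT coefficients."""
    return dctn(block, norm='ortho')[:8, :8].ravel()

# combinations are plain concatenations, e.g. GHS+SF:
# feat = np.concatenate([ghs(block), sf(block)])
```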
5.2.3 Approach

A depth map encodes global curvature (the curvature of the ideal surface, as described at the beginning of the chapter) as well as local curvature (the high-frequency deviations of the surface from the ideal surface). Figures 5.5(a) and 5.5(b) show a point cloud and the corresponding depth map, color-coded as a heat map. We observe that the global curvature of the point cloud has larger differences in values than the local depth differences resulting from the pecked figures. For the distinction of pecked and natural parts of the rock surface we are interested in high-frequency local curvatures and have to be robust against the global curvature. To accomplish this we follow two strategies:

a) Local feature extraction: We extract features in a block around the pixel we want to classify as natural rock or pecked rock. We normalise the feature values and hence compensate for the global height of the block. There is, however, one thing we cannot compensate with local feature extraction: a global inclination of the surface patch underlying the depth map block. We assume that some features may be robust against such constant gradients while others rely strongly on gradients. Hence, we employ different features and feature combinations in the experiments.

b) Global curvature compensation and local curvature enhancement: In order to be robust against the global curvature, which results in different inclinations of the surface patches used for feature extraction, we compensate the global curvature by subtracting a Gaussian-filtered version of the depth map from the depth map itself; a sketch of this compensation step follows below. The local curvature is then enhanced by several image processing steps. The process from the depth map to the enhanced depth map has been proposed and evaluated by Zeppelzauer and Seidl in [ZS15]. Figure 5.5 illustrates the topography extraction and enhancement on a surface from our dataset. The point cloud viewed from the projection direction together with its depth map is shown in Figures 5.5(a) and 5.5(b). The deviation map in Figure 5.5(c) compensates the global depth variations and reveals the micro-structure of the surface. The corresponding enhanced deviation map in Figure 5.5(d) emphasizes basic topographic patterns, such as valleys in this example, and yields an expressive representation of the surface topography.

Figure 5.5: A 3D surface reconstruction of a rock surface in which the shape of a human figure has been engraved. (a) The colored point cloud; (b) the projected depth map; (c) the deviation map; (d) the enhanced deviation map, which accentuates well the different surface topographies of the engraving and the surrounding rock surface. Please refer to [ZS15] for details on the process from depth image to enhanced deviation map.
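A minimal sketch of the compensation step, assuming the depth map from above; the filter width sigma is our assumption, and the further enhancement steps of [ZS15] are omitted:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def deviation_map(depth, sigma=20.0):
    """Subtract the smoothed (global) shape to keep only local structure."""
    return depth - gaussian_filter(depth, sigma)
```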
5.2.4 Experimental setup

The experiments aim at comparing the full 3D point-based approach of Section 5.1, which relies on different local 3D descriptors extracted directly from the point cloud, with a pixel classification task on depth maps and derivatives thereof. First, we compare the full 3D approach with different 2D input representations (e.g. depth maps, color images, enhanced depth maps) and several state-of-the-art 2D features extracted from them. Additionally, we employ the three features described in Subsection 5.2.2 (GHS, TI, SF) extracted from the enhanced deviation map.

The employed dataset contains 26 surface reconstructions with a total number of 115.3M points (see Appendix A for details on the 3D dataset). The number of points per surface ranges from 810k to 10.6M. For each surface a precise ground truth has been generated by domain experts that labels all engravings on the surface (see Subsection 2.6.1). The dataset contains two classes of surface topographies: class 1 represents engraved areas and class 2 represents the natural rock surface. The class priors are imbalanced: class 1 represents 20.9% of the data and is thus underrepresented. In a first step, we perform all experiments on the four scans used in the full 3D evaluation in Section 5.1. Then we employ the most promising features on the full dataset.

For the representation of the input point cloud, we investigate four different visual representations: (i) A color image of the point cloud obtained by orthographic projection of the point colors. The point colors are obtained by shading the point cloud with oblique illumination, which makes topographic patterns better visible in the resulting texture of the image. (ii) The depth map obtained by orthographic projection. (iii) The gradient map, which is the local gradient of the depth map. The gradient emphasizes local depth variations and thus captures structures related to surface topography, while neglecting global surface variations. It has been successfully employed for the segmentation of fine structures in 3D scenes [WFFD11]. (iv) The enhanced deviation map from [ZS15].

From all four representations, we compute the following features: MPEG-7 edge histograms (EH) [II02], two variants of local binary patterns, namely uniform LBP (uLBP) and rotation invariant LBP (riLBP) [OPH96, OPM02], scale-invariant feature transform (SIFT) descriptors [Low04], histograms of oriented gradients (HOG) [DT05] and gray-level co-occurrence matrices (GLCM) with derived parameters as specified in [HSD73, ST99, Cla02]. Additionally, we extract the features proposed in Subsection 5.2.2 (GHS, TI, and SF) as well as two combinations (GHS+TI and GHS+SF) which are formed by concatenating the individual feature vectors. All features are extracted for local windows of 128x128 pixels (10.8x10.8 mm²) with a step size of 16 pixels (1.35mm).

For classification we employ SVMs (with linear and RBF kernel) [CV95], and random undersampling boosting (RUSBoost), a variant of AdaBoost [FS+96] that is especially designed for imbalanced class distributions [SKVHN08, SKVHN10]. We split the entire dataset into a set of training surfaces and a set of test surfaces. Training sets are randomly selected from the training surfaces. We apply 5-fold cross-validation and grid search on the training sets to determine the classifiers' parameters (C for the linear SVM, C and γ for the SVM with RBF kernel, and the number of learners for RUSBoost); a sketch of this protocol follows. The classifiers with parameters optimized on the cross-validated training set are then applied to the previously unseen test surfaces. We compute recall and precision for both classes and report the F1-score of the underrepresented class for each experiment. As class 2 covers nearly 80% of the surfaces' area, the F1-score of the underrepresented class (class 1) is most expressive for assessing the overall classification performance. Each experiment is repeated 10 times with different randomly selected training sets from the training surfaces to obtain a distribution of F1-scores for each experiment. We report the median F1-value as well as the interquartile range (iqr) of the distribution. To judge the performance difference between two experiments, we apply Fisher's randomization test [Fis35, SAC07] as well as Student's paired t-test [BHH78] with a significance level of 0.05, i.e. in all experiments a p-value < 0.05 is considered statistically significant.
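A minimal sketch of the parameter search for the RBF-kernel SVM, assuming precomputed feature matrices; the parameter ranges are illustrative, and the F1 scorer targets the underrepresented class 1:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import f1_score

def run_experiment(X_train, y_train, X_test, y_test):
    """5-fold grid search on the training set, F1 of class 1 on unseen surfaces."""
    grid = GridSearchCV(SVC(kernel='rbf'),
                        {'C': [1, 10, 100], 'gamma': [0.01, 0.1, 1]},
                        cv=5, scoring='f1')
    grid.fit(X_train, y_train)
    return f1_score(y_test, grid.best_estimator_.predict(X_test), pos_label=1)
```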
5.2.5 Results

We perform a systematic evaluation of all employed 2D and 3D representations and features.

5.2.5.1 Small dataset

As the number of experiments is large, we first use the four heterogeneous surfaces used in Section 5.1 with the full 3D approach. Training data is randomly selected from the first two surfaces; the remaining two surfaces form the test set. The results in Table 5.1 (calculated from the results in Subsection 5.1.3) show that PFH is the most suitable feature for our task. PFH outperforms the other descriptors significantly (p-value: 0.016).

Table 5.1: Results of the local 3D descriptors with linear SVM. Numbers provide the median F1 as well as the interquartile range (iqr) in brackets. The PFH descriptor best captures the topographic surface characteristics, however at the cost of considerable computation time.

Representation    PFH            FPFH           SHOT           3DSC
3D point cloud    0.441 (0.020)  0.316 (0.005)  0.264 (0.011)  0.253 (0.016)

PFH represents a global statistic of the normal orientations by considering all points in a local neighborhood in a pairwise manner [RC11]. FPFH is a simplified (and much faster) variant of PFH which operates on a smaller subset of pairwise comparisons. This, however, leads to a coarser representation of the surface topography, which negatively affects performance. The remaining features (SHOT and 3DSC) show poor performance. In contrast to PFH, SHOT and 3DSC encode spatial information from the local neighborhood, which shows to be less suited for topography description.

Next, we evaluate topography classification in 2D on the four scans used in Section 5.1 to compare the results with the full 3D results. Table 5.2 shows the results of all 2D features with all input representations with a linear SVM. Comparing these with the 3D results in Table 5.1, we observe that the best feature in the 3D approach performs comparably to the color image-based segmentation but is outperformed by all representations of the point cloud that consider depth. Hence, we do not consider the full 3D approach in the remainder of this section but concentrate on the image-space methods.

Table 5.2: Results of the systematic evaluation of all input representations and features with linear SVM on the small dataset (the four point clouds on which the full 3D approach has been evaluated in Section 5.1). Numbers provide the median F1 as well as the interquartile range (iqr) in brackets. ^1 denotes that the result is significantly better than all other results for this feature, i.e. the best result of a row. ^2 denotes that the result is significantly better than all other results for this representation, i.e. the best result of a column. GHS+SF applied to the EDM significantly outperforms all other features in the experiments.

Feature   Color image      Depth map        Gradient map       Enh. dev. map
EH        0.301 (0.002)    0.271 (0.000)    0.364 (0.001)^1    0.286 (0.002)
riLBP     0.462 (0.005)    0.491 (0.006)    0.549 (0.005)^1,2  0.290 (0.001)
uLBP      0.454 (0.014)    0.413 (0.073)    0.304 (0.319)      0.019 (0.022)
SIFT      0.495 (0.041)^1  0.269 (0.003)    0.345 (0.062)      0.424 (0.018)
HOG       0.354 (0.014)    0.451 (0.009)^1  0.414 (0.007)      0.351 (0.002)
GLCM      0.389 (0.134)    0.271 (0.000)    0.316 (0.044)      0.501 (0.184)^1
GHS       0.192 (0.169)    0.214 (0.077)    0.321 (0.180)      0.589 (0.447)
TI        0.399 (0.042)    0.476 (0.052)    0.273 (0.001)      0.522 (0.001)
SF        0.481 (0.033)    0.549 (0.251)    0.279 (0.005)      0.521 (0.001)
GHS+TI    0.212 (0.158)    0.205 (0.083)    0.321 (0.180)      0.682 (0.104)^1
GHS+SF    0.422 (0.034)    0.333 (0.002)    0.356 (0.037)      0.745 (0.024)^1,2

In Table 5.2 we observe significant performance differences between the four input representations. The classification based on the color image yields the weakest results. The best feature is SIFT with a median F1 of 0.495, followed by SF and both LBP variants. The improvement of SIFT over SF is, however, not significant (p-value=0.56). Topography classification on the depth map yields an improvement of 5.4% for the best feature (SF with a median F1 of 0.549). This performance difference between depth map and color image is, however, not significant; the same applies to the gradient map compared to the depth map. The best feature on the gradient map (riLBP, median F1=0.549), however, significantly outperforms all features on the color image as well as all other features on the gradient map (in both cases with a p-value<0.001). Experiments on the enhanced deviation maps (EDM) show that the first six features in Table 5.2 cannot benefit from the enhanced representation. GLCM benefits most from the enhanced representation and is significantly better on EDM than on all other representations (p-value=0.023). The three proposed features GHS, TI, and SF yield performance similar to the top performing features on the other representations. GHS yields the best performance in terms of median F1 so far (0.589). This is, however, not significant, as the feature shows a strong variation in performance (iqr of 0.447).
A strong gain in performance is obtained when the features are combined. Both combinations, GHS+TI and GHS+SF, significantly outperform all other features on the EDM as well as all features on the remaining representations (all p-values<0.047). The peak performance is obtained by GHS+SF with a median F1 of 0.745.

The strongly varying performance of the different features has several reasons. The enhanced deviation map is a comparably smooth representation without sharp edges (due to Gaussian filtering). Thus, features that rely on gradient orientations, such as HOG, SIFT and EH, can hardly benefit from the novel representation. The GLCM, which takes the absolute values of the map as input (instead of differences or gradients), benefits most from the enhanced deviation map compared to the other representations. We observe the same behavior for GHS and TI, which both rely on absolute values. SF performs similarly to TI because they represent equivalent information. The high performance of the combined features (GHS+TI and GHS+SF) has two reasons: First, both features perform well when applied separately; second, both features represent complementary information. GHS represents the shape of the value distribution and thus is a global and rotation invariant descriptor of an image block. TI and SF, in contrast, represent the spatial distribution of values and thus add a lot of discriminatory information to GHS.

We repeat all experiments with an SVM with RBF kernel. The nonlinear SVM achieves lower training errors than the linear SVM, which shows that the RBF kernel is able to fit the training data better. During testing, however, no significant performance gain is achieved by the RBF kernel. The best feature is again GHS+SF on EDM with a median F1 of 0.737. We assume that overfitting prevents a further increase in performance.

We perform additional experiments with RUSBoost and summarize the results in Table 5.3. RUSBoost improves performance in 40 of the 44 experiments shown in Table 5.3: it models the data better and yields a higher generalization ability than the SVM in our experiments. The weakest representation is again color. For the depth and gradient maps, however, a new peak performance is obtained. HOG on the depth map improves by 22.9% to 0.680 and riLBP yields a median F1 of 0.602 (+5.3%). The best overall results are again obtained for the enhanced deviation maps and the proposed features. RUSBoost significantly improves the results for the individual features (GHS, TI, and SF) over the SVM with p-values<0.001. Finally, RUSBoost trained on GHS+TI on EDM significantly outperforms all other experiments performed so far with a median F1 of 0.768 (p-value<0.001).

Table 5.3: Results of the systematic evaluation with RUSBoost (same schema as Table 5.2).

Feature   Color image      Depth map          Gradient map     Enh. dev. map
EH        0.345 (0.004)^1  0.000 (0.000)      0.323 (0.013)    0.334 (0.004)
riLBP     0.517 (0.007)    0.653 (0.004)^1    0.602 (0.006)^2  0.331 (0.003)
uLBP      0.400 (0.016)    0.550 (0.028)^1    0.495 (0.005)    0.367 (0.008)
SIFT      0.386 (0.010)    0.243 (0.002)      0.431 (0.022)    0.472 (0.005)^1
HOG       0.442 (0.007)    0.680 (0.006)^1,2  0.383 (0.005)    0.368 (0.005)
GLCM      0.397 (0.021)    0.000 (0.000)      0.391 (0.012)    0.656 (0.003)^1
GHS       0.398 (0.007)    0.517 (0.007)      0.358 (0.064)    0.690 (0.004)^1
TI        0.513 (0.006)    0.569 (0.011)      0.391 (0.048)    0.729 (0.003)^1
SF        0.488 (0.004)    0.534 (0.008)      0.345 (0.005)    0.689 (0.002)^1
GHS+TI    0.434 (0.003)    0.598 (0.007)      0.454 (0.071)    0.768 (0.004)^1,2
GHS+SF    0.411 (0.008)    0.521 (0.011)      0.455 (0.010)    0.744 (0.003)^1
Apart from the improved overall performance, which indicates a better generalization ability, RUSBoost yields a much lower spread in performance (maximum iqr 0.071) than the SVM (maximum iqr 0.447). This demonstrates a high robustness of RUSBoost to different training data selections.

5.2.5.2 Full dataset

We finally evaluate the proposed method on the full dataset containing all 26 surface reconstructions with a total of 115.3M points. We employ the first 10 surfaces for training and the remaining 16 for testing, and again pick training sets randomly from the training surfaces. We apply RUSBoost with 70 learners on the features that performed best so far (riLBP, HOG, GLCM, GHS, TI, SF, GHS+TI, and GHS+SF); a sketch of this setup follows at the end of this subsection. Figure 5.6 shows boxplots of the results for all features on all representations. We observe that the overall results decrease, which is not surprising, as the entire dataset contains 15 times more data than the subset evaluated so far, which increases the complexity of the classification task. The individual results, however, show similar behavior as in the previous experiments. Again the color image is the weakest representation, and riLBP and HOG yield the best performance on the depth and gradient maps. The enhanced depth map again yields a significant improvement of the overall performance. Only the proposed features yield F1-scores above 0.6. The best result is obtained by GHS+TI (median F1=0.637), which significantly outperforms all other features with p-values<0.030. The results on the entire dataset demonstrate that the enhanced deviation map outperforms the other representations and expresses well the topography of the additional surfaces in the full dataset. The proposed features are most effective in combination with the enhanced deviation maps and best capture the topographic information of the surfaces.

Figure 5.6: Results of the systematic evaluation with RUSBoost on the complete dataset with 26 3D scans. The enhanced deviation map yields again a significant improvement of performance.
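A minimal sketch of this full-dataset setup; RUSBoostClassifier comes from the imbalanced-learn package rather than from our experimental code, and the random arrays merely stand in for the real feature matrices:

```python
import numpy as np
from imblearn.ensemble import RUSBoostClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 33)), rng.integers(0, 2, 1000)
X_test, y_test = rng.normal(size=(400, 33)), rng.integers(0, 2, 400)

clf = RUSBoostClassifier(n_estimators=70, random_state=0)   # 70 weak learners
clf.fit(X_train, y_train)
print(f1_score(y_test, clf.predict(X_test), pos_label=1))   # F1 of the engraved class
```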
5.3 Conclusion

In this chapter we investigated the segmentation of rock art 3D scans into natural rock surface and pecked rock surface based on the micro-topography of the surface. We evaluated the performance of state-of-the-art local 3D surface descriptors in comparison with image-space representations of the point clouds on our task of rock art segmentation. A comparison of the results obtained with 3D point cloud descriptors with the results on orthographic images with and without the utilization of depth on the same dataset shows that the best performing point cloud descriptor, PFH, is comparable with the segmentation based on RGB orthographic images and is outperformed by the best results on the non-enhanced depth image, the depth map (DM). The image-space results on the gradient map (GM) and the novel enhanced depth map (EDM) are clearly better than the results achieved with PFH.

Our experiments reveal two key findings: First, surface topography classification on the point clouds directly, using local 3D descriptors, performed worse than expected, which suggests that existing descriptors do not capture surface topography well. Second, using image-space methods, an expressive representation is more important than a sophisticated feature for topography classification: in our experiments the simplest features achieved the best results on the enhanced maps. We conclude that, in terms of micro-topographic surface description, 3D is not here (alluding to Rusu and Cousins, who titled their paper on the point cloud library "3D is here ..." [RC11]). A significant amount of future work has to be done in the development of 3D descriptors. In particular, more efficiently extractable features are needed for a feasible employment of 3D descriptors.

Similarity of Pecking Styles

Figure 6.1: A part of a petroglyph depicting a human with a superimposed arrow containing different pecking styles.

In Chapters 4 and 5 we analysed the rock surface aiming at a distinction of pecked and non-pecked areas. In this chapter, we use the segmentation results to look at the pecked regions aiming at a distinction of different pecking styles, that is, different depth, shape, size and spatial distribution of peck marks (see Figure 6.1).

What are the reasons for different peck marks? The two main influences on peck marks are time and the manufacturing process. First, depending on the type of the rock, the time passed since the manufacturing of the pecking and the environmental conditions, the rock surface shows different types and amounts of abrasion. This abrasion alters the peck marks: it makes them less deep and smooths away sharp edges. Hence, the abrasion can be an indicator for the time passed since the manufacturing of the pecking, given that data about the past environmental conditions exist. Second, the process of creation of rock art can cause many different peck marks. Influences on the peck marks are a) the tools used: stone or metal, the type of stone or metal, directly pecked or with a second object used as hammer?; and b) the authors themselves: right-handed or left-handed, strong or weak, sitting or standing, in a comfortable position while working, with the intention to make deep or shallow peck marks? The traditional way to estimate the properties of the manufacturing process is based on experimental archaeology.
Hypotheses about the tools and other properties of the peck marks under investigation are checked by performing the creation process on a comparable rock (ideally a piece of rock from the locality of the investigated peck marks) and subsequently comparing the original peck marks visually with the experimentally created ones. For this process, an automated similarity estimation of the peck marks based on high-resolution 3D scans can be supportive. However, this is only feasible with a small number of figures. The number of parameters and their variations creates a large space of plausible author/tool/intention combinations. Hence, it is not feasible to create experimental examples for large amounts of petroglyphs, as for instance in Valcamonica, where up to 300,000 figures have been hammered into the rock.

Besides the estimation of tools and other properties of the manufacturing, the second archaeological motivation for the investigation of peck marks is the study of superimposition, which can help with relative timing. Many figures are pecked over other figures, causing merged figures (see Figures 6.2 and 6.3). The determination of the figure that was pecked last, i.e. over the other figure, can be supported by investigating the pecking style: the figure that has the same pecking style as the overlapping area of the two figures is supposed to have been pecked later (see Subsection 2.3.2).

The number of figures with superimpositions in our 3D dataset is small and we do not have any ground truth for different pecking styles. Hence, in our approach we investigate whether different pecking styles can be distinguished based on numerical descriptions. As there is no ground truth, we use unsupervised methods (clustering) on point-wise extracted 3D surface descriptors and visualise the results by coloring the points of the point cloud.

6.1 Related Work

Zotkina et al. conducted an experimental study of percussion and tool types in the Minusinsk basin in Siberia [ZTKP14]. They replicated different peck marks with a diverse set of tools. Moitinho et al. approached a numeric analysis of pecking styles using terminology and software from geomorphology [dARBP13]. They aim to estimate the tools and gestures used for creating rock art. The approach includes experimental archaeology to replicate ancient peck marks. They scan the surfaces with 0.28mm resolution. From the scans, they use patches of 50x50 mm² containing homogeneous peck marks to extract surface metrology parameters with proprietary software. They cluster the numeric descriptions of the peck marks and observe three clusters which contain surfaces with the following properties: 1) fine, 2) coarse and regular, 3) coarse and irregular. They do not observe clusters linked to the manufacturing process of the samples. In contrast to Moitinho et al., we approach peck mark similarity estimation with 3D surface descriptors which have been developed for registration and object detection.

Figure 6.2: A warrior figure containing two visually clearly distinguishable pecking styles. We observe that the large arrow from left to right has been pecked over the warrior with clearly larger peck marks. This figure serves as a good basic test case for whether 3D surface descriptors can help to automatically distinguish between pecking styles.

Figure 6.3: A scene containing at least two clearly distinguishable pecking styles.
To our knowledge, there is no work on automated peck mark similarity estimation comparable to ours.

6.2 Approach

We follow two approaches to investigate pecking style variety based on the point cloud data. First, we densely sample the point cloud, i.e. we compute a descriptor for each point of the cloud based on its neighborhood (see Subsection 6.2.1). The second approach (see Subsection 6.2.2) computes a topology description of a point neighborhood not for all points, but only for relevant points. For this purpose, we use the information about peak positions, which we determine by detecting local maxima in the orthographic depth map.

In both cases, we cluster the descriptors and visualize the results. As we have millions of descriptors in up to 125 dimensions (in the case of PFH), we use a partitioning clustering method, namely k-means, which allows clustering large datasets in reasonable time. In comparison to other clustering methods, the disadvantage of k-means (and other partitioning methods) is the necessary selection of a value for k, i.e. the number of clusters. Our samples contain at most three visually distinguishable pecking styles. We cluster all our samples with 2, 3, 4 and 5 clusters to allow over-segmentation and to possibly detect visually unobservable clusters. A sketch of this clustering and visualization step follows.
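As a minimal sketch of this step, assuming per-point descriptors have already been extracted; MiniBatchKMeans is used here as a scalable stand-in for plain k-means, and the color palette is arbitrary:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def color_by_cluster(descriptors, k):
    """descriptors: (N, D) array, one row per 3D point; returns RGB per point."""
    labels = MiniBatchKMeans(n_clusters=k, random_state=0).fit_predict(descriptors)
    palette = np.array([[228, 26, 28], [55, 126, 184], [77, 175, 74],
                        [152, 78, 163], [255, 127, 0]])     # enough for k <= 5
    return palette[labels]

# over-segment deliberately: one coloring per cluster count
# colorings = {k: color_by_cluster(X, k) for k in (2, 3, 4, 5)}
```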
6.2.1 Dense sampling

Based on the results of Section 5.1, we employ the PFH and FPFH point cloud descriptors: PFH performed best, while FPFH provided the best trade-off between segmentation performance and computation time. Please refer to Section 5.1 for details on 3D point cloud descriptors. We extract PFH and FPFH densely and vary the size of the neighborhood considered for descriptor computation. We cluster the resulting descriptors into 2, 3, 4 and 5 clusters and perform visual inspection to assess whether the descriptor models pecking style similarity well.

6.2.2 Topology-based sampling

The dense approach presented in the previous section has drawbacks. First, the computational complexity of dense descriptor calculation is high. Second, the descriptors are highly redundant, as typical diameters of the marching sphere we use for the description of a point neighborhood are between 4 and 8 mm, whereas the point clouds we use are rasterized with a grid size of 0.1 mm.

An alternative approach is to restrict the analysis to salient points that can serve as window centers for surface description. Meaningful salient points for the description of surface texture are peaks and valleys. Figure 6.4 shows a surface part with peaks marked red. Please note that we refer to peaks as maxima in the depth map values: the peaks are actually valleys if we look at the 3D surface, as the detector finds peaks at local maxima in the depth map and the depth has a value of zero at the image plane. We observe that almost every peck mark has a point marked as peak at its lowest point. Hence, we assume that these points are good center points around which we can describe the surface properly.

Figure 6.4: Detail of a point cloud with points at peak positions marked red. Please note that we refer to peaks with respect to the detection method. The peaks are actually valleys if we look at the 3D surface, as the detector finds peaks at local maxima in the depth map.

Figure 6.5: Spherical descriptor footprint with a radius of approx. 2mm. We observe that the peck mark is completely within the footprint.

FPFH describes the local neighborhood of a point based on the angular differences between the surface normals of the point and its neighbors (see Figure 5.2). We propose a new descriptor, TopoFeat, based on the Euclidean distance between surface points and the angular differences of their normals. PFH also proposes the usage of the pairwise Euclidean distances between the surflets around the query point, but this is not included in the reference implementation for performance reasons (see http://pointclouds.org/documentation/tutorials/pfh_estimation.php). The main difference between PFH and the TopoFeat descriptor, however, is that we do not consider the pairwise differences of all points in the neighborhood of the query point but only the pairwise differences between the query point and all points within the descriptor footprint. The TopoFeat descriptor is constructed as follows (a sketch follows after the list):

1. Select a query point (the point we want to construct the description for) and a radius r for the neighborhood that should be considered.
2. Select all points in the neighborhood.
3. Calculate the Euclidean distance and the angular difference of the normals between the query point and each point in the neighborhood. The resulting feature vector contains a quadruple for the difference between the query point and each point within the footprint (see Figure 6.5), consisting of three angular differences and the Euclidean distance between the two points (see Figure 5.2).
4. Sort the quadruples by their Euclidean distance.
5. Select the number of concentric shells m the spherical descriptor footprint should be divided into.
6. Divide r evenly into m parts.
7. Put each quadruple in one of the m parts according to its Euclidean distance to the query point and discard the distances. This results in m sets of triples containing the angular differences between the query point and all points in one of the m concentric shells.
8. Select a number of bins l for a 3D histogram.
9. For each of the m parts, aggregate the values of the triples in the 3D histogram. This results in a 3D histogram with l³ bins for each of the m concentric shells.

The m histograms contain information about the gradients of the surface in each of the m shells. The resulting feature vector has a size of m × l³. We do not extract TopoFeat densely for each point but only at points that are at the bottom of peck marks (see Figure 6.5). We assume that the surface description is sensitive to the size of the peck mark(s) within the descriptor footprint for which it is computed, and hence that the descriptor allows us to distinguish different pecking styles. We extract the proposed TopoFeat at peak positions (i.e. the lowest points in peck marks, see Figure 6.5), cluster the resulting features into 2, 3, 4 and 5 clusters and perform visual inspection to assess whether the descriptor models pecking style similarity well.
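A minimal sketch of the construction above, with the Darboux-frame angles computed as for PFH; the shell count m, bin count l and the peak positions are assumed to be given, and the code is illustrative rather than our experimental implementation:

```python
import numpy as np

def darboux_angles(p_s, n_s, p_t, n_t):
    """The three PFH-style angles between two oriented surface points."""
    d = (p_t - p_s) / np.linalg.norm(p_t - p_s)
    u = n_s
    v = np.cross(u, d); v /= np.linalg.norm(v)
    w = np.cross(u, v)
    return np.dot(v, n_t), np.dot(u, d), np.arctan2(np.dot(w, n_t), np.dot(u, n_t))

def topofeat(q, n_q, points, normals, r=2.0, m=4, l=5):
    """Per-shell 3D angle histograms around a peak point; m * l**3 dimensions."""
    dist = np.linalg.norm(points - q, axis=1)
    sel = (dist > 0) & (dist <= r)
    hists = np.zeros((m, l, l, l))
    for p, n, d in zip(points[sel], normals[sel], dist[sel]):
        a, ph, t = darboux_angles(q, n_q, p, n)
        shell = min(int(d / r * m), m - 1)                    # steps 4-7
        ia = min(int((a + 1) / 2 * l), l - 1)                 # alpha in [-1, 1]
        ip = min(int((ph + 1) / 2 * l), l - 1)                # phi in [-1, 1]
        it = min(int((t + np.pi) / (2 * np.pi) * l), l - 1)   # theta in [-pi, pi]
        hists[shell, ia, ip, it] += 1                         # steps 8-9
    return hists.ravel()
```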
6.3 Results

In this section we present qualitative results of pecking style clustering based on 3D description of the surface. Due to the computational complexity and the preliminary status of point cloud-based pecking style analysis, we employ nine 3D scans for our evaluation, namely scans 2, 3, 8, 9, 10, 19, 21, 24 and 26 (see Appendix A for a description of the dataset). We chose the set of scans with the aim to include figures that are pecked homogeneously and figures that are pecked in visually different pecking styles.

6.3.1 Dense sampling

We extract the FPFH descriptor from all nine scans with radii of r=1, 2 and 5 mm. Due to its computational complexity, we extract PFH from six scans (3, 8, 10, 19, 21 and 24) which we select out of the nine. The selection criterion is the quality of the clustering of the FPFH descriptors: we choose the three point clouds for which we achieved good results and the three point clouds for which we achieved bad results. We use radii of r=1 and 2 mm, as descriptor computation with r=5mm did not terminate in a reasonable time. We show examples for good and for bad results.

Figures 6.2 and 6.3 show two of the point clouds we use for qualitative analysis containing distinguishable pecking styles. The warrior in Figure 6.2 contains two visually easily distinguishable pecking styles which lie in clearly distinguishable regions. Furthermore, the head of the warrior shows a peck mark density between the two styles of the body and the superimposed arrow. We assume that this scan can serve as a basic test of whether 3D descriptors can distinguish different pecking styles. The second scan we employ (see Figure 6.3) poses a harder challenge: the pecking styles in the figure at the upper left part of the scan are interwoven, possibly as the result of a superimposition of large pecks over a finely pecked warrior figure. Figure 6.6 shows a point cloud with a homogeneous pecking style.

Figure 6.6: A petroglyph with homogeneous pecking style.

Figure 6.7 shows the results of pecking style clustering with 2, 3 and 4 clusters. We observe that the clustering with two clusters (upper right figure) separates the body and the arrow while being ambiguous for the head and the foot at the lower left of the figure. The ambiguity results in speckled regions. The results with three and four clusters do not improve the separation of the differently pecked regions.

Figure 6.7: Visualization of clustering results of densely extracted FPFH descriptors with r=5mm. The figure in the upper left is the original point cloud; the other three figures are the results with 2, 3 and 4 clusters respectively. We observe that the clustering with two clusters models the two visually observed pecking styles well. The clusterings with 3 and 4 clusters do not show improvements.

Figure 6.8 shows the results of pecking style clustering with 2 and 3 clusters as well as a detail of the warrior figure in the scene. In the clustering with two clusters (upper right) we observe that the superimposition over the warrior is clearly distinguishable, while the distribution of the two clusters in the other parts of the scan remains unclear. The clustering with three clusters (lower left) shows a comparable distinction for the warrior figure; the green cluster from the two-cluster result is split into a green and a blue cluster. The details at the lower right show that the superimposition over the warrior is modeled very well in both cases.

Figure 6.9 shows an example of a petroglyph with homogeneous pecking style. The clustering algorithm clusters the data into the required number of clusters, as it is a property of k-means to always create a clustering with the requested number of clusters. However, we observe that the clustering shows a pattern comparable to vertical stripes. Figure 6.10 shows a comparison of the two descriptors (rows) we use and different footprints (columns) for descriptor extraction.
We observe that the larger the footprint (it increases from left to right), the better the results we achieve. The two pecking styles are separated best with FPFH at r=5mm. We assume that the very large peck marks the superimposition consists of are described well by descriptor footprints with a diameter of 1cm. Surprisingly, the best result is achieved with FPFH, which contains only a fraction of the surface topology information of PFH; PFH, in turn, could not be extracted with the largest footprint due to its computational complexity. Please refer to Subsection 5.1.2 for a detailed description of the difference between PFH and FPFH. Figure 6.11 shows a surface detail. We observe that the flatter surface areas with shallow peck marks and the rougher surface areas with large peck marks are separated well.

Figure 6.8: Visualization of clustering results of densely extracted FPFH descriptors with r=2mm. The figure in the upper left is the original point cloud, the upper right figure shows the result of clustering with two clusters, and the figure at the lower left shows the result with three clusters. The two details at the lower right of the figure show the central part of the warrior in the scene. We observe that the superimposition on the figure is clearly distinguishable in the two and three cluster case.

Figure 6.9: Visualization of clustering results of densely extracted FPFH descriptors with r=2mm. We show results with 2, 3, 4 and 5 clusters. We observe that the clustering results in vertical stripes that are presumably not caused by the surface topology of the homogeneous pecking style of the petroglyph (see Figure 6.6).

Figure 6.10: Comparison of different descriptors and footprints. The first row shows clustering results based on the FPFH descriptor with r=1, 2 and 5mm (from left to right). The second row contains clusterings with PFH with r=1 and 2mm (from left to right). We observe that this scan requires a large descriptor footprint, as the best result is FPFH with r=5mm. The original scan is shown in Figure 6.2.

Figure 6.11: A surface detail of a point cloud the points of which have been clustered into two clusters based on PFH with r=2mm. We observe that the flat regions in the surface are in one cluster while the deep peck marks are in the other cluster.

6.3.2 Topology-based sampling

We extract the topology feature with a radius of r=2mm from all nine scans. We show examples for a good and for a bad result. Figure 6.12 shows the result of the computation and clustering of our TopoFeat. We observe that the distinction between body and arrow is modeled well. This is especially remarkable, as the number of peak positions used for feature extraction is around one percent of the number of points used for dense extraction. Thus the computational demand for describing and clustering the pecking styles is roughly only one percent of that of the dense approach. The main restriction we faced during our experiments was the computational demand of 3D descriptors. Hence, our results with TopoFeat demonstrate that the utilization of peak positions as salient points clearly improves the feasibility of using a 3D point cloud for similarity estimation of pecking styles. Figure 6.13 shows the result of another example using our proposed TopoFeat. We observe that the superimposition is only partly distinguishable. This scan, which has been separated well with the dense approach (see Figure 6.8), shows the limitations of the current status of our TopoFeat.

Figure 6.12: Warrior with surface spheres around peaks serving as basis for topological feature extraction, with two clusters (a) and three clusters (b). We observe that the different pecking styles are modeled well (see Figure 6.2). The third cluster (green) in (b) is very small. Please note that the circular region colored around a peak has only a quarter of the radius that is used as query point neighborhood for feature calculation; we reduced the radius for visual clarity. Overlapping areas of differently colored circles show a mixture of the colors, e.g. yellow in the case of red and green in (b).

Figure 6.13: A complex scene with surface spheres around peaks serving as basis for topological feature extraction. We show results with 2 and 3 clusters. We observe that the different pecking styles are modeled only partly (see Figure 6.3). Please note that the circular region colored around a peak has only a quarter of the radius that is used as query point neighborhood for feature calculation; we reduced the radius for visual clarity. Overlapping areas of differently colored circles show a mixture of the colors, e.g. yellow in the case of red and green in the left figure.
6.4 Conclusion

We have investigated the similarity of pecking styles using different types of sampling (dense and topology-based). We have shown that 3D surface descriptors can model pecking styles well in several cases; in other cases, the pecking style clustering fails.

The main restriction is the computational demand, which prevents extensive experimentation. Hence, future work should focus on the description of the surface around salient points, as this approach reduces the computational demand drastically. Additionally, in our experiments we found the evaluation of pecking style clustering particularly challenging due to the absence of ground truth information. An unsupervised clustering can only be as good as the underlying descriptors. Without additional knowledge about the searched patterns (pecking styles) it is not possible to select adequate parameters for the clustering and to remove irrelevant or even misleading parameters, which can easily corrupt the clustering. In an unsupervised setting the selection of adequate parameters remains a task that has to be performed manually by the user and hence prohibits systematic evaluation.

We recommend the use of a semi-supervised approach in the future, where the user provides hints to a machine learning system. In this way expert knowledge can be integrated and a suitable set of parameters for a given pecking style can be selected in a supervised fashion. This could facilitate the generation of computational models for different pecking styles. The usage of automatic feature selection during model learning (e.g. by sparse SVMs or random forests) would make it possible to take the complete set of investigated parameters into account, which in turn maximizes the number of pecking style attributes that can be considered in the analysis.

Description of Petroglyph Shapes

Figure 7.1: Petroglyphs with the corresponding smoothed figures, skeletons and the derived graphs.

In this chapter we investigate the similarity of petroglyphs with respect to their shape.
In contrast to the automated segmentation of rock surfaces containing petroglyphs, the automated classification of petroglyph shapes has attracted some research in the past years [ZWKL09][ZWKL11][DPdL12][DP13]. The large number of single petroglyphs makes the usage of automated analysis methods attractive. We propose and evaluate a graph-matching approach for shape similarity of petroglyphs (see Figure 7.1). The motivation to investigate the skeletal graphs of petroglyphs for shape matching is twofold: First, the material requires robustness against affine transformations and articulations. Second, we are not aware of existing petroglyph shape matching approaches using the skeletal graph.

7.1 Related Work

Numerous surveys about shape analysis have been published; comprehensive and important in this field are the surveys by Pavlidis [Pav78], Loncaric [Lon98], Zhang and Lu [ZL04] as well as by Yang et al. [YKR08]. Classifications and taxonomies of shape descriptors have been proposed in the mentioned surveys in different variants. A widely used common denominator is the distinction between contour-based and region-based descriptors.

Latecki et al. compared several shape descriptors on the MPEG-7 CE-Shape-1 database [LLE00]. The database contains only complete shapes with closed contours. They propose three main categories of shape descriptors: contour-based descriptors, region-based descriptors and skeleton-based descriptors. They investigated robustness to scaling and rotation, similarity-based retrieval as well as motion and non-rigid deformations. The weakest performing descriptor in all cases was the skeleton-based approach; its most significant drawback is the lack of robustness against scaling and rotation. The authors assume that none of the existing approaches to compute skeletons is robust enough. However, since the publication of that paper in the year 2000, promising skeletonisation algorithms have been proposed and evaluated (e.g. [BLL07]). We summarize region-based, contour-based and skeleton-based approaches that are relevant for our work and include petroglyph-related work where available.

7.1.1 Region-based Descriptors

Zhu et al. propose the usage of a slight modification of the generalised Hough transform (GHT) for the mining of large petroglyph datasets [ZWKL11]. The main arguments for GHT and against other shape similarity measures are the existence of petroglyph images where a single petroglyph consists of several parts and the possibility of merged parts of petroglyphs that drastically change the topology of the petroglyphs. They extensively evaluate their approach and achieve good results. However, they mostly evaluate synthetic petroglyph shapes or simple petroglyph shapes drawn by humans rather than the more exact tracings based on peck marks which we use (see Subsection 7.2.1). Deufemia et al. use the radon transform of petroglyph images as a shape descriptor for unsupervised recognition via self-organizing maps (SOM) [DPdL12]. In a second step, they use a fuzzy visual language parser to resolve ambiguous interpretations by incorporating archaeological knowledge. They evaluate the approach on a large dataset and achieve good results. However, Deufemia et al. as well as Zhu et al. do not consider partial or merged petroglyphs, i.e. the part-based retrieval that is necessary for petroglyphs in real-world scenes.
Krish and Snyder propose the shape recognition approach SKS [KS08], which is based on the generalised Hough transform. They compare the performance of SKS with Hu moments, curvature scale space (CSS, see Subsection 7.1.2) matching and shape context (SC, see Subsection 7.1.2). Besides affine transformations, they evaluate partial shapes, on which the SKS feature performs well. However, the evaluation data set consists of only 31 different shapes, does not contain merged shapes, and does not include petroglyphs. Generally, region-based descriptors have the advantage that they do not need complete contours for descriptor extraction.

7.1.2 Contour-based Descriptors

Mokhtarian et al. propose curvature scale space (CSS) image matching [MM86][Mok95][MAK+97]. They smooth a contour by convolution with a Gaussian kernel at different scales (i.e. different kernel sizes of the Gaussian kernel). Subsequently, they find the curvature zero crossings on the contour. The descriptor - the CSS image - consists of the zero crossings in a diagram where the size of the Gaussian kernel is on the y-axis and the normalized path length of the curve is on the x-axis. This CSS image is used to match shapes. Mai et al. use the CSS descriptors to acquire contour segments invariant to affine transformations [MCH10]. They utilize the local maxima in the CSS image to locate the affine-invariant points and segment the contour at these points. They match the segments with a dynamic programming approach and achieve very good experimental results, outperforming the dynamic programming approach by Petrakis et al. [PDM02], who utilize contour segments as well.

Belongie et al. propose the widely used shape context (SC) descriptor [BMP02]. They sample points on the contour of an object. For each point, they compute the shape context based on the spatial distribution of the other points on the shape contour. They match two shapes by estimating the best transformation from one shape to the other and determining a dissimilarity based on shape context distance, appearance cost and transformation cost. Deufemia et al. [DP13] propose a two-stage classification of petroglyphs. They use shape context descriptors to provide an initial raw clustering with self-organizing maps. In the second step, they use an image deformation model to classify the petroglyph shapes. They evaluate their approach on a relatively small dataset (17 classes with 3 exemplars each) that they enlarge by using 30 affine transformations of each image.

7.1.3 Skeleton-based Descriptors

Siddiqi and Kimia propose a shock grammar for recognition [SK96]. Later, Siddiqi et al. propose the usage of shock graphs for shape matching [SSDZ99]. Shocks are used to provide a structural description of 2D shapes. They are contour-based and deliver a medial axis of the shape that carries additional information for each part of the skeleton. The representation of the shape is a directed acyclic shock graph, which is used for shape matching. Aslan skeletons are coarse skeletons [AEET08]. They are matched via tree edit distance and have been evaluated on different data sets with good results [BET09][ET10]. However, while they are insensitive to articulation and affine transformations, they cannot be used for merged shapes, for shapes with holes or for shapes with large missing parts.
Ling and Jacobs propose the usage of the inner-distance, which is the shortest path between two landmark points (in this case contour points) of a shape [LJ07]. Hence, they implicitly embed skeletal information in the descriptor. The distance between two contour points is the length of the shortest path within the silhouette of the shape instead of the Euclidean distance between the two points. They use this idea for three approaches to shape description. First, they combine the inner-distance with multi-dimensional scaling. Second, they utilize the inner-distance to build a new descriptor based on shape context, and third, they extend the second approach with appearance information of the shapes along the inner-distance lines. They evaluate the approach on several datasets with good results. They state that the proposed descriptors are invariant or insensitive to articulation and are capable of capturing part structures.

Figure 7.2: This figure shows a part of a tracing of a rock in Valcamonica (Coren di Redondo, Rock 1). © Alberto Marretta, used with permission.

Bai and Latecki introduce a skeleton-based approach that matches silhouettes based on skeleton paths, which are the geodesic paths between skeleton endpoints [BL08]. The shortest paths are represented by the radii of the maximum inscribed discs at the skeleton points. They use DCE (Discrete Curve Evolution [LL00]) for skeleton pruning. The descriptor is built in two layers: first, the description of a skeleton endpoint is constructed from the shortest paths starting at this point, and second, the similarity of two shapes is computed by matching the descriptors of the skeleton endpoints. They experimentally show that the method is robust against articulations, stretching and contour deformations.

Bai et al. combine contour features with skeletal features to improve shape classification [BLT09]. They state that contour-based approaches can represent detailed information well and are, up to a certain extent, robust against partial and merged shapes, but lack invariance against articulation and non-rigid deformation. In contrast, skeleton-based approaches are robust against non-rigid deformations. For the contour segments, they follow the ideas of Sun and Super [SS05], but use DCE to determine the segments. They achieve a 96.6% classification rate on the MPEG-7 CE-Shape-1 database [LLE00].

Xu et al. extend the skeleton path approach [BL08] by also considering junction points for the skeleton paths descriptor [XWLB10]. They call the junction points and end points of the skeleton graph critical points. They merge junction nodes based on the paths from the junction nodes to the end nodes: if the sum of the path distances of two nodes, normalized by the number of end nodes of the graph, is below a set threshold, the two nodes are merged. They achieve slightly better results than Bai and Latecki [BL08] and state that the method is efficient even in the presence of articulation as well as partial and merged shapes. The line of work summarized in this paragraph is mostly built on the skeleton pruning algorithm based on DCE proposed by Bai et al. [BLL07].

7.1.4 Discussion of Shape Descriptors

All existing approaches dealing with petroglyphs have drawbacks. [ZWKL11] is evaluated at large scale on publicly available data, but the data are different from the material we are using: they use, inter alia, sketch data and synthetic petroglyph shapes.
In contrast, our data is the product of the standard archaeological documentation process, i.e. the manual tracing of individual peck marks on transparent sheets (see Section 2.6). [DPdL12] and [DP13] evaluate on small datasets. [DP14] have a large dataset and compare three different descriptors in their evaluation, but they do not evaluate combinations of descriptors. Moreover, they only compare region-based descriptors. With our approach, we aim to overcome these shortcomings.

7.1.5 Graph Matching

To our knowledge, there is no work on petroglyphs that utilizes skeletons or skeletal graphs. We aim at investigating the skeletal graph for petroglyph similarity modeling. In our approach, we use the promising algorithm by Bai et al. [BLL07] for skeletonisation and skeleton pruning. To investigate the distinctiveness of skeletal graphs, we model petroglyph similarity as graph similarity. In the following, we summarize works that use graph matching to model shape similarity as well as popular graph matching approaches.

There are numerous approaches in shape matching that utilize graphs. Comparable to our approach are methods that model shape similarity as the similarity of the skeletal graph. Klein et al. use tree edit distance to match shapes described by their shock graphs [KSK01]. Di Ruberto uses attributed skeletal graphs to model shape similarity [DR04]. Aslan skeletal graphs are matched with tree edit distance [BET09][ET10]. For our material (see Subsection 7.2.1), tree matching is not sufficient, as the skeletal graphs may contain cycles. Hence, we have to use graph matching.

There is a long line of work in graph matching. A recent volume of workshop proceedings edited by Kropatsch et al. includes papers on many aspects of graph-based representations in pattern recognition [KAH+13]. A popular and intuitive similarity measure for a pair of graphs A and B is the graph edit distance (GED). GED defines the similarity of two graphs as the minimum number of edit operations (remove node, add node, remove edge, add edge) that are needed to transform graph A into graph B. The computation of the graph edit distance is NP-hard [ZTW+09]. This problem is addressed in several ways. There are approximations for the GED, e.g. the widely used Hungarian algorithm or A* beam search. Another way is to use graph spectra (e.g. [DvLV08]) or to embed node and edge attributes of graphs in vector spaces and subsequently use standard similarity measures and machine learning methods to model similarity (e.g. [GVB12] or [LSYZ11]).
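To make the edit-distance notion concrete, the following minimal sketch computes an exact GED between two toy skeletal graphs with the networkx library. It is an illustration only (our experiments use the Graph Matching Framework with the Hungarian and A* beam approximations, not this built-in exponential-time search), and the two graph structures are hypothetical examples.

    import networkx as nx

    # Two toy skeletal graphs: a cross-like star with four branches and a
    # similar figure with one additional branch. Nodes stand for end and
    # junction points, edges for the skeleton branches connecting them.
    cross = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4)])
    figure = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)])

    # Exact graph edit distance with unit costs for node and edge
    # insertions, deletions and substitutions. The worst-case runtime is
    # exponential, which is why approximations are needed in practice.
    ged = nx.graph_edit_distance(cross, figure)
    print(ged)  # 2.0: one node and one edge have to be added

The example also shows why the measure is intuitive for skeletal graphs: adding one branch to a figure costs exactly one node insertion plus one edge insertion.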
7.2 Proposed Approach

7.2.1 Benchmark Dataset

In this chapter we use a fraction of our dataset described in Subsection 2.6.2, consisting of one hundred tracings of individual petroglyphs in ten classes. Figure 7.3 shows examples of our material. Note the high intra-class variability, not only in terms of affine transformations and articulations but also in general differences of the shapes within a class. Some classes are perceptually very close, e.g. classes two and three.

Figure 7.3: This figure shows the petroglyph dataset that we investigate in this chapter. Each column contains examples of one class. We observe that some of the classes have a high intra-class variance of shape. The petroglyphs are from various rocks in Valcamonica. © CCSP - Centro Camuno di Studi Preistorici and Alberto Marretta, used with permission.

7.2.2 Overview of our Approach

The literature review suggests that a combination of contour-based and skeletal descriptors should yield optimum results. The existing petroglyph shape retrieval systems utilize region-based features [ZWKL11][DPdL12] as well as contour-based features [DP13]. These methods are the baseline reference methods we compare our approach with. We concentrate on the investigation of skeletal features for the petroglyph shape recognition problem, because the petroglyph shapes intuitively already resemble skeletal shapes. We propose to model the similarity of petroglyphs as a skeletal graph matching problem. We derive the graphs from the skeletons of the petroglyph shapes: we use each end point and each junction point of a given petroglyph skeleton as a node of our graph, and the skeleton branches connecting these points as edges. See Figure 7.4 for exemplary skeletons and derived graphs. We utilize the skeletal graph as a descriptor that is invariant to affine transformations as well as articulations. Hence, we discard all spatial information, as we are only interested in topological information. We match the resulting undirected graphs following two strategies: first, we utilize the graph edit distance (GED) as a pairwise similarity measure, and second, we use graph embedding (GE) to create a feature vector for each graph and match these feature vectors. We embed the graphs in feature vectors of a normed length by extracting several graph properties (see Table 7.1) and calculate pairwise distances with the Euclidean distance. We classify using a k-NN classifier with an extension for intermediate descriptor fusion.

Figure 7.4: This figure shows petroglyph tracings (column 1), pre-processed shapes (column 2), extracted skeletons (column 3) and the derived graphs with pruning thresholds of 2, 10 and 30px (columns 4-6). We observe that the employed skeletonisation algorithm fails to extract details in some cases (heads in rows 1 and 4), while in other cases small details are covered well. Furthermore, we observe that a high pruning threshold removes skeletal noise (rows 1, 2, 4, 6, 9 and 10), but may also discard relevant topological information (row 5).

7.2.3 Descriptors

We compare the distinctiveness of contour-based as well as region-based shape descriptors with the distinctiveness of undirected skeletal graphs for the petroglyph classification problem. As baseline methods, we utilize a) Shape Context (SC) [BMP02], which has been used by Deufemia et al. for petroglyph classification [DP13], b) Inner Distance Shape Context (IDSC) [LJ07], as well as c) the Generalised Hough Transform (GHT) proposed by Zhu et al. for petroglyph classification [ZWKL11].¹ Our proposed petroglyph descriptor makes use of the skeletal graphs in two ways. First, we use the graphs directly as descriptors and measure their similarity with GED; we employ the A* algorithm with beam search and the Hungarian algorithm, and both variants tolerate cycles in the graphs.² Second, we utilize topology features of the undirected graphs for GE.³ Please refer to Table 7.1 for the list of selected features for petroglyph description. We evaluate the distinctiveness of single topology features and of combinations of topology features. In order to be able to use skeletonisation and contour-based descriptors, we need to pre-process our material to achieve continuous boundaries. We use standard filters and morphological operations for this purpose. The resulting shape images are the input for SC and IDSC as well as for the skeletonisation, for which we use the method by Bai et al. [BLL07].⁴ We normalize the width of the input petroglyph to 500px and prune the skeletons by joining nodes which have a spatial distance smaller than a threshold; the threshold varies between 2 and 30px. See Figure 7.4 for accordingly pruned graphs. GHT is computed on the original petroglyph images.⁵

¹For all three descriptors, implementations have been kindly provided by the respective authors.
²We use the implementation in the Graph Matching Framework kindly provided by Kaspar Riesen [REB13].
³We utilize the MIT strategic engineering tools for network analysis kindly provided by Bounova and de Weck [BdW12]: http://strategic.mit.edu/downloads.php?page=matlab_networks.
⁴We thank Bai et al. for providing the implementation.
⁵We use resolutions of 10x10px, 20x20px and 30x30px. We report on the best performing resolution of 10x10px.
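The derivation of an undirected graph from a skeleton image can be illustrated in a few lines. The following sketch (Python with NumPy, SciPy and networkx) is a simplified stand-in for our actual implementation: it classifies skeleton pixels by their number of 8-connected neighbours (end points have one, junctions three or more) and traces the branches between the resulting nodes; the distance-based node joining used for pruning is omitted, and the branch walk assumes a properly thinned skeleton with a single continuation per step.

    import numpy as np
    import networkx as nx
    from scipy.ndimage import convolve

    def skeleton_to_graph(skel):
        """Derive an undirected graph from a binary skeleton image.

        End points and junction points become nodes; the skeleton
        branches connecting them become edges. Assumes the skeleton
        does not touch the image border.
        """
        kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
        nbrs = convolve(skel.astype(np.uint8), kernel, mode="constant")
        node_mask = skel & ((nbrs == 1) | (nbrs >= 3))
        nodes = {tuple(p): i for i, p in enumerate(np.argwhere(node_mask))}

        G = nx.Graph()
        G.add_nodes_from(nodes.values())
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
        for start, i in nodes.items():
            for dy, dx in offsets:
                prev, cur = start, (start[0] + dy, start[1] + dx)
                if not skel[cur]:
                    continue
                # Walk along the branch until the next end/junction point.
                while cur not in nodes:
                    steps = [(cur[0] + a, cur[1] + b) for a, b in offsets
                             if skel[cur[0] + a, cur[1] + b]
                             and (cur[0] + a, cur[1] + b) != prev]
                    if not steps:
                        break  # dangling pixel, discard this walk
                    prev, cur = cur, steps[0]
                else:
                    G.add_edge(i, nodes[cur])
        return G

Since every branch is walked from both of its nodes, duplicate edges are merged automatically by the undirected graph; cycles in the skeleton simply appear as cycles in the graph.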
Table 7.1: Topology-based features extracted from the skeletal graphs. Each feature is a description of the whole graph. Please refer to [BdW12] for more details.

ID     Feature           Description
1      numNodes          Number of nodes
2      numEdges          Number of edges
3      numCycles         Number of independent cycles, also known as the cyclomatic number of a graph
4      linkDensity       Link density, i.e. the ratio of existing links to the maximum possible number of links
5      avgDegree         Average number of links over all nodes
6      numLeaves         Number of leaves, i.e. the number of nodes with only one link
7-11   histDegrees       5-bin histogram of node degrees, i.e. the counts of all nodes with 1, 2, ..., 5 links; the maximum degree occurring in our dataset is 5
12     sMetric           Sum of degree products across all edges, i.e. for each edge, multiply the degrees of the two nodes connected by the edge and finally sum up the products
13     graphEnergy       Sum of the absolute values of the real components of the eigenvalues
14     avgNeighDegree    Average of the average neighboring degrees of all nodes
15     avgCloseness      Average closeness over all nodes
16     pearson           Pearson coefficient for the degree sequence of all edges of the graph
17     richClub          Rich-club metric for all nodes with a degree larger than 1
18     algebConnect      Algebraic connectivity, i.e. the second smallest eigenvalue of the Laplacian
19     diameter          The longest shortest path between any two nodes in the graph
20     avgPathlength     Average shortest path length
21     graphRadius       Minimum vertex eccentricity
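For illustration, a subset of the features in Table 7.1 can be computed with a standard graph library. The sketch below uses networkx instead of the MATLAB network analysis toolbox [BdW12] employed in our experiments, so numerical details may differ; it assumes a connected skeletal graph (required for diameter, radius and average path length).

    import numpy as np
    import networkx as nx

    def embed_graph(G):
        """Embed a connected skeletal graph into a fixed-length feature
        vector (a subset of the features in Table 7.1)."""
        degrees = np.array([d for _, d in G.degree()])
        adjacency = nx.to_numpy_array(G)
        features = [
            G.number_of_nodes(),                       # 1  numNodes
            G.number_of_edges(),                       # 2  numEdges
            len(nx.cycle_basis(G)),                    # 3  numCycles
            nx.density(G),                             # 4  linkDensity
            degrees.mean(),                            # 5  avgDegree
            int((degrees == 1).sum()),                 # 6  numLeaves
            *np.bincount(degrees, minlength=6)[1:6],   # 7-11 histDegrees
            sum(G.degree(u) * G.degree(v)
                for u, v in G.edges()),                # 12 sMetric
            np.abs(np.linalg.eigvalsh(adjacency)).sum(),  # 13 graphEnergy
            np.mean(list(nx.closeness_centrality(G).values())),  # 15
            nx.diameter(G),                            # 19 diameter
            nx.average_shortest_path_length(G),        # 20 avgPathlength
            nx.radius(G),                              # 21 graphRadius
        ]
        return np.asarray(features, dtype=float)

Pairwise Euclidean distances between these vectors then serve as input to the 5-NN classification described in the next subsection.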
7.2.4 Descriptor Fusion and Classification

In addition to the performance of individual descriptors, we evaluate the performance of descriptor combinations. For SC, IDSC, GHT and GED we compute pairwise distances. For the feature vectors we obtain by GE, we calculate pairwise Euclidean distances. Combining descriptors via the combination (i.e. unweighted summation) of distance matrices would require a normalization of all utilized feature spaces, which could only be provided by heuristically determined thresholds for the maximum dissimilarity that can occur for one specific descriptor. We want to avoid this step and aim at preserving complementary distinctiveness as far as possible in the classification process.

Instead of merging the similarity matrices, we employ a single k-NN classifier (k=5) for each individual descriptor. A straightforward approach would be to combine the classification results of all classifiers by majority voting. To obtain a richer and more expressive basis for making a decision, we refrain from this simple combination of classification results and instead fuse the classifiers at an intermediate step: we take the class labels of the five nearest neighbors of each descriptor and concatenate them to a set of nearest neighbors which contains 5n class labels, with n being the number of descriptors combined. Subsequently, we classify according to the conventional k-NN rule with a majority vote. This incorporates more information than voting on the classification results, as we implicitly include the probability of the classification result in the form of the number of class members among the nearest neighbors. We validate our results for single descriptors as well as fused descriptors by leave-one-out cross-validation (LOOCV). We employ accuracy as the quality measure, i.e. the ratio of the number of correctly classified petroglyphs to the total number of classified petroglyphs.
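The intermediate fusion can be sketched in a few lines (Python with NumPy; the distance matrices and labels are placeholders):

    import numpy as np
    from collections import Counter

    def fused_knn_predict(dist_matrices, labels, query, k=5):
        """Intermediate k-NN fusion: pool the k nearest-neighbour labels
        of every descriptor and take a majority vote over the pool.

        dist_matrices: list of (n x n) pairwise distance matrices, one
        per descriptor; labels: sequence of n class labels; query: index
        of the sample to classify.
        """
        pooled = []
        for D in dist_matrices:
            d = np.array(D[query], dtype=float)
            d[query] = np.inf                 # leave-one-out: skip the query
            pooled.extend(labels[j] for j in np.argsort(d)[:k])
        return Counter(pooled).most_common(1)[0][0]

With n descriptors the pooled list contains 5n labels, so the majority vote implicitly weights each class by its frequency among all nearest neighbors, exactly as described above.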
7.3 Experimental Results

7.3.1 Graph Matching

Table 7.2 shows the results of GED employed on the unweighted undirected skeletal graphs. The maximum accuracy is 57%; both employed methods achieve this result at a pruning threshold of 10px. We observe that the results of both methods tend to decrease for pruning thresholds larger than 10px. We assume that the distinctiveness of the skeletal graphs first increases, as skeletal noise is pruned, and then decreases beyond the maximum at 10px, as more and more topological distinctiveness is removed in the pruning process (see Figure 7.4). Furthermore, we observe that for higher pruning thresholds A* beam search outperforms the Hungarian algorithm.

Table 7.2: Classification accuracy of GED in percent. We use LOOCV-validated 5-NN classification. The maximum number of open paths for the A* beam approximation is 6000.

Pruning threshold    2   10   15   20   25   30
Hungarian           52   57   57   30   39   37
A* beam             51   57   49   47   47   47

Table 7.3 summarizes the results of the evaluation of single topological features for GE. We observe that feature 13 (graph energy) yields the best result with 47% accuracy at a pruning threshold of 10px. The second and third best results are also achieved at a pruning threshold of 10px. This confirms the GED results (see Table 7.2), where a pruning threshold of 10px delivers the best results as well. Furthermore, we observe that features 8, 10 and 11 (bins 2, 4 and 5 of the degree histogram) perform at chance level (10%). We assume that this is due to the fact that most graph nodes that are not leaves are 3-connected (see Figure 7.4); hence, the counts of 2-, 4- and 5-connected nodes have weak discriminative capabilities.

Table 7.3: Classification accuracy (in percent) of GE utilizing single scalar topological features. We use LOOCV-validated 5-NN classification. p denotes the size of the pruning threshold. The three best values (44, 47 and 43) all occur at p = 10. Please refer to Table 7.1 for the descriptions corresponding to the feature IDs.

                                         Feature ID
 p     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  Avg.
 2    25  21  14  35  27  40  40  10  33  10  10  22  25  21  30  23  35  30  27  29  14   25
10    37  44  14  29  31  40  40   9  25   7  10  33  47  23  43  23  29  32  25  31  20   28
15    35  25  17  38  33  37  37   5  21   7  10  30  37  16  30  22  38  30  23  39  23   26
20    31  28  18  31  37  21  21   8  16  10  10  35  35  15  33  26  31  33  29  37  21   25
25    33  32  20  37  39  25  25  12  14  10  10  25  39  13  37  17  37  37  26  33  29   26
30    27  36  13  28  35  23  23  13   8  10  10  33  27   9  36  22  28  34  29  26  28   24
Avg.  31  31  16  33  34  31  31  10  20   9  10  30  35  16  35  22  33  33  27  33  23

Table 7.4 reports the GE performance of the best feature combination of each dimension from 1 to 10. We employ brute-force feature selection for each number of feature dimensions, i.e. we evaluate all possible combinations of 1 to 10 out of the 21 single features (see Table 7.1). We observe that feature combination strongly improves the results. The maximum accuracy is 57%, as is the case with GED as well.

Table 7.4: Classification accuracy (in percent) of GE using the combinations of 1 to 10 single topology features that perform best for each dimension. We use LOOCV-validated 5-NN classification with brute-force feature selection. The pruning threshold is 10px. Please refer to Table 7.1 for the descriptions corresponding to the feature IDs.

Feature IDs                               Accuracy
13                                        47
2, 15                                     54
4, 6, 18                                  56
6, 7, 18, 20                              57
6, 7, 15, 18, 20                          57
4, 6, 7, 15, 18, 20                       56
1, 2, 4, 6, 7, 15, 17                     55
1, 2, 4, 6, 7, 13, 15, 17                 55
1, 2, 4, 6, 7, 14, 15, 17, 20             53
1, 2, 4, 6, 7, 13, 14, 15, 17, 20         53

7.3.2 Comparison with Baseline Descriptors and Descriptor Fusion

Table 7.5 contains the results of the shape descriptors with which we compare our skeletal graph approach, as well as combinations thereof. We observe that SC and IDSC perform better than GE when employed as single descriptors. GE clearly outperforms the dedicated petroglyph descriptor GHT of [ZWKL11]. Descriptor fusion generally improves results: the fusion of two or three shape descriptors improves the results slightly, and the fusion of all four descriptors improves the results from 82% for the best single descriptor to 88%. This demonstrates that the descriptors contain complementary information that is well preserved by the proposed k-NN fusion method.

Table 7.5: Classification accuracy (in percent) of GE, SC, IDSC and GHT and combinations thereof. We use LOOCV-validated 5-NN classification for single shape descriptors and the classifier fusion method described in Subsection 7.2.4.

Single descriptors:  GE 57    IDSC 81    SC 82    GHT 39
Fused combinations (all pairs, all triples, and all four descriptors):
                     80  78  54  86  84  83  84  81  86  81  88

To discuss the limitations of our approach and the weak performance of GHT on our dataset compared to the datasets employed by Zhu et al. [ZWKL11], we present example query petroglyphs and their nearest neighbors. Figure 7.5 shows a query petroglyph that is misclassified by GED as well as by GHT. We observe that the skeletal graph of the query anthropomorph figure has a topology which is similar to the topology of a cross; hence, the five nearest neighbors are crosses. The nearest neighbors computed by GHT also fail to determine the correct class for the query petroglyph: we observe that the spatial distribution of the pixels in the query image is comparable to that of the nearest neighbors.

Figure 7.5: This figure shows a sample which is misclassified with GED and GHT. The query image is on the left, and the five nearest neighbors are on the right. The first two rows show the result for GED and the utilized graphs. The third row shows the result for GHT.

Figure 7.6 shows a query petroglyph that is correctly classified by GE and misclassified by GHT. We observe that in the case of GE, the four nearest neighbors are topologically very close to the query graph. The fifth neighbor is different.
However, we have to take into account that GE matches with a set of features that cannot necessarily be understood intuitively (see Table 7.4). The GHT nearest neighbors show comparable pixel distributions. We assume that the less competitive performance of GHT on our dataset is related to the fact that the test datasets used by Zhu et al. are manual transcriptions of petroglyph skeletons (or sometimes outlines, see [ZWKL11], p. 95), which leads to simpler shapes than the detailed tracings of peck marks in our material.

Figure 7.6: This figure shows a sample which is correctly classified with GE and misclassified with GHT. The query image is on the left, and the five nearest neighbors are on the right. The first two rows show the result for GE and the utilized graphs. The third row shows the result for GHT.

7.4 Conclusion

We present a novel petroglyph descriptor based on the skeletal graph topology and propose matching with graph edit distance (GED) and graph embedding (GE). For GE, we propose 21 different scalar topological features. We evaluate the descriptor and the matching on a petroglyph dataset containing 10 classes with 10 exemplars each and compare the performance with other shape descriptors used in petroglyph classification.

Matching the skeletal graphs with GE and with GED delivers comparable results: both matching methods achieve 57% accuracy. GED is of high computational complexity, whereas GE has a low computational demand due to its low feature vector dimensionality. The two best performing combinations of topology features have only 4 and 5 feature dimensions. The contour-based features SC and IDSC outperform the region-based GHT as well as the skeletal graph-based GE and GED. GE and GED outperform the region-based GHT. The proposed descriptor fusion clearly improves results: in 5 of 7 descriptor combinations the usage of our descriptor improves results (see Table 7.5). The combination of our graph-based petroglyph descriptor with the other descriptors yields a classification performance of 88%, which is not achieved without the proposed skeletal descriptor. This shows that the skeletal features represent information not captured by the contour-based and region-based features. We conclude that descriptors derived from skeletons are valuable for petroglyph classification.

Future work could include the improvement of our skeletal-graph-based petroglyph descriptor in two ways. First, we aim at improving the pre-processing of the shapes as well as the skeletonisation. Second, we could investigate the suitability of spatial features. In this chapter, we investigated topological information of the skeletal graph, which is a highly invariant abstraction of the skeleton. In the future, we could investigate whether spatial relations of graph parts contain valuable information for our task, because our material has already undergone one step of abstraction by the artists who created the petroglyphs.

Skeletonisation of Petroglyph Shapes

Figure 8.1: A human figure and a house, the pre-processed shapes and the skeletonisation results.

The shape description approach described in the previous chapter relies on robust skeletonisation of petroglyph shapes (see Figure 8.1). Skeletonisation and boundary-based descriptors rely on closed boundaries of a shape. In this chapter, we investigate both: a) the pre-processing of petroglyph shapes to close boundaries and b) the subsequent skeletonisation.
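The sensitivity of skeletons to peck-mark holes, and the benefit of even rudimentary pre-processing, can be reproduced with a few lines of scikit-image. The synthetic shape below is a hypothetical stand-in for a real tracing; the hole-filling used here is deliberately simpler than the adaptive method proposed later in this chapter.

    import numpy as np
    from scipy.ndimage import binary_closing
    from skimage.morphology import disk, remove_small_holes, skeletonize

    rng = np.random.default_rng(0)

    # Synthetic stand-in for a pecked figure: a solid bar with small
    # random holes, mimicking unfilled gaps between single peck marks.
    shape = np.zeros((80, 200), dtype=bool)
    shape[30:50, 10:190] = True
    shape &= rng.random(shape.shape) > 0.15    # punch random holes

    raw_skeleton = skeletonize(shape)          # many spurious branches

    cleaned = remove_small_holes(binary_closing(shape, structure=disk(3)),
                                 area_threshold=64)
    clean_skeleton = skeletonize(cleaned)      # essentially a single line

Counting the branch end points of the two skeletons (as done in Section 8.4) makes the difference measurable rather than just visible.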
The petroglyph tracings we use (see Subsection 2.6.2) pose novel challenges to skeletonisation due to their complex topology and structure. They may be incomplete due to partial abrasion of the rock surface. Since the petroglyphs are made of individual peck marks, they exhibit a complex boundary as well as numerous holes in their interior (see Figure 8.2(a)). Additionally, complex figures may consist of several disconnected parts. Petroglyph shapes can either show filled bodies or just the silhouette of a figure, depending on their artistic style. Finally, over the years figures have been pecked on top of each other, which results in merged shapes.

Existing skeletonisation algorithms are not directly applicable to this type of material and yield poor skeletons, as shown in Figure 8.2(b). One reason for the poor performance is that existing methods are usually developed on perfectly segmented shapes with a continuous contour and continuously filled regions originating, for example, from public datasets such as MPEG-7 Core Experiment CE-Shape-1 and Kimia-99 [BLL07, BL07, YBYZ09, SBYL13]. Thus most methods do not fulfill the requirements of noisy real-world data such as that employed in this work. Aside from the different applications (e.g. shape retrieval) enabled by robust skeletonisation, the employed petroglyph shapes pose a challenging testbed for the further development of skeletonisation algorithms.

Figure 8.2: Comparison of the original petroglyph tracing (a) and its skeleton (b): (a) unprocessed shape from a digitized petroglyph tracing, (b) skeleton of the unprocessed petroglyph shape, (c) skeleton of the pre-processed petroglyph shape. The petroglyph shape pre-processed with the proposed method results in a significantly simplified skeleton that still preserves the important parts of the original shape (c).

Our contribution is a comprehensive investigation of skeletonisation in a new application domain that provides noisy real-world data and has rarely been explored so far [TTMI06]. We study the applicability of existing skeletonisation methods and evaluate their strengths and weaknesses. We identify two major requirements for robust skeletonisation: (i) pre-processing of the shapes is necessary to smooth the contour and fill holes in the interior; (ii) existing methods require adaptations to handle the complex structure of the shapes. We propose several improvements of recent skeletonisation methods to make them applicable to the investigated material. Finally, we come up with a novel shape pre-processing method that improves the shapes by smoothing and hole filling. The goal of the pre-processing is to reduce the complexity of the shapes and thereby facilitate the subsequent skeletonisation. Our method balances the degree of smoothing and hole filling by an adaptive threshold to yield a suitable tradeoff between preserving the topology and removing noise. An evaluation on our dataset shows that pre-processing is a crucial step for robust skeletonisation of noisy and complex shapes like petroglyph tracings (see Figure 8.2). Furthermore, the proposed improvements of the skeletonisation algorithms yield more accurate and complete skeletons.

The chapter is structured as follows: In Section 8.1 we review related work on skeletonisation and skeleton pruning and identify suitable algorithms for our task. Subsection 8.2.1 presents our real-world material and its characteristics.
We describe our pre-processing approach and the improvements of the skeletonisation algorithms in Section 8.2. The experimental setup and results are presented in Sections 8.3 and 8.4. Finally, we draw conclusions in Section 8.5.

8.1 Related work

In this section we review skeletonisation and skeleton pruning algorithms, analyze their properties and identify suitable methods for our task. In the literature, the terminology for skeletonisation is highly ambiguous: skeletonisation, thinning, medial axis transform and distance transform as methodologies, and skeleton, medial axis or medial line as their results, are used inconsistently [Baj06]. According to Arcelli and Baja [AB96], algorithms for skeleton computation in discrete space can generally be partitioned into two categories: methods that perform skeletonisation by medial axis transform produce skeletons following Blum's definition of the medial axis [Blu67], while techniques employing skeletonisation by thinning derive a thin version of a shape [Din55, KCRU58]. A third category of approaches applies the medial axis transform to polygonal shapes in continuous space [Mon69, Kir79, OI92]. Additionally, there is a group of more recent skeletonisation algorithms that utilize physics-based modeling of the shapes [Par11, KC09], for which we suggest a fourth category.

All skeletonisation methods are sensitive to boundary noise, i.e. small perturbations of a shape may have a large influence on the skeleton (see Figure 8.2). To overcome this problem, some form of regularization is required [SB98]. This regularization process is generally referred to as "skeleton pruning". Shaked and Bruckstein observe that pruning is an essential part of skeletonisation algorithms, and most recent developments combine skeletonisation and skeleton pruning in one algorithm. Skeleton pruning methods can be consolidated into two major categories: the first covers the pruning of skeleton branches based on a significance value calculated for every single skeleton point, which results in a shortening of all skeleton branches. The second class of skeleton pruning algorithms calculates a significance measure for each branch; based on its significance value, a branch is either removed completely or remains in the skeleton [LWZH13].

8.1.1 Point-based pruning approaches

Montanari first develops a form of regularization to detect the most important skeleton branches [Mon68]. He proposes the use of a threshold for Blum's "propagation velocity of the wavefront". Blum and Nagel extend this idea and propose a boundary/axis weight for the regularization of unwanted branches caused by boundary perturbations [BN78]. They state, however, that boundary perturbations are not always unwanted distortions but might actually be important features of a shape, and that pruning should therefore be carried out with great care. Ho and Dyer propose the computation of the relative prominence of a skeleton point by using geometric relations between the maximum generating disk at the point and the contour of the shape [HD84]. Ogniewicz and Ilg compare several other regularization methods for skeleton points and propose the generation of a skeleton pyramid for further pruning [OI92]. Telea and van Wijk introduce a skeletonisation algorithm based on a fast marching level set method (Augmented Fast Marching Method, AFMM) [TvW02].
For every skeletal point they determine the length of the boundary segment it originates from and prune skeleton points using a single threshold. Howe applies the work of Telea and van Wijk to handwriting recognition, using the contour length as salience measure [How04]. Shen et al. compare this and other pixel-based significance measures and introduce a new significance measure for skeleton pruning by calculating the bending potential ratio (BPR) of the contour segment generated by the two points of the maximum inscribed disc that are tangent to the boundary [SBH+11]. Telea further improves AFMM with a different saliency metric and proposes skeletonisation for feature-preserving shape smoothing [Tel12].

8.1.2 Branch-based pruning approaches

The methods summarized in Subsection 8.1.1 all compute a significance value for each single point of the skeleton; thresholding this value leads to a shortening of the branches. Branch-based methods, in contrast, avoid the shortening of branches and instead use a significance value to remove or retain entire branches. Bai et al. [BLL07] propose a novel method for skeleton pruning based on Discrete Curve Evolution (DCE) introduced by Latecki and Lakamper [LL99]. They determine the contour points of a shape that have maximum curvature and delete all skeleton branches that do not end at one of these points. This approach inspired numerous other state-of-the-art skeleton pruning algorithms. Bai and Latecki further improve DCE by removing the necessity of prior knowledge about the shape [BL07]: they compute the DCE-skeleton with a fixed parameter (50 vertices) and subsequently add a reconstruction step which removes skeleton branches with a low contribution to the original shape. Yang et al. use the same methodology and extend the reconstruction algorithm to increase speed and to enable the computation of skeletons from shapes with holes [YBYZ09]. Shen et al. introduce a normalization factor in the reconstruction step that quantifies the tradeoff between the simplicity of their skeletons and the reconstruction error of the shapes [SBYL13]. Liu et al. extract the Generalized Voronoi Skeleton of a shape and then apply DCE to perform a first pruning of the obtained skeleton [LWH+12]. Subsequently, they prune further by balancing the visual contribution and the reconstruction contribution of each skeleton branch. Liu et al. further devise a skeleton pruning approach that fuses the information of several different branch significance measures [LWZH13]. Recently, Krinidis and Krinidis proposed a new skeletonisation approach that smooths the polygonal approximation of a shape iteratively [KK13]. In each iteration they determine the most important polygon vertices from the angles of their incoming edges and prune the skeletons by deleting those branches that connect less important nodes.

8.1.3 Comparison of algorithms

In the following, we specify a number of criteria for the comparison of the skeletonisation and pruning techniques presented above.

• Robustness against remaining insignificant branches: Insignificant branches are branches that do not contribute essentially to the original shape and thus should be avoided or pruned.

• Robustness against deletion of significant branches: A significant branch has an essential contribution to the figure's shape and thus should remain in the skeleton. A deletion would significantly change the structure of the skeleton.
• Robustness against branch shortening: Branch shortening occurs when insignificant as well as significant skeleton branches are shortened alike. This bears the risk of changing the structure of the skeleton.

• Rotation and scale invariance: The skeletons of differently scaled and rotated versions of a shape should be equivalent.

• Number of parameters: A large number of parameters increases the dependence of an algorithm on user input but at the same time gives more control. We prefer algorithms with a low number of parameters and adequate sensitivity to parameter changes.

• No prior knowledge about the shape needed: Parameters such as the number of endpoints, or absolute values that depend on the size and complexity of the shape, require a priori knowledge and should be avoided.

Figure 8.3: Some shapes from example classes (e.g. anthropomorph, bird, cross, deer, etc.) from our entire dataset of more than 1100 shapes classified into two different archaeological typologies.

Table 8.1 compares the methods presented above with respect to the specified criteria. The comparison shows that all of the reviewed algorithms differ from each other in at least one criterion. Some of them show major deficiencies such as the deletion of significant skeleton branches, rotation and scale variance, and the need for prior knowledge about the shapes. For our experiments on petroglyph skeletonisation we select a subset of the presented algorithms. We perform the selection of methods based on the following considerations: (i) both branch-based and point-based pruning methods should be selected to increase the heterogeneity of the evaluated approaches; (ii) all algorithms should be robust to scale and rotation, as the investigated petroglyphs may be rotated and scaled arbitrarily; (iii) the number of input parameters should be low. Based on these considerations, we select the BPR-algorithm of Shen et al. [SBH+11], the DCE-algorithm of Bai et al. [BLL07], and the SPT-algorithm of Shen et al. [SBYL13]. Furthermore, we add simple morphological thinning as an additional method to investigate a skeletonisation method that does not apply any pruning.

Table 8.1: Comparison of recent skeletonisation methods with respect to the identified criteria. Note that for point-based pruning approaches the first two criteria do not apply because they do not distinguish between significant and insignificant branches.
Method                                       Insignif.  Signif.   Branch      Rot./scale  No. of   No prior
                                             branches   branches  shortening  invariance  params.  knowledge
Point-based pruning
  Chord residual + skeleton pyramid [OI92]       -          -        no          yes         3        no
  AFMM [TvW02]                                   -          -        no          no          1        no
  Boundary length [How04]                        -          -        no          yes         1        no
  Bending potential ratio [SBH+11]               -          -        no          yes         1        yes
  Saliency metric [Tel12]                        -          -        no          yes         1        yes
Branch-based pruning
  DCE-skeleton [BLL07]                           no         no       yes         yes         1        no
  Discrete skeleton evolution [BL07]             yes        yes      yes         yes         1        yes
  Quick stable skeletons [YBYZ09]                yes        yes      yes         yes         1        yes
  Tradeoff reconstruction error /
    skeleton simplicity [SBYL13]                 yes        yes      yes         yes         2        yes
  Visual contribution /
    reconstruction contribution [LWH+12]         yes        yes      yes         yes         3        no
  Information fusion [LWZH13]                    yes        yes      yes         yes         1-2      yes/no
  Empiric mode decomposition [KK13]              yes        yes      yes         yes         1        yes

8.2 Approach

The characteristics of petroglyph shapes impede skeletonisation, which leads to unsatisfactory results (see for example Figure 8.2). For robust skeletonisation, a pre-processing of the shapes is necessary as well as improvements of the skeletonisation techniques. Subsequent to the description of our material, we present a fully automated shape pre-processing method and propose a number of improvements for the selected skeletonisation algorithms to make them applicable to petroglyph shapes.

8.2.1 Investigated material

For the skeletonisation experiments in this chapter, we use the full dataset as described in Subsection 2.6.2. Figure 8.3 shows a small subset of the dataset. Earlier experiments of Takaki et al. on Central Asian petroglyphs showed that skeletonisation is a useful abstraction of shapes [TTMI06]. Thus it enables higher-level applications such as similarity search and automated shape classification, which is our ultimate goal. Petroglyphs pose a challenge for skeletonisation as they are made of single peck marks and thus have neither a continuous contour nor continuously filled regions. Figure 8.3 shows that the shapes have highly varying complexity, contain numerous holes due to incompletely pecked areas, often contain very fine structures (horns of deer, feathers of birds, etc.) and have disconnected parts.

8.2.2 Adaptive shape pre-processing

The major challenge of pre-processing is to find a solution that forms a good tradeoff across all different types of petroglyphs in our dataset. We define a minimum set of requirements that a suitable pre-processing needs to fulfill to enable more robust skeletonisation: (i) close small holes, (ii) smooth the contour, (iii) connect nearby parts, and (iv) avoid the unwanted decomposition of the shapes. We design a method that automatically improves the shapes according to these requirements and that terminates autonomously when a good tradeoff amongst the four aims is achieved. The intuition behind our approach is to combine different structural and morphological filters in a way that joins their strengths. An overview of the method is shown in Figure 8.4.

Initially we resize and pad all shapes to normalize the inputs. Next we apply a median filter of size s_med to the input. Median filtering removes small holes in foreground and background (salt-and-pepper noise) and at the same time slightly smooths the contour. Apart from this, however, the median filter may generate artifacts by disconnecting weakly connected blobs.
To compensate for these artifacts, we apply an area opening and closing originally proposed by Vincent for grayscale images [Vin94]¹ as well as a dilation operation. We use an area size of t_aoc pixels as the threshold for area opening and closing, and combine it with a dilation by a disc with radius r_dil to reconnect disjoint parts. We iterate these steps with increasing median filter size until a stopping criterion is met.

¹Note that the proposed area opening/closing is fundamentally different from a morphological opening/closing as it does not employ a structuring element.

Figure 8.4: The workflow of the automated shape pre-processing method. We input a raw binary shape and apply a median filter and binary morphology.

The stopping criterion requires a robust indicator function that is suitable for the differently complex shapes in the dataset. We evaluate different indicator functions such as the solidity and circularity of the shape, the number and size of foreground and background blobs, and the number of endpoints in the thinning skeleton. Our preliminary experiments show that the most robust criterion is a combination of the number of foreground blobs and the number of background blobs. If neither number changes over a certain number of iterations n_it, we terminate the pre-processing. The assumption behind this stopping criterion is that if the number of foreground and background blobs remains constant for some time, the figure is likely to be in a robust state where the influence of noise is low.

After the iterative pre-processing has terminated, we smooth the contour with a convolution filter with a Hann window of size s_conv. We smooth the x- and y-coordinates of the shapes' contour points separately, which removes contour perturbations and thus reduces the likelihood of spurious (insignificant) skeleton branches in the subsequent skeletonisation.

Figure 8.5 illustrates multiple iterations of the proposed pre-processing method for a given shape. The numbers of foreground and background blobs decrease rapidly during the initial iterations as holes in the shape are removed and closely spaced shape parts are merged. After four iterations the shape reaches a robust state, which is indicated by the fact that the numbers of foreground and background blobs do not change for a certain number of iterations, i.e. the two indicator functions reach a plateau. We observe further plateaus (e.g. starting at iterations 7, 12, and 15). As the iterations proceed, the plateaus usually get longer. The stopping criterion finally decides after which plateau length to stop via the parameter n_it. For the given example, n_it = 3 yields satisfactory results. After the first plateau the number of foreground blobs increases again and the shape starts to decompose.
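For illustration, the iterative part of the pre-processing can be sketched as follows (Python with SciPy and scikit-image). This is a simplified reading of Figure 8.4 under stated assumptions, not our exact implementation: the concrete filter functions are assumptions, and the final Hann-window contour smoothing is omitted.

    import numpy as np
    from scipy.ndimage import binary_dilation, label, median_filter
    from skimage.morphology import disk, remove_small_holes, remove_small_objects

    def preprocess_shape(mask, t_aoc=10, r_dil=3, n_it=3, max_iter=50):
        """Iterative shape pre-processing (simplified sketch of Fig. 8.4).

        Repeats median filtering (growing size s_med), area opening and
        closing (threshold t_aoc) and dilation (disc radius r_dil) until
        the numbers of foreground and background blobs have remained
        constant for n_it iterations.
        """
        s_med, history = 3, []
        for _ in range(max_iter):
            mask = median_filter(mask.astype(np.uint8), size=s_med).astype(bool)
            mask = remove_small_objects(mask, min_size=t_aoc)       # area opening
            mask = remove_small_holes(mask, area_threshold=t_aoc)   # area closing
            mask = binary_dilation(mask, structure=disk(r_dil))     # reconnect parts
            history.append((label(mask)[1], label(~mask)[1]))       # (fg, bg) blobs
            # Stopping criterion: both blob counts on a plateau of length n_it.
            if len(history) > n_it and len(set(history[-(n_it + 1):])) == 1:
                break
            s_med += 2   # increase the median filter size every iteration
        return mask

The blob counts are obtained by connected-component labeling of the foreground and of the inverted mask, which corresponds to the indicator functions plotted in Figure 8.5.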
8.2.3 Improvements of existing skeletonisation methods

We take the automatically pre-processed shapes as the basis for the computation of the skeletons with the chosen algorithms (thinning, BPR, DCE, and SPT). In our preliminary experiments we observe that two of these algorithms (DCE and SPT) suffer from severe problems (see Figure 8.6(a), left column): the original DCE algorithm [BLL07] is not able to compute the skeleton of multiple blobs in one image, it does not preserve the topology of the shape (it loses inner connections) and it sometimes generates incomplete skeletons. We propose improvements to compensate for these shortcomings. First, we apply skeleton computation to all blobs of the input instead of only the first blob (see Figure 8.6(a), 1st row). To preserve the inner connections, we extend the contour tracing of DCE in such a way that all inner contours are traced (see Figure 8.6(a), 2nd row). The original DCE algorithm employs either only the concave or only the convex contour points (depending on which set of points is larger) for skeletonisation and misses significant shape parts when the concave points are selected (see Figure 8.6(a), 3rd row). We remove the selection criterion and employ all (convex and concave) points for skeletonisation. As a result we obtain complete skeletons even for complex shapes. The effects of our improvements are shown in Figure 8.6(a), right column.

The SPT algorithm [SBYL13] utilizes DCE for a coarse pruning of the shape and thus benefits from our improvements as well. A second pruning step in SPT refines the skeleton by deleting branches whose reconstruction contribution to the shape is below a certain threshold. In the original reconstruction step the skeleton path tracing starts at one seed point and traces the skeleton to every endpoint. Thereby inner connections and partly also outer connections are missed and thus removed from the skeleton (see Figure 8.6(b), left). We improve the skeleton path tracing by setting several seed points at the skeleton branch points. As a result, all inner connections are preserved in the final skeleton (see Figure 8.6(b), right).

Figure 8.5: Sequence of the automated pre-processing of a shape with the proposed method. The numbers of foreground and background blobs decrease rapidly. At iteration 2, a stable minimum of the number of blobs is reached and the stopping criterion aborts the loop at iteration 3. We observe that further iterations decompose the shape, starting at iteration 6.

Figure 8.6: Comparison of skeletons of the original DCE and SPT algorithms (left) with our improved versions (right): (a) original and improved DCE algorithm, (b) original and improved SPT algorithm. Our improvements eliminate all shortcomings.

8.3 Experimental setup

From the annotated material described in Subsection 2.6.2, we derive a dataset of 1138 petroglyph shapes to carry out our experiments with. In the following, we describe the selection of the parameters for our adaptive shape pre-processing method and the investigated skeletonisation methods, and present our evaluation criteria.

8.3.1 Selection of parameters

For the selection of suitable values for the parameters defined in Subsection 8.2.2, we evaluate the pre-processing method on a reduced dataset consisting of 150 representative shapes from the entire dataset. In the absence of a ground truth of pre-processed shapes, we face difficulties in the selection of suitable parameter values. Hence we choose a heuristic approach to estimate robust parameters for the proposed method. To determine the area opening and closing threshold t_aoc [Vin94], we examine the histogram of all blob sizes in the reduced dataset. The histogram is strongly skewed to the left and exhibits a peak between blob sizes of 5 and 20 pixels. Blobs of this size mostly represent noise (e.g. small holes) that usually does not contribute significantly to a figure's shape. With a threshold t_aoc between 5 and 20 pixels, about 20% of all blobs are removed and a large portion of the noise is filtered out.
For the remaining experiments with the entire dataset, we set t_aoc to 10 pixels. For the radius of the dilation disk r_dil we select a rather low value of 3 pixels. This facilitates the reconnection of nearby parts and minimizes the amount of region growing, so that the likelihood of merging unrelated shape parts is reduced. From the reduced dataset we observe that typical plateau lengths of the indicator functions are between 2 and 5 iterations. A value of n_it = 3 iterations yields a good tradeoff between the sufficient smoothing of perturbed shapes and the decomposition of thin and sparsely connected figures. To estimate s_conv, we apply thinning to the shapes in the reduced dataset and count the number of endpoints. We repeat this process with increasing values of s_conv. For filter sizes s_conv between 30 and 70 contour points the number of endpoints remains mostly constant, which shows that the parameter has low sensitivity. We set s_conv to 51 pixels for our experiments. The size of the median filter is a non-critical parameter as it is simply increased in every iteration. We start with a minimum size of s_med = 3 pixels and increase it by 2 pixels every iteration.

Additional parameters have to be selected for the skeletonisation methods. For DCE [BLL07], we estimate the parameter for the number of vertices adaptively by counting the number of endpoints of the respective thinning skeleton. For the BPR-algorithm [SBH+11] and the SPT-algorithm [SBYL13], we take the parameter values proposed by their authors. The computation of the thinning skeleton is parameter-free as it is a simple morphological operation.

8.3.2 Evaluation

Due to the absence of ground truth shapes and skeletons, we define several perceptual evaluation measures that can easily be judged by a human observer. Subsequently, we evaluate our pre-processing method and the applied skeletonisation algorithms separately on the entire dataset. For the evaluation of the pre-processing the following criteria are considered:

• All shape parts that are important for visual perception are preserved.

• Independent but closely spaced parts are not merged (e.g. legs or feathers).

• Small holes in the shape are closed, disjoint parts are reconnected and the contour is properly smoothed, i.e. the shape is likely to facilitate the subsequent skeletonisation.

For the evaluation of the obtained skeletons we apply the following criteria:

• The skeleton preserves the full structure of the shape.

• It exhibits branches for all important parts of the shape.

• It has no remaining spurious branches.

8.4 Results

First, we evaluate the effect of the proposed pre-processing method quantitatively. The intuition behind this evaluation is to obtain an indication of how effectively the pre-processing simplifies the shapes (and thus improves them for subsequent skeletonisation). To assess this ability, we utilize a simple observation, namely that typical petroglyph skeletons should usually not exhibit more than 20 endpoints; this can be derived by looking at the shapes in Figure 8.3. A consequence of this observation is that skeletons with more than 20 endpoints are likely to contain spurious branches. We use the number of endpoints as a simple heuristic to evaluate the performance of the pre-processing method: for the evaluation, we count the number of endpoints of the thinning skeletons before and after pre-processing.
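This endpoint count is straightforward to compute; a minimal sketch (assuming SciPy and scikit-image, which need not match our actual tooling):

    import numpy as np
    from scipy.ndimage import convolve
    from skimage.morphology import skeletonize

    def count_endpoints(shape_mask):
        """Count the endpoints of the thinning skeleton of a binary shape.

        An endpoint is a skeleton pixel with exactly one 8-connected
        skeleton neighbour.
        """
        skel = skeletonize(shape_mask.astype(bool))
        # Count the 8-neighbours of every pixel (centre weight is 0).
        kernel = np.array([[1, 1, 1],
                           [1, 0, 1],
                           [1, 1, 1]])
        neighbours = convolve(skel.astype(np.uint8), kernel, mode="constant")
        return int(np.sum(skel & (neighbours == 1)))

Applying this function to a shape before and after pre-processing yields the two endpoint counts that are compared in the histograms of Figure 8.7.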
Figure 8.7 shows the distribution of the number of endpoints before and after pre-processing. Pre-processing strongly reduces the number of endpoints in the resulting skeletons and maps them to a reasonable range. Thus this evaluation provides a first indication that the proposed pre-processing is beneficial for subsequent skeletonisation.

Figure 8.7: Comparison of histograms of the number of endpoints of the thinning skeletons of the unprocessed shapes (left) and the pre-processed shapes (right). The number of endpoints decreases significantly and is mapped to a reasonable range.

As the employed shape pre-processing operations change the original shapes, a more comprehensive (qualitative) evaluation of the pre-processing method requires a visual inspection of the generated shapes and skeletons. For this purpose, we manually judge each automatically pre-processed shape according to the criteria presented in Subsection 8.3.2. The evaluation results for the proposed pre-processing method on the entire dataset are shown in Table 8.2.

Table 8.2: Results of the quantitative evaluation of the proposed pre-processing method on the entire dataset.

Not smooth enough   Merged parts   Lost details   Total errors
9.4%                6.9%           3.9%           20.2%

The pre-processing method performs well for 79.8% of all figures. Figure 8.8(a) shows examples where unimportant holes in the shapes are removed successfully and the contours are properly smoothed. For the remaining 20.2% of the shapes we identify different types of errors (see Figure 8.8(b) and Table 8.2). 9.4% of all shapes are not smoothed sufficiently and their contours remain perturbed, which impedes skeletonisation. For 6.9% of the shapes, closely spaced parts are merged although it would be important for skeletonisation that they remain separated. Only 3.9% of the shapes are smoothed too strongly, so that important shape parts are lost. The proposed method thus forms a tradeoff between the filtering of noise and the removal of unimportant shape parts on the one hand and the smoothing of important shape parts on the other.

Figure 8.8: Comparison of the results of satisfactory (a) and weak (b) pre-processing of petroglyph shapes: (a) holes are filled, contours are smoothed, and the overall shape is preserved; (b) figures are not smoothed sufficiently (top), shape parts are merged (middle), and details are lost (bottom).

Similarly to the evaluation of the pre-processing, we assess the quality of the skeletonisation methods according to the criteria presented in Subsection 8.3.2. The performance evaluation of the skeletonisation methods on the pre-processed shapes for the entire dataset is shown in Table 8.3. On average (over all skeletonisation algorithms) we obtain satisfactory results for the majority of our shapes. Figure 8.9 provides examples of the obtained skeletons and demonstrates that the skeletons are accurate to a high degree. We comprehensively investigated the different types of errors that arise during skeletonisation and observed that all skeletonisation algorithms have deficiencies in certain situations, but none of them completely fails. See Figure 8.10 for examples of improper skeletonisation.

The skeletons pruned with DCE are in most cases complete (96.3%), but the algorithm additionally produces a lot of spurious branches (47.7%). This is related to the fact that the algorithm requires prior knowledge about the shapes (the number of DCE vertices).
If this number is set too low, significant branches are deleted even before spurious ones (3.7% of all shapes). If the number is set too high, a lot of spurious branches remain (47.7%). We set this parameter adaptively (depending on the number of endpoints of the corresponding thinning skeleton) since a unique number that is suitable for all shapes does not exist.

SPT builds upon DCE and performs similarly. Although it produces even more spurious branches (65.8% of all shapes), it deletes spurious branches before important ones. Thus important parts are lost for only 1.4% of all shapes. An advantage of SPT over DCE is that it does not require a priori information about the shapes.

The BPR algorithm outperforms DCE and SPT and produces satisfactory skeletons for 86.9% of all shapes. BPR generates far fewer spurious branches than DCE and SPT (only 6.7% of all shapes). The BPR pruning is, however, in some situations too strong, so that important branches are removed for 6.4% of all shapes (see for example the tail of the bird in Figure 8.10, 2nd row).

A simple thinning results in notably good skeletons for 83.5% of all shapes, which is nearly as good as the performance of the more sophisticated BPR algorithm. Since petroglyph shapes often resemble stick-like figures, they can be modeled well by the thinning algorithm. Additionally, the contour smoothing in the pre-processing avoids the generation of spurious branches by thinning (in 87.5% of all cases). The results obtained for thinning show that a proper pre-processing can replace an additional skeleton pruning step. In only 4.0% of all cases important branches are missed by thinning.

Figure 8.9: Examples of successful skeletonisation with all selected and improved algorithms based on shape pre-processing with the proposed method. Note that for DCE and SPT a few spurious branches remain, in contrast to thinning and BPR.

Figure 8.10: Examples of partly erroneous skeletons obtained with the selected algorithms. In all depicted examples spurious branches exist (yellow circles) and for some shapes important parts are deleted (2nd and 3rd row, red circles).

Table 8.3: Evaluation of the four selected skeletonisation algorithms on the entire dataset. BPR and thinning perform best on the pre-processed shapes.

Algorithm        Spurious branches   Lost parts   Total errors
Thinning         12.5%               4.0%         16.5%
BPR [SBH+11]     6.7%                6.4%         13.1%
DCE [BLL07]      47.7%               3.7%         51.4%
SPT [SBYL13]     65.8%               1.4%         67.2%

8.5 Conclusion

In this chapter, we presented a study on the skeletonisation of petroglyph shapes. We studied the applicability of existing skeletonisation methods and evaluated their strengths and weaknesses. Existing skeletonisation methods were developed and evaluated mainly on ideal shapes and are thus not directly applicable to our real-world data. Therefore we improved several skeletonisation algorithms to compensate for the shortcomings that became apparent. Additionally, we proposed an adaptive shape pre-processing method that enables the computation of robust skeletons for the complex and diverse shapes under investigation. We performed a large-scale experiment and showed that proper pre-processing is crucial for the skeletonisation of petroglyph shapes.
Experiments on skeletonisation showed that pre-processing in combination with a simple thinning yields a good tradeoff for robust skeletonisation, whereas more sophisticated skeletonisation techniques either generate more spurious branches (DCE, SPT) or delete important ones (BPR). Our experiments clearly demonstrated that the presented pre-processing method and the proposed improvements of recent skeletonisation methods solve the additional challenges introduced by our complex and noisy real-world shape data for more than 86% of all investigated shapes.

Automated Classification of Petroglyph Shapes

Figure 9.1: A query petroglyph (left) and the most similar petroglyphs determined by our system.

In this chapter we investigate three questions: first, whether our approach outlined in Chapter 7 generalises to a larger dataset; second, how the employed descriptors and descriptor combinations perform in detail as well as in comparison to popular state-of-the-art methods for image classification (deep learning); and third, whether the results can be incorporated into the workflow of petroglyph documentation.

In Section 9.1, we describe our automated classification approach and introduce related work not mentioned in previous chapters. In Subsection 9.2.1, we evaluate the petroglyph descriptor combination (see Figure 9.1) proposed in Chapter 7 on a larger dataset. Subsection 9.2.2 presents details on the performance of single descriptors and compares them to a state-of-the-art visual deep learning framework. Section 9.3 presents some thoughts about the value-in-use of our tools, while Section 9.4 presents our conclusion.

9.1 Approach and Related Work

For the experimental evaluation in this chapter, we use the approach proposed in Chapter 7. In this section we describe the most important points. For a detailed description of the approach and detailed evaluation results on a small petroglyph dataset, please refer to Chapter 7.

Numerous surveys about shape analysis have been published; comprehensive and important in this field are the surveys by [Pav78], [Lon98], [ZL04] and [YKR08]. Classifications and taxonomies of shape descriptors have been proposed in the mentioned surveys in different variants. A widely used common denominator is the distinction between contour-based and region-based descriptors. Our intuition outlined in Chapter 7 was that the petroglyph shapes in our dataset are perceptually close to skeletons. Hence, we added a third category of skeleton-based approaches and categorised the approaches relevant for our work into region-based, contour-based and skeleton-based approaches (see Section 7.1).

In Chapter 7, we combined the strengths of contour-based as well as region-based shape descriptors with the strengths of undirected skeletal graphs for the petroglyph classification problem. The methods we combined are a) Shape Context (SC) [BMP02], which has been used by Deufemia et al. [DP13] for petroglyph classification, b) Inner Distance Shape Context (IDSC) [LJ07], c) the Generalised Hough Transform (GHT) proposed by Zhu et al.
[ZWKL11] for petroglyph classification,¹ as well as d) our own petroglyph descriptor, which makes use of topological features of the undirected skeletal graphs of petroglyphs in order to create a feature vector for each petroglyph by graph embedding (GE).² In order to be able to use skeletonisation and contour-based descriptors, we needed to pre-process our material to achieve continuous boundaries. This pre-processing is described in Chapter 8. The resulting shape images are the input for SC, IDSC as well as for the skeletonisation, for which we use the method of [BLL07].³ We normalize the width of the input petroglyph to 500px and prune the skeletons by joining nodes which have a spatial distance smaller than a given threshold. GHT is computed on the original petroglyph images.⁴

In this chapter, we build on the experiments in Chapter 7 where it turned out that a) GE and GED perform comparably well (with a much lower computational demand for GE) and b) the combination of region-based, contour-based and skeleton-based descriptors yields the best results. We employ a larger dataset (see Subsection 9.2.1) and additionally investigate the region-based Generic Fourier Descriptor (GFD, [ZL02]). Furthermore, we use a state-of-the-art deep learning framework which we tune for petroglyph shape recognition (see Subsection 9.2.2).

¹ For all three descriptors, implementations have been kindly provided by the respective authors.
² We utilize the MIT strategic engineering tools for network analysis kindly provided by [BdW12]: http://strategic.mit.edu/downloads.php?page=matlab_networks.
³ We thank Bai et al. for providing the implementation.
⁴ We use resolutions of 10x10px, 20x20px and 30x30px. We report on the best performing resolution of 10x10px.

GFD has been employed by Deufemia and Paolino [DP14]. They use an enhanced version of Generic Fourier Descriptors (EGFD) to detect and classify petroglyphs from full scenes. In their experiments on 1215 petroglyphs in 53 scenes, EGFD outperforms the descriptors GHT and SC. They do not use combinations of descriptors.

We use the popular deep learning framework CAFFE by Jia et al. [JSD+14] in two ways: First, we use the features extracted with the pre-trained ImageNet network that is included in the distribution of CAFFE (CNN_IN; http://caffe.berkeleyvision.org/gathered/examples/imagenet.html). Second, we tune the network by re-training its last layer with our petroglyph shape data (CNN_RT).

The re-training of a convolutional neural network is a computationally demanding task. We used leave-one-out cross-validation (LOOCV) in Chapter 7. This poses a dilemma. On the one hand, we need to present results comparable to Chapter 7 and consequently have to use LOOCV. On the other hand, LOOCV requires the training of a net for each training data sample, which is in our case (1255 samples) computationally not feasible. Hence, we split the evaluation into two parts. In both parts, we use a combination of several shape descriptors and a k-NN classifier. The results we report on in Subsection 9.2.1 aim at evaluating the performance and value-in-use of the descriptor combination proposed in Chapter 7. Thus, we validate with LOOCV to preserve comparability with Chapter 7. The detailed results in Subsection 9.2.2 extend the evaluation with GFD and deep learning, including a re-trained convolutional neural network (CNN_RT).
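To illustrate the validation trade-off just described, the following sketch contrasts LOOCV with 5-fold cross-validation for a k-NN classifier. It is an illustrative sketch with synthetic stand-in data, assuming scikit-learn; it does not reproduce our descriptors or dataset.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins for the descriptor vectors and class labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))      # one feature vector per shape
y = rng.integers(0, 5, size=200)    # class labels

knn = KNeighborsClassifier(n_neighbors=5)

# LOOCV: one evaluation per sample -- exact, but it implies one training
# run per held-out sample, which is infeasible when a CNN must be
# re-trained for every fold
acc_loo = cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()

# 5-fold CV: only five training runs in total
acc_5f = cross_val_score(
    knn, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean()

print(f"LOOCV accuracy: {acc_loo:.3f}, 5-fold accuracy: {acc_5f:.3f}")
```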
As it is computationally not feasible to train a net for each training data sample (1255 samples), we employ 5-fold cross-validation in this section and repeat selected experiments from Subsection 9.2.1 with this cross-validation method for comparability.

9.2 Experiments and Results

9.2.1 Descriptor Combination Previously Proposed

In this section we apply the approach outlined in Subsection 7.2.2 to the large dataset described in Subsection 2.6.2. We investigate whether the good result - an accuracy of 88% - which was obtained on a small evaluation dataset in Chapter 7 can be generalised to a larger dataset. We analyze the data separately according to the two typologies that we have employed.

9.2.1.1 Modified Sansoni Typology

Figure 9.2 shows the classification results for the 26 classes examined based on the modified Sansoni typology. The overall accuracy is 65%. We observe that classes 1, 2, 7 and 14, 15, 16 yield clearly higher accuracies (around 80%) than the overall accuracy.

Figure 9.2: Classification accuracy of the classification results of the 26 classes based on the modified Sansoni typology. The red line shows the overall accuracy. Please refer to Table 2.1 for the class descriptions.

Table 2.1 shows that classes 1, 2, 7 and 14 are heavily populated, but 15 and 16 are only moderately populated with 35 and 43 examples. It is possible that classes 15 and 16 are visually homogeneous and especially well suited for our classifier. Figure 9.4 shows examples of class 16. We observe that the figures do, indeed, share a high degree of visual similarity. Note that our classifier is robust against changes of rotation and scale and so is relatively unaffected by differences of these types.

Furthermore, we observe in Figure 9.2 that several classes (3, 4, 8 and 21) have an accuracy of 0%, i.e. not a single example of these classes has been correctly classified by our approach. Table 2.1 shows that classes 1 to 4 are variants of armed figures holding different weapons. The classes are unequally populated: classes 1 and 2 contain 61 and 105 examples, classes 3 and 4 contain only 10 examples each. Figure 9.3 displays the confusion matrix of the classification results. Classification accuracy is good for classes 1 and 2, where there are sufficient examples for training. We observe that class 3 (shown in row 3) has the majority of its misclassifications falling into class 2. Many examples of class 4 are also misclassified into class 2. We assume that the poor classification performance is caused by the small amount of training material for classes 3 and 4 and the fact that the visual differences between these armed figures are too small for our descriptors.

Class 8 contains 21 "Unarmed anthropomorph figures - not elsewhere classified". The majority of these figures are misclassified into class 2 as well. It would seem that class 8, this collection of unarmed anthropomorph figures which do not fit elsewhere, is visually too heterogeneous. It is probable that this causes the poor classification performance for class 21 as well, as this class also contains a variety of otherwise unclassified figures. Figure 9.5 shows examples from class 21.
We observe that the visual similarity is very low.

Figure 9.3: Heatmap of the confusion matrix of the classification results of the 26 classes based on the modified Sansoni typology. Each line in the matrix represents the classifications of a certain class. Please refer to Table 2.1 for the class descriptions. See Table 9.1 for percentage values.

Figure 9.4: Examples of modified Sansoni typology class 16 - Axe.

Table 9.1: Confusion matrix of the classification results of the 26 classes based on the modified Sansoni typology. Each line in the matrix states the percentage of the classifications of a certain class (predicted labels 1-26 from left to right). Please refer to Table 2.1 for the class descriptions. The sum of the values of each line may differ from 100 due to the rounding of every single percentage value.

Class  Predicted labels (1-26)
 1   80 13  0  0  0  0  3  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0
 2    3 83  0  0  0  0  6  0  0  0  0  0  0  7  1  0  0  0  0  0  0  0  1  0  0  0
 3    0 70  0  0  0  0 10  0  0  0  0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0
 4   10 30  0  0 10  0  0 20  0  0  0  0  0 30  0  0  0  0  0  0  0  0  0  0  0  0
 5    5  5  0  0 67  3  8  0  0  5  0  3  0  3  0  0  0  0  0  3  0  0  0  0  0  0
 6    0  0  0  0 22 22  0  0  0 11  0 22 11 11  0  0  0  0  0  0  0  0  0  0  0  0
 7    6  6  0  0  0  0 83  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0
 8    5 52  0  0  0  0  5  0  0  0  0  0  0 29  0  0  0  0  0  5  0  5  0  0  0  0
 9    2  7  0  0  0  0  5  0 58  0  2  4  4 12  2  0  0  0  0  0  5  0  0  0  0  0
10    2  2  0  0  2  0  3  0  0 60 10  7 12  0  0  0  0  0  0  0  0  0  3  0  0  0
11    5  0  0  0  0  0  0  0  0  2 70  7 17  0  0  0  0  0  0  0  0  0  0  0  0  0
12    0  2  0  2  0  2  0  0  0  0  5 54 36  0  0  0  0  0  0  0  0  0  0  0  0  0
13    1  6  0  0  0  0  5  0  0  1 13 16 52  3  0  1  0  0  0  0  0  0  0  1  0  0
14    4  3  0  0  1  0  1  0  1  0  1  1  0 87  0  1  1  0  0  0  0  0  1  0  0  0
15    0  3  0  0  0  0 14  0  0  0  0  0  0  6 77  0  0  0  0  0  0  0  0  0  0  0
16    2  0  0  0  0  0  5  0  0  0  0  0  2  2  5 81  0  0  0  0  0  0  0  0  2  0
17    9  0  0  0  0  0  9  0  9  0  9  0  0 18  0  0 36  0  0  0  0  0  0  0  9  0
18    0  0  0  0  0  0  0  0  0  0  0  7  0  7  7  7  0 40 33  0  0  0  0  0  0  0
19    0  0  0  0  0  0  0  0  0  0  0  0  7  7  0  0  0 14 64  0  0  7  0  0  0  0
20    0 10  0  0  0  0  0  0  0  0  0  0  0 30  0  0  0  0  0 60  0  0  0  0  0  0
21   11  0  0  0  0  0  0  0 33  0  0  0  0 33 11  0  0  0 11  0  0  0  0  0  0  0
22   10  8  0  0  0  0 10  0  3  0  0  5  3  3  8  3  0  0  3  0  0 46  0  0  0  0
23    3  0  0  0  0  0  3  0  0  0  0  0  0  3  0  0  0  0  0  3  0  3 53 10 20  0
24   11 11  0  0  0  0  5  0  0  0  0  0  0 11  0  0  0  0  0  0  0  0  0 37 26  0
25    3  0  0  0  0  0  0  0  3  0  0  0  0  8  0  0  0  0  0  0  0  0  6 22 58  0
26    5 20  0  0  0  0  0  0  0  0  0  0  0 25  5  0  0  0  0  0  0  0  0  0  5 40

Figure 9.5: Examples of modified Sansoni typology class 21 - Other unclassified.

Figure 9.6: Classification accuracy (blue) and baseline (black) for each anthropomorph subdivision of the 3D-Pitoti project typology (the subdivisions are Body filling, Body form, Form of head, Hands, Held objects dominant hand, Held objects subordinate hand, Feet, Legs and Sex organs). Each subdivision reflects a different set of classes (based on the divisions) for all anthropomorph figures. The baseline is the size of the largest class of a subdivision, i.e. the performance of a naive classifier which always predicts the class with the highest a priori probability.

9.2.1.2 3D-Pitoti Project Typology

As this typology uses subdivisions for the anthropomorph figures (see Subsection 2.6.2), each of which classifies all anthropomorph figures by different criteria (i.e. any particular anthropomorph will be classified along multiple dimensions, one per subdivision), we first investigate these subdivisions.
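The baseline used in Figure 9.6, i.e. the relative size of the largest class and thus the accuracy of a classifier that always predicts the a priori most probable class, is trivial to compute. A minimal sketch, assuming NumPy:

```python
import numpy as np

def baseline_accuracy(labels):
    """Accuracy of a naive classifier that always predicts the class with
    the highest a priori probability, i.e. the relative size of the
    largest class (the black baseline bars in Figure 9.6)."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.sum()

# e.g. a two-class subdivision with 80 vs. 20 examples has baseline 0.8
print(baseline_accuracy(np.array([0] * 80 + [1] * 20)))  # -> 0.8
```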
Figure 9.6 shows the classification performance for each subdivision. We observe that the classification accuracies are relatively high, in a range from 75 to 92%. However, as these subdivisions have only small numbers of classes (from 2 to 7) with sometimes unevenly distributed numbers of examples, we have to consider the baseline classification performance, too. The baseline is the size of the largest class of a subdivision, i.e. the performance of a naive classifier which always predicts the class with the highest a priori probability. We observe that for 6 of the 9 subdivisions a performance gain over the baseline can be achieved with our approach. For the remaining 3 subdivisions, our approach yields an accuracy close to the baseline accuracy.

The multiple labeling only applies to anthropomorph figures in the 3D-Pitoti project typology. These are roughly a third of all figures in the dataset. The other two thirds of the figures have a single class label only. Hence, we do not use a fuzzy approach for classification, but intend to apply the classifiers for each subdivision sequentially on all anthropomorph candidates resulting from the experiment on the complete 3D-Pitoti project typology. For this classification experiment we include the 2 classes of the best performing subdivision "Body filling". Hence, we have 18 classes in total.

Figure 9.7: Classification accuracy of the classification results of 18 classes based on the 3D-Pitoti project typology. The red line shows the overall accuracy. Please refer to Table 2.2 for the class descriptions of classes 1-16. Classes 17 and 18 are the "body filling" subdivisions of the anthropomorph class.

Figure 9.7 shows the classification results. The overall accuracy is 72%, which is clearly better than the 65% of the modified Sansoni typology. We observe that classes 6 (Cross) and 9 (Shoe/Footprint) yield accuracies over 80%, while class 17 (Anthropomorph with filled body) reaches over 90%. All these classes are highly populated: class 6 has 118 examples, class 9 has 104 examples and class 17 has as many as 284 examples. Column 17 of the confusion matrix in Figure 9.8 shows that many examples of poorer performing classes are falsely classified into class 17. Class 9 attracts a relatively small number of false classifications. Perhaps, then, the examples of this well-performing class have a high visual similarity that suits our classifier well. Figure 9.9 shows examples from class 9. In fact the examples are not visually similar, but they do have comparable contours.

Figure 9.8: Heatmap of the confusion matrix of the classification results of 18 classes based on the 3D-Pitoti project typology. Each line in the matrix represents the classifications of a certain class. Please refer to Table 2.2 for the class descriptions of classes 1-16. Classes 17 and 18 are the "body filling" subdivisions of the anthropomorph class. See Table 9.2 for percentage values.

Furthermore, we observe in Figure 9.7 that, in contrast to the experiments on the modified Sansoni typology, none of the classes yields an accuracy of 0%. Class 2 (Camunian rose, which is the second smallest class) yields the poorest performance - 20%. According to the confusion matrix in Figure 9.8, the misclassifications are mainly into
class 6 (Cross) and into class 17 (Anthropomorph with filled body), which is the most numerous amongst our classes. Both classes are not visually similar to class 2, for which Figure 9.10 shows example class members. We assume that the poor performance is due to the fact that the dots around the rose have a negative influence on the pre-processing and skeletonisation of the figure (see Chapter 8).

Figure 9.9: Examples of 3D-Pitoti project typology class 9 - Shoe/Footprint.

Table 9.2: Confusion matrix of the classification results of 18 classes based on the 3D-Pitoti project typology. Each line in the matrix states the percentage of the classifications of a certain class (predicted labels 1-18 from left to right). Please refer to Table 2.2 for the class descriptions of classes 1-16. Classes 17 and 18 are the "body filling" subdivisions of the anthropomorph class. The sum of the values of each line may differ from 100 due to the rounding of every single percentage value.

Class  Predicted labels (1-18)
 1   36  0 18  0  0  9  0  0  0  0  0  9  0  0  9  0 18  0
 2    0 20  0  0  0 30  0  0  0 10  0  0  0 10  0  0 30  0
 3    4  2 73  0  0  2  2  2 11  2  2  0  0  0  0  0  2  0
 4    0  0 15 25  0  5  0  0  5 10  0  0  0  0  0  0 40  0
 5    0  0 50  0 29  0  0  0  0  7  0  0  0  0  0  0 14  0
 6    1  1  4  0  0 83  0  0  1  3  0  1  0  0  1  0  6  0
 7    0  0  5  0  3  8 44 13 10  0  3  0  0  3  0  0 13  0
 8    3  0  0  0  0  0  3 71  0  9  0  0  0  0  0  0 14  0
 9    3  0  2  0  1  0  0  1 86  2  2  0  0  0  0  1  2  0
10    1  0  3  0  0  2  0  0  5 78  1  0  0  0  0  0 11  0
11    2  0  2  0  0  5  0  5  5  0 79  0  0  0  0  0  2  0
12    0  0  2  0  0  0  0  0  2  0  0 68  0  7  0 18  3  0
13    0  0  0  0  0  2  0  0  3  3  0  2 52  8  0 15 13  2
14    0  0  2  0  0  0  0  0  0  0  0  4  0 51  0 36  7  0
15    0  0  2  0  0  2  0  2  4  5  2  2  0  0 56  4 19  4
16    0  0  1  0  0  2  0  0  2  1  0 18  2 12  0 47 13  1
17    0  0  0  0  0  1  0  0  2  2  0  0  0  0  0  0 91  3
18    0  0  0  0  0  4  0  0  4  6  0  0  0  0  0  0 23 62

Figure 9.10: Examples of 3D-Pitoti project typology class 2 - Camunian rose.

9.2.2 Details on Single Shape Descriptors and CNNs

In this section we detail the results described in Subsection 9.2.1 for the 3D-Pitoti project typology. We investigate the performance of single shape descriptors and several combinations thereof. We add an additional region-based shape descriptor (GFD) and use a state-of-the-art convolutional neural network implementation, in a first step for feature extraction based on an existing network (CNN_IN) and in a second step for re-training of the network with our petroglyph shape data (CNN_RT).

Figure 9.11 shows the results for the evaluation with single descriptors. We observe that SC, the best performing descriptor on the small dataset in Subsection 7.3.2, is outperformed by CNN_RT, which requires a time-consuming training of the convolutional neural network. The computationally feasible use of a pre-trained network for feature extraction (CNN_IN) yields clearly weaker results than SC. IDSC is behind but is still in the top group of four strong descriptors. The region-based GFD and GHT as well as our skeleton-based GE perform considerably worse than the top group. The performance of the GE is in slight contrast to the performance achieved in Subsection 7.3.2 on the small dataset. We assume that this is a result of the higher number of classes and the more diverse shapes used now.

Figure 9.12 shows the performance of descriptors intermediately fused (see Subsection 7.2.4).
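The fusion scheme itself is defined in Subsection 7.2.4 and is not reproduced here. For illustration, the following sketch shows one generic way of fusing several descriptors at the matching stage: the precomputed distance matrices of the individual descriptors are normalised and averaged before k-NN voting. This is an assumed stand-in scheme, not necessarily the method of Subsection 7.2.4, and it assumes NumPy.

```python
import numpy as np

def fused_knn_predict(distance_mats, y_train, k=5):
    """k-NN classification with several shape descriptors fused at the
    distance level: each precomputed distance matrix (queries x training
    samples) is min-max normalised, the normalised distances are averaged,
    and a majority vote over the k nearest training shapes is taken."""
    y_train = np.asarray(y_train)
    fused = np.zeros_like(np.asarray(distance_mats[0], dtype=float))
    for d in distance_mats:
        d = np.asarray(d, dtype=float)
        fused += (d - d.min()) / (d.max() - d.min() + 1e-12)
    fused /= len(distance_mats)
    predictions = []
    for row in fused:
        nearest = np.argsort(row)[:k]                   # k nearest neighbours
        classes, votes = np.unique(y_train[nearest], return_counts=True)
        predictions.append(classes[np.argmax(votes)])   # majority vote
    return np.array(predictions)
```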
We assume that the results are well comparable with those in the previous Subsection 9.2.1: the LOOCV performance of the combination SC, IDSC, GHT, GE on the 3D-Pitoti typology is 72%, whereas the 5-fold cross-validated performance at k=5 here yields a slightly higher 73%. We observe that this descriptor combination, which performs best on the small dataset in Subsection 7.3.2, is clearly outperformed by the combinations that use CNN-based features. Furthermore, we observe that the combination of descriptors clearly improves the performance from 74% with a single descriptor to over 80%.

Figure 9.11: Performance of single descriptors in 5-fold cross-validated % of accuracy over the number of considered neighbors k (curves: CNN_RT, CNN_IN, GE, GHT, GFD, SC, IDSC). We observe that the best performing descriptor SC (red / long line-dot) on the small dataset in Subsection 7.3.2 is outperformed by CNN_RT (dark red / long line-short line).

Figure 9.12: Performance of descriptors intermediately fused (see Subsection 7.2.4) in 5-fold cross-validated % of accuracy over the number of considered neighbors k. The evaluated combinations are SC,CNN_RT; SC,IDSC,CNN_RT; SC,IDSC,CNN_RT,GFD; SC,IDSC,CNN_RT,GFD,GE; SC,IDSC,CNN_RT,GFD,GE,GHT; SC,IDSC,CNN_RT,GFD,GE,GHT,CNN_IN; SC,IDSC,CNN_RT,CNN_IN; and SC,IDSC,GHT,GE. We observe that the combination SC, IDSC, GHT, GE (red / long line-dot), which performs best on the small dataset in Subsection 7.3.2 and is evaluated on two typologies in the previous Section 9.2.1, is clearly outperformed by the combinations that use CNN-based features.

9.3 Use Scenario

In the previous section we demonstrated that the fully automated classification of petroglyph shapes is feasible and generalises to our larger dataset. Conclusions will be drawn in Section 9.4. In this section we want to look at the results of our approach from a different perspective. A shape classification accuracy of 100% is unlikely to be achieved with real-world data like our petroglyph dataset. Our results (65% and 72% without the use of the re-trained CNN and over 80% with it) cannot be used directly as final classification results but, rather, need manual inspection and, possibly, correction.

The question we want to answer now is whether these approaches and results can be applied in a way that usefully supports the work of rock-art researchers. The annotation tool described in Subsection 2.6.3 is a good starting point. Initially this tool was developed to create a dataset and to provide a gold standard for the investigation of petroglyph similarity. However, during the annotation process the tool was found to be of independent value for the archiving, classification and analysis of tracings. The next question is whether - and, if so, how - the automatic classification with its current performance could be included into the manual annotation workflow. For example, we could automatically fill in the predicted class in the drop-downs used for classifying a figure (see Figures 2.21 and 2.25). We could expect correctly pre-selected drop-downs in 65% (modified Sansoni typology), 72% (3D-Pitoti project typology) and 80% (3D-Pitoti project typology with re-trained CNN) of the cases.
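A minimal sketch of how such a pre-selection could be derived from the k-NN votes, and extended to the ranked candidate list discussed next, assuming NumPy and precomputed query-to-training distances (an illustrative scheme, not our exact implementation):

```python
import numpy as np

def ranked_classes(dist_row, y_train, k=7):
    """Rank candidate classes for one query shape by their share of votes
    among the k nearest training shapes (ties broken arbitrarily)."""
    y_train = np.asarray(y_train)
    nearest = np.argsort(dist_row)[:k]
    classes, votes = np.unique(y_train[nearest], return_counts=True)
    return list(classes[np.argsort(-votes)])

# The first entry would pre-select the drop-down; the first five entries
# would populate the ranked candidate list discussed below.
```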
We could extend this if we not only present a single prediction (the prediction with the highest rank), but also present the predictions with lower ranks. Figure 9.13 shows how probable it is that the correct class of a given figure is within the first n items in a ranked list of predicted class candidates. We observe that, naturally, the values increase with the length of the list. As we want to present the list in an appropriate form in the user interface of the annotation tool, it is advisable to limit it to 5 or 7 items at most. For the 3D-Pitoti project typology, a list size of 5 items yields a performance of 92% (and even 97% with the re-trained CNN) for this semi-automatic classification. For the modified Sansoni typology, the performance is 89%. From the user perspective this means that around 90% of all classifications of figures can be achieved with a single click, as in all these cases our approach shows the correct class within the five-item ranked list. Furthermore, the sorting of the list also helps the user in making the selection, as the highest ranked class candidate is at the top, and more than 80% of the correct classes are within the first three items on the list (see Figure 9.13).

Figure 9.13: Probabilities (y-axis) that the correct class is within the first n (x-axis) predictions of a ranked list of predicted classes. The red (short line) curve shows results for the 3D-Pitoti project typology and the blue (line-dot) curve shows results for the modified Sansoni typology, both with the descriptor combination performing best on the small dataset in Subsection 7.3.2 and evaluated on two typologies in Subsection 9.2.1. The green (long line) curve shows the results for the 3D-Pitoti project typology with the descriptor combination including CNN classifiers (SC, IDSC, CNN_RT, CNN_IN) evaluated in Subsection 9.2.2.

9.4 Conclusion

In this chapter we tested an approach for automated classification of petroglyph shapes on a large real-world dataset. Employing a custom-made tool, the dataset has been fully expert-annotated according to two different typologies containing 18 (3D-Pitoti project) and 26 (modified Sansoni) classes. We have demonstrated that our previously investigated approach (see Chapter 7) generalises well to a large real-world dataset. The overall cross-validated classification accuracies for the two typologies are 72% (even 80% with the re-trained CNN) and 65%, respectively. One general reason for misclassifications is probably the large variation in class sizes. For the 3D-Pitoti project typology, the class sizes vary from 10 to 335, for the modified Sansoni typology from 10 to 143. Weakly performing classes are often small classes. Other reasons for weak classification results would appear to be non-homogeneous classes in terms of their members' visual similarity and, perhaps, as-yet-undiscovered issues in our pre-processing pipeline. We have described applications of our annotation tool and demonstrated that applying our classification algorithms to the data developed with this annotation tool can clearly enhance the classification work of a rock art researcher and make it more productive.

Conclusion and Future Work

Figure 10.1: A petroglyph depicting a human figure in Valcamonica, Italy.
In this thesis, we have developed and described fundamental methods with the aim of enabling a digital toolchain for rock art research based on 3D scans. Figure 10.1 shows an example of the artwork. In this chapter we summarise the main technical findings, discuss the potential use of our findings in rock art research and suggest future work.

10.1 Conclusion

In the Introduction we defined three ultimate goals for the computational analysis of petroglyphs (see page 2). With the vision of large archives full of rock art images and 3D scans, we aim at robust automated methods to (a) determine the exact shapes and spatial positions of petroglyphs in images or 3D scans of full rock panels; (b) classify the petroglyphs regarding their shapes and pecking styles and (c) retrieve similar petroglyphs from different archives of petroglyph images and 3D scans.

In this thesis, we addressed goal (a) and partly goal (b). We developed methods to segment images and 3D scans of rock panels containing petroglyphs, and we approached the classification of petroglyphs regarding their shape. Lacking ground truth for pecking style classes, we could not work on the classification of pecking styles. As we only had access to material from one specific rock art site, we could not approach goal (c).

The chapters of the thesis follow the classic workflow of petroglyph documentation. In Chapters 4 and 5, we presented methods to segment the rock surface into pecked areas and natural rock surface. In Chapter 6, we investigated the pecked regions, aiming at an automatic distinction of different pecking styles based on their similarity. In Chapters 7 and 8, we approached the numeric description of the shape of petroglyphs based on region, boundary and skeleton to allow similarity estimation of two petroglyphs. In Chapter 9 we showed that the similarity estimation of two petroglyphs allows automated classification into pre-defined classes.

The most important findings of our research are:

• Segmentation of rock surface images into pecked areas and natural rock surfaces is feasible based on digital photos. The complementary information of the best performing texture features is preserved optimally with late fusion.

• Segmentation of rock surfaces based on 3D point clouds outperforms the segmentation based on digital photos. We expected this result, as information about the micro-topography of the surface which is captured in 3D allows illumination-independent texture description and classification.

• For the segmentation task, 3D descriptors based directly on the surfels (3D points and normals) are outperformed by the employment of 2D visual features on depth maps of the surface, which is surprising at first glance. However, we observe that the transfer of the surface's micro-topography to a depth map is possible without any information loss, given sufficient resolution of the depth map and the fact that our surfaces do not contain any self-occlusions. Additionally, the image-space approach can build on a long line of work on 2D visual features.

• 3D descriptors can model different pecking styles well in many cases but fail in others. This task was especially demanding as there is no ground truth for different pecking styles.

• The classification of petroglyphs with respect to their shape is feasible. We have shown that the combination of region-based, boundary-based and skeletal descriptors yields the best results.
• The features learned by a convolutional neural network (CNN) trained with our data outperform all other shape descriptors we used. However, we show that the traditional shape descriptors carry complementary information, so that combining them with the CNN features increases classification accuracy further.

For rock art researchers our findings can be useful in several ways. First of all, classic contact tracing to document rock art is a time-consuming task. Additionally, the documentation of the pecking style by drawing a point for every single peck mark is highly dependent on the documenting person. Our segmentation method based on 3D scans produces a documentation good enough to require only minor corrections by the rock art researcher. Clearly different pecking styles can be separated by the method we propose, and together with the 3D material the rock art researcher can spend more time on interpretation than on documentation. The classification work can clearly be improved, as our approach allows a one-click classification for more than 99% of the cases. Given the high number of petroglyphs (more than 300.000 in Valcamonica alone), the automated classification makes the work of rock art researchers more productive.

Simple tools can be of use, too. It turned out that the annotation tool we developed in order to acquire ground truth information for shape classification is still used by the involved rock art researchers, even almost two years after the end of the ground truth campaign. They see the value of the tool for organising the large number of classic tracings they work with. We conclude that our methods can improve the work of rock art researchers.

10.2 Future Work

We outline future work in the rock art domain and beyond.

Future work in the rock art domain: We have shown that for petroglyphs 3D surface segmentation clearly outperforms 2D segmentation. 3D surface segmentation requires high-resolution 3D scans. Currently, scans of the required quality cannot be acquired with off-the-shelf hardware but require expensive special hardware and in many cases also post-processing of the captured data. In contrast to that, 2D images can be taken with standard digital cameras. Hence, in cases where 3D scans are not available, 2D segmentation can be an important asset to substitute or at least support the time-consuming tracing. Thus, we see future work in the extension of our 2D approach described in Chapter 4. A larger dataset and the employment of deep learning could improve results.

Our investigation of pecking styles was limited by the fact that no ground truth for pecking styles exists. Hence, we focused on unsupervised methods. Future work could use a semi-supervised approach, where the user provides hints to a machine learning system. In this way expert knowledge can be integrated and a suitable set of parameters for a given pecking style can be selected in a supervised fashion. This could facilitate the generation of computational models for different pecking styles. The usage of automatic feature selection during model learning (e.g. by sparse SVM or Random Forests) would make it possible to take the complete set of investigated parameters into account, which in turn could maximize the number of pecking style attributes that can be considered in the analysis.

Future work beyond the rock art domain: Our results and findings are relevant beyond the scope of rock art. The most important point deals with the micro-topography of surfaces.
We assume that the traditional classification of 2D images of surfaces based on the visual appearance (color and texture) will be superseded by surface classification based on a combination of the tactile appearance and color. The tactile appearance can only be captured with highly detailed 3D scans. Our methods for the distinctive numeric description of the micro-topography of surfaces can be adapted for any kind of surface. As existing 3D descriptors are outperformed by traditional 2D visual features employed on depth maps of the surfaces, future work is necessary for descriptions based directly on point cloud data exploiting the surfels. To decrease the computational demand, the surface description can be performed only around salient points. For this, general purpose definitions and detection methods for the saliency of 3D points beyond the peaks and valleys we use are required.

Our investigations on shape similarity have demonstrated that a combination of descriptors that consider boundary, region or skeleton yields very good results. This is of general interest for shape recognition and can - together with our real-world shape dataset - foster new research on shape description. Future work could be done on the evaluation of our descriptor combinations on different shape datasets.

We have demonstrated that our methods can improve the work of rock art researchers and we have shown in our evaluations that we clearly advanced the state-of-the-art. We have argued that our methods are of use beyond rock art in different application domains. We have suggested future work in the rock art domain and beyond. Given the rising availability of 3D scans, we expect (and hope!) that our findings will be useful for different applications and will contribute to further methodological development.

List of Figures

1.1  A petroglyph depicting a warrior in Valcamonica, Italy.  1
1.2  Manual contact tracing in Valcamonica, Italy. © Alberto Marretta, used with permission.  3
1.3  blabla  4
1.4  Simplified system workflow of the complete 3D-Pitoti pipeline. The scanning was carried out as a multi-scale approach based on the 3D-reconstruction method Structure-from-Motion (SFM). The massive amount of multi-scale 3D data required Level-of-Detail (LOD) algorithms for interactive visualisation. The contributions of this thesis (see Section 1.1) are subsumable under "Intelligent Processing". We thank Alex Kulik for the preparation of the first version of this figure.  5
2.1  A petroglyph scene in Valcamonica, Italy.  9
2.2  Cave painting in the Lascaux cave in France.  11
2.3  A petroglyph depicting a Bubalus in the Sahara in southern Algeria.  12
2.4  Intaglio showing a hummingbird in Nazca, Peru.  13
2.5  Result of a rubbing documentation. The rubbing shows a rock panel in Tanum.  14
2.6  The process of rubbing on a rock panel in Tanum.  15
2.7  Tracing on a rock in Valcamonica in Italy.  16
2.8  Tracing on a rock in Valcamonica in Italy.  18
2.9  High quality 2D images as used in 2D segmentation. Shadows of trees surrounding the rock are visible.  21
2.10  A high quality 2D image as used in 2D segmentation.  22
2.11  A 3D input point cloud (a) and a close-up of the head of the Pitoto (b). The point cloud has 4.4M points. The close-up clearly shows the surface morphology of the pecked regions (e.g. in the head and torso of the Pitoto).  23
2.12  An input point cloud (a) with its orthophoto (b) and depth map (c). Brighter values in the depth map indicate higher depth values.  25
2.13  Segmentation ground truth annotation on the 2D image in Photoshop.  26
2.14  Orthophotos (a and c) and their corresponding ground truth images (b and d). White areas mark pecked regions in the orthophoto and black areas unpecked regions. The red area surrounding the scan is the area where no information is available.  27
2.15  Petroglyph shapes from our database. The underlying tracings are © CCSP and Alberto Marretta, used with permission.  28
2.16  Overview of the ground truth annotation web tool interface.  32
2.17  View of a figure in its rock panel tracing.  32
2.18  Use of the polygon selection tool.  33
2.19  Use of the magic wand selection tool.  33
2.20  Comparison of selection without and with filiform selection mode.  34
2.21  Classification process.  35
2.22  Special cases when selecting a figure.  36
2.23  Statistics of numbers of figures in each class. Please note that the screenshot was taken at a later date than our experiments. As annotation is an ongoing process, the numbers are higher.  37
2.24  All figures for class key.  38
2.25  Details for one key figure.  39
3.1  A petroglyph depicting a deer in Valcamonica, Italy.  41
3.2  blabla  43
3.3  bla  44
3.4  Scales of acquisition for the holistic approach of capturing a complete valley in the 3D-Pitoti project.  45
4.1  A human figure and its automatic segmentation result.  49
4.2  Detail of rock art material.  50
4.3  Schema of different fusion strategies applied.  52
4.4  Small details of our test images (see Figures 2.9(a) and 2.10). Problems of the material include grass (tl), shadows (tr), scratches (bl) and cracks (br).  54
4.5  F1 results of single feature classifiers.  56
4.6  F1 results of feature combination with early fusion. The red line marks the result of the best single feature. The corresponding feature combinations are denoted in Table 4.2.  57
4.7  F1 results of feature combination with late fusion. The red line marks the result of the best single feature. The corresponding feature combinations are denoted in Table 4.3.  58
4.8  Two details of our test material with a visualization of late fusion run Y with thresholded majority voting. The grey value of a pixel corresponds to the classification confidence of a pixel. Darker means more confident. We observe that some blurring happens, e.g. at the boundaries of the shield in the upper right. We assume that the blurring is due to the size of the local window we use for feature extraction.  61
4.9  Precision and recall for late fusion with thresholded majority voting. The threshold ranges from two to the number of fused classifiers minus one. The corresponding feature combinations are denoted in Table 4.3.  62
4.10  Visualization of two details of our test material classified with different thresholds used in late fusion run Y. Red dots are classified as background. Each column shows the result of a specific threshold t. The leftmost column corresponds to t=12, i.e. all fused classifiers must agree. The rightmost column corresponds to t=1, i.e. at least one classifier must classify a pixel as foreground pixel. We observe that the ideal threshold for segmentation varies.  63
4.11  Our test material overlaid with the best classification result (late fusion run X).  65
5.1  A petroglyph scene with overlaid segmentation ground truth.  67
5.2  blabla  70
5.3  Qualitative evaluation of the discriminative abilities of the employed 3D point cloud geometry-based descriptors. The rows from top to bottom show the ground truth, PFH, FPFH, SHOT and 3DSC. The right hand side shows details. We observe that each point cloud descriptor produces a large amount of false positives compared to the ground truth in row 1.  72
5.4  Segmentation results with different local point cloud surface descriptors. The left side shows the performance of single features, the right side shows the performance of descriptor combinations. We observe that the combinations yield only slight improvements.  73
5.5  A 3D surface reconstruction of a rock surface in which the shape of a human figure has been engraved. (a) The colored point cloud; (b) the projected depth map; (c) the deviation map; (d) the enhanced deviation map accentuates well the different surface topographies of the engraving and the surrounding rock surface. Please refer to [ZS15] for details on the process from depth image to enhanced deviation map.  76
5.6  Results of the systematic evaluation with RUSBoost on the complete dataset with 26 3D scans. The enhanced deviation map yields again a significant improvement of performance.  82
6.1  A part of a petroglyph depicting a human with a superimposed arrow containing different pecking styles.  85
6.2  A warrior figure containing two visually clearly distinguishable pecking styles. We observe that the large arrow from left to right has been pecked over the warrior with clearly larger peck marks. This figure serves as a good basic test case whether 3D surface descriptors can help to automatically distinguish between pecking styles.  87
6.3  A scene containing at least two clearly distinguishable pecking styles.  88
6.4  Detail of a point cloud with points at peak positions marked red. Please note that we refer to peaks with respect to the detection method. The peaks are actually valleys if we look at the 3D surface, as the detector finds peaks at local maxima in the depth map.  90
6.5  Spherical descriptor footprint with a radius of approx. 2mm. We observe that the peck mark is completely within the footprint.  90
6.6  A petroglyph with homogeneous pecking style.  93
6.7  Visualization of clustering results of densely extracted FPFH descriptors with r=5mm. The figure in the upper left is the original point cloud, the other three figures are the results with 2, 3 and 4 clusters respectively. We observe that the clustering with two clusters models the two visually observed pecking styles well. The clusterings with 3 and 4 clusters do not show improvements.  94
6.8  Visualization of clustering results of densely extracted FPFH descriptors with r=2mm. The figure in the upper left is the original point cloud, the upper right figure shows the result of clustering with two clusters while the figure at the lower left shows the result with three clusters. The two details at the lower right of the figure show the central part of the warrior in the scene. We observe that the superimposition on the figure is clearly distinguishable in the two and three cluster case.  96
6.9  Visualization of clustering results of densely extracted FPFH descriptors with r=2mm. We show results with 2, 3, 4 and 5 clusters. We observe that the clustering results in vertical stripes that are presumably not caused by the surface topology of the homogeneous pecking style of the petroglyph (see Figure 6.6).  97
6.10  Comparison of different descriptors and footprints. The first row shows clustering results based on the FPFH descriptor with r=1, 2 and 5mm (from left to right). The second row contains clustering with PFH with r=1 and 2mm (from left to right). We observe that this scan requires a large descriptor footprint as the best result is FPFH with r=5mm. The original scan is shown in Figure 6.2.  98
6.11  A surface detail of a point cloud the points of which have been clustered into two clusters based on PFH with r=2mm. We observe that the flat regions in the surface are in one cluster while the deep peck marks are in the other cluster.  98
6.12  Warrior with surface spheres around peaks serving as basis for topological feature extraction. We observe that the different pecking styles are modeled well (see Figure 6.2). The third cluster (green) in (b) is very small. Please note that the circular region colored around a peak has only a quarter of the radius that is used as query point neighborhood for feature calculation. We reduced the radius for visual clarity. Overlapping areas of differently colored circles show a mixture of the colors, e.g. yellow in the case of red and green in Fig. (b).  99
6.13  A complex scene with surface spheres around peaks serving as basis for topological feature extraction. We show results with 2 and 3 clusters. We observe that the different pecking styles are modeled only partly (see Figure 6.3).
Please note that the circular region colored around a peak has only a quarter of the radius that is used as query point neighborhood for feature calculation. We reduced the radius for visual clarity. Overlapping areas of differently colored circles show a mixture of the colors, e.g. yellow in the case of red and green in the left figure.  100
7.1  Petroglyphs with the corresponding smoothed figures, skeletons and the derived graphs.  103
7.2  This figure shows a part of a tracing of a rock in Valcamonica (Coren di Redondo, Rock 1). © Alberto Marretta, used with permission.  106
7.3  This figure shows the petroglyph dataset that we investigate in this chapter. Each column contains examples of one class. We observe that some of the classes have high intra-class variance of shape. The petroglyphs are from various rocks in Valcamonica. © CCSP - Centro Camuno di Studi Preistorici and Alberto Marretta, used with permission.  109
7.4  This figure shows petroglyph tracings (column 1), pre-processed shapes (column 2), extracted skeletons (column 3) and the derived graphs with a pruning threshold of 2, 10 and 30px (columns 4-6). We observe that the employed skeletonisation algorithm fails to extract details in some cases (heads in rows 1 and 4), while in other cases small details are covered well. Furthermore, we observe that a high pruning threshold removes skeletal noise (rows 1, 2, 4, 6, 9 and 10), but may also discard relevant topological information (row 5).  110
7.5  This figure shows a sample which is misclassified with GED and GHT. The query image is on the left, and the five nearest neighbors on the right. The first two rows show the result for GED and the utilized graphs. The third row shows the result for GHT.  116
7.6  This figure shows a sample which is correctly classified with GE and misclassified with GHT. The query image is on the left, and the five nearest neighbors on the right. The first two rows show the result for GE and the utilized graphs. The third row shows the result for GHT.  117
8.1  A human figure and a house, the pre-processed shapes and skeletonisation results.  119
8.2  Comparison of the original petroglyph tracing (a) and its skeleton (b). The petroglyph shape pre-processed with the proposed method results in a significantly simplified skeleton that still preserves the important parts of the original shape (c).  120
8.3  Some shapes from example classes (e.g. anthropomorph, bird, cross, deer, etc.) from our entire dataset of more than 1100 shapes classified into two different archeological typologies.  124
8.4  The workflow of the automated shape pre-processing method. We input a raw binary shape and apply a median filter and binary morphology.  127
8.5  Sequence of the automated pre-processing of a shape with the proposed method. The numbers of foreground and background blobs decrease rapidly. At iteration 2, a stable minimum of the number of blobs is reached and the stopping criterion aborts the loop at iteration 3. We observe that further iterations decompose the shape starting at iteration 6.  129
8.6  Comparison of skeletons of the original DCE and SPT algorithms (left column) with our improved versions (right column). Our improvements eliminate all shortcomings.  130
8.7  Comparison of histograms of the number of endpoints of the thinning skeletons of unprocessed shapes (left) with the pre-processed shapes (right). The number of endpoints significantly decreases and is mapped to a reasonable range.  133
8.8  Comparison of the results of satisfactory (a) and weak (b) pre-processing of petroglyph shapes.  134
8.9  Examples of successful skeletonisation with all selected and improved algorithms based on shape pre-processing with the proposed method. Note that for DCE and SPT a few spurious branches remain, in contrast to thinning and BPR.  135
8.10  Examples of partly erroneous skeletons obtained with the selected algorithms. In all depicted examples spurious branches exist (yellow circles) and for some shapes important parts are deleted (2nd and 3rd row, red circles).  136
9.1  A query petroglyph (left) and the most similar petroglyphs determined by our system.  139
9.2  Classification accuracy of the classification results of the 26 classes based on the modified Sansoni typology. The red line shows the overall accuracy. Please refer to Table 2.1 for the class descriptions.  142
9.3  Heatmap of the confusion matrix of the classification results of the 26 classes based on the modified Sansoni typology. Each line in the matrix represents the classifications of a certain class. Please refer to Table 2.1 for the class descriptions. See Table 9.1 for percentage values.  143
9.4  Examples of modified Sansoni typology class 16 - Axe.  143
9.5  Examples of modified Sansoni typology class 21 - Other unclassified.  144
9.6  Classification accuracy (blue) and baseline (black) for each anthropomorph subdivision of the 3D-Pitoti project typology. Each subdivision reflects a different set of classes (based on the divisions) for all anthropomorph figures. The baseline is the size of the largest class of a subdivision, i.e. the performance of a naive classifier which always predicts the class with the highest a priori probability.  145
9.7  Classification accuracy of the classification results of 18 classes based on the 3D-Pitoti project typology. The red line shows the overall accuracy. Please refer to Table 2.2 for the class descriptions of classes 1-16. Classes 17 and 18 are the "body filling" subdivisions of the anthropomorph class.  146
9.8  Heatmap of the confusion matrix of the classification results of 18 classes based on the 3D-Pitoti project typology. Each line in the matrix represents the classifications of a certain class. Please refer to Table 2.2 for the class descriptions of classes 1-16. Classes 17 and 18 are the "body filling" subdivisions of the anthropomorph class. See Table 9.2 for percentage values.  147
9.9  Examples of 3D-Pitoti project typology class 9 - Shoe/Footprint.  147
9.10  Examples of 3D-Pitoti project typology class 2 - Camunian rose.  148
9.11  Performance of single descriptors in 5-fold cross-validated % of accuracy over the number of considered neighbors k. We observe that the best performing descriptor SC (red / long line-dot) on the small dataset in Subsection 7.3.2 is outperformed by CNN_RT (dark red / long line-short line).  150
9.12  Performance of descriptors intermediately fused (see Subsection 7.2.4) in 5-fold cross-validated % of accuracy over the number of considered neighbors k. We observe that the combination SC, IDSC, GHT, GE (red / long line-dot), which performs best on the small dataset in Subsection 7.3.2 and is evaluated on two typologies in the previous Section 9.2.1, is clearly outperformed by the combinations that use CNN-based features.  151
9.13  Probabilities (y-axis) that the correct class is within the first n (x-axis) predictions of a ranked list of predicted classes. The red (short line) curve shows results for the 3D-Pitoti project typology and the blue (line-dot) curve shows results for the modified Sansoni typology with the descriptor combination performing best on the small dataset in Subsection 7.3.2 and evaluated on two typologies in Subsection 9.2.1. The green (long line) curve shows the results for the 3D-Pitoti project typology with the descriptor combination including CNN classifiers (SC, IDSC, CNN_RT, CNN_IN) evaluated in Subsection 9.2.2.  153
10.1  A petroglyph depicting a human figure in Valcamonica, Italy.  155
A.1  Orthophoto of scan ID 1.  190
A.2  Orthophoto of scan ID 2.  191
A.3  Orthophoto of scan ID 3.  192
A.4  Orthophoto of scan ID 4.  193
A.5  Orthophoto of scan ID 5.  194
A.6  Orthophoto of scan ID 6.  195
A.7  Orthophoto of scan ID 7.  196
A.8  Orthophoto of scan ID 8.  197
A.9  Orthophoto of scan ID 9.  198
A.10  Orthophoto of scan ID 10.  199
A.11  Orthophoto of scan ID 11.  200
A.12  Orthophoto of scan ID 12.  201
A.13  Orthophoto of scan ID 13.  202
A.14  Orthophoto of scan ID 14.  203
A.15  Orthophoto of scan ID 15.  204
A.16  Orthophoto of scan ID 16.  204
A.17  Orthophoto of scan ID 17.  205
A.18  Orthophoto of scan ID 18.  205
A.19  Orthophoto of scan ID 19.  206
A.20  Orthophoto of scan ID 20.  207
A.21  Orthophoto of scan ID 21.  208
A.22  Orthophoto of scan ID 22.  208
A.23  Orthophoto of scan ID 23.  209
A.24 Orthophoto of scan ID 24.  210
A.25 Orthophoto of scan ID 25.  210
A.26 Orthophoto of scan ID 26.  210

List of Tables

2.1 Classes based on branch endpoints of the modified Sansoni typology.  30
2.2 Classes based on branch endpoints of the 3D-Pitoti project typology.  31
4.1 Features extracted from each local window.  55
4.2 Feature combinations used for early and late fusion.  57
4.3 Fused classifiers. Please refer to Table 4.2 for the features used in the feature combinations.  59
5.1 Results of the local 3D descriptors with linear SVM. Numbers provide the median F1 as well as the interquartile range (iqr) in brackets. The PFH descriptor best captures the topographic surface characteristics, however at the cost of considerable computation time.  78
5.2 Results of the systematic evaluation of all input representations and features with linear SVM on the small dataset (the four point clouds on which the full 3D approach has been evaluated in Section 5.1). Numbers provide the median F1 as well as the interquartile range (iqr) in brackets. A superscript 1 denotes that the result is significantly better than all other results for this feature, i.e. the best result of a row. A superscript 2 denotes that the result is significantly better than all other results for this representation, i.e. the best result of a column. GHS+SF applied to the EDM significantly outperforms all other features in the experiments.  79
5.3 Results of the systematic evaluation with RUSBoost (same schema as Table 5.2).  81
7.1 Topology-based features extracted from the skeletal graphs. Each feature is a description of the whole graph. Please refer to [BdW12] for more details.  112
7.2 Classification accuracy of GED in percent. We use LOOCV-validated 5-NN classification. The maximum number of open paths for the A* beam approximation is 6000.  113
7.3 Classification accuracy (in percent) of GE utilizing single scalar topological features. We use LOOCV-validated 5-NN classification. p denotes the size of the pruning threshold. The three best values are emphasized. Please refer to Table 7.1 for the descriptions corresponding to the feature IDs.  114
7.4 Classification accuracy (in percent) of GE using combinations of 1 to 10 single topology features that perform best for each dimension. We use LOOCV-validated 5-NN classification with brute-force feature selection. The pruning threshold is 10 px. The best values are emphasized. Please refer to Table 7.1 for the descriptions corresponding to the feature IDs.  115
7.5 Classification accuracy (in percent) of GE, SC, IDSC and GHT and combinations thereof. We use LOOCV-validated 5-NN classification for single shape features and the classifier fusion method described in Section 7.2.4.  115
8.1 Comparison of recent skeletonisation methods with respect to the identified criteria. Note that for point-based pruning approaches the first two criteria do not apply because they do not distinguish between significant and insignificant branches.  125
8.2 Results of the quantitative evaluation of the proposed pre-processing method on the entire dataset.  133
8.3 Evaluation of the four selected skeletonisation algorithms on the entire dataset. BPR and thinning perform best on the pre-processed shapes.  137
9.1 Confusion matrix of the classification results for the 26 classes based on the modified Sansoni typology. Each row of the matrix states the percentages of the classifications of one class. Please refer to Table 2.1 for the class descriptions. Row sums may differ from 100 because every single percentage value is rounded.  144
9.2 Confusion matrix of the classification results for the 18 classes based on the 3D-Pitoti project typology. Each row of the matrix states the percentages of the classifications of one class. Please refer to Table 2.2 for the class descriptions of classes 1-16. Classes 17 and 18 are the "body filling" subdivisions of the anthropomorph class. Row sums may differ from 100 because every single percentage value is rounded.  148
A.1 Summary of the reconstructed surfaces in the dataset. For each surface we provide an ID, a description, its location (in terms of site, rock, and area) and the reconstruction method.  188
A.2 Overview of basic measures of the digitized surfaces: the covered area (in pixels at 300 dpi and in cm2), the number of 3D points in the point cloud, the percentage of pecked regions, the number of disconnected pecked regions and the range of depth values.  189

References

[AB96] Carlo Arcelli and Gabriella Sanniti di Baja. Skeletons of planar patterns. In T. Yung Kong and Azriel Rosenfeld, editors, Topological Algorithms for Digital Image Processing, volume 19 of Machine Intelligence and Pattern Recognition, pages 99–143. North-Holland, 1996.
[ABR+14] M. Aubert, A. Brumm, M. Ramli, T. Sutikna, E. W. Saptomo, B. Hakim, M. J. Morwood, G. D. van den Bergh, L. Kinsley, and A. Dosseto. Pleistocene cave art from Sulawesi, Indonesia. Nature, 514(7521):223–227, October 2014.
[AC94] Emmanuel Anati and Tiziana Cittadini. Valcamonica rock art: A new history for Europe. Edizioni del Centro, Capo di Ponte, Italy, 1994.
[AEET08] Cagri Aslan, Aykut Erdem, Erkut Erdem, and Sibel Tari. Disconnected skeleton: shape at its absolute scale. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(12):2188–2203, 2008.
[AH95] H. Anys and Dong-Chen He. Evaluation of textural and multipolarization radar features for crop classification. IEEE Transactions on Geoscience and Remote Sensing, 33(5):1170–1181, September 1995.
[Ana76] Emmanuel Anati. Evolution and style in Camunian rock art: An inquiry into the formation of European civilization. Edizioni del Centro, Capo di Ponte, Italy, 1976.
[APR15] Craig Alexander, Axel Pinz, and Christian Reinbacher. Multi-scale 3d rock-art recording. Digital Applications in Archaeology and Cultural Heritage, 2(2–3):181–195, 2015.
[ASM96] ASME. ASME B46.1-1995 Surface texture (surface roughness, waviness and lay): An American National Standard. The American Society of Mechanical Engineers (ASME), New York, USA, 1996.
[Baj06] Gabriella Sanniti di Baja. Skeletonization of digital objects. In Proceedings of the 11th Iberoamerican Conference on Progress in Pattern Recognition, Image Analysis and Applications (CIARP06), pages 1–13. Springer DE, 2006.
[BdW12] Gergana Bounova and Olivier de Weck. Overview of metrics and their correlation patterns for multiple-metric topology analysis on heterogeneous graph ensembles. Physical Review E, 85:016117, January 2012.
[BET09] Emre Baseski, Aykut Erdem, and Sibel Tari. Dissimilarity between two skeletal trees in a context. Pattern Recognition, 42(3):370–385, 2009.
[BHH78] George E. P. Box, William Gordon Hunter, and J. Stuart Hunter. Statistics for Experimenters. John Wiley and Sons, New York, 1978.
[BJ03] Liam Blunt and Xiang Jiang. Advanced techniques for assessment surface topography: development of a basis for 3D surface texture standards "SURFSTAND". Elsevier, 2003.
[BL07] Xiang Bai and Longin Jan Latecki. Discrete skeleton evolution. In Proceedings of the 6th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 362–374, Berlin, Heidelberg, 2007. Springer-Verlag.
[BL08] Xiang Bai and Longin Jan Latecki. Path similarity skeleton graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7):1282–1292, July 2008.
[BLL07] Xiang Bai, Longin Latecki, and Wen-yu Liu. Skeleton pruning by contour partitioning with discrete curve evolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):449–462, March 2007.
[BLT09] Xiang Bai, Wenyu Liu, and Zhuowen Tu. Integrating contour and skeleton for shape classification. In 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pages 360–367. IEEE, 2009.
[Blu67] H. Blum. A transformation for extracting new descriptors of shape. In Models for the Perception of Speech and Visual Form, Proceedings of a Meeting held in Boston, November 1964, pages 362–380, Cambridge, 1967. MIT Press.
[BMdA12] J. A. Barcelo and V. Moitinho de Almeida. Functional analysis from visual and non-visual data: An artificial intelligence approach. Mediterranean Archaeology and Archaeometry, 12(2):273–321, 2012.
[BMP02] Serge Belongie, Jitendra Malik, and Jan Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509–522, 2002.
[BN78] H. Blum and Roger N. Nagel. Shape description using weighted symmetric axis features. Pattern Recognition, 10(3):167–180, 1978.
[Cla02] David A. Clausi. An analysis of co-occurrence texture statistics as a function of grey level quantization. Canadian Journal of Remote Sensing, 28(1):45–62, 2002.
[CV95] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[DABRR06] Margarita Díaz-Andreu, Christopher Brooke, Michael Rainsbury, and Nick Rosser. The spiral that vanished: the application of non-contact recording techniques to an elusive rock art motif at Castlerigg stone circle in Cumbria. Journal of Archaeological Science, 33(11):1580–1587, November 2006.
[dARBP13] V. M. de Almeida, R. Rosillo, J. A. Barceló, and A. Palomo. Linking 3d digital surface texture data with ancient manufacturing procedures. In Digital Heritage International Congress (DigitalHeritage), 2013, volume 1, pages 735–738, October 2013.
[Din55] G. P. Dinneen. Programming pattern recognition. In Proceedings of the March 1-3, 1955, Western Joint Computer Conference, AFIPS '55 (Western), pages 94–100, New York, USA, 1955. ACM.
[DJLW08] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2):5:1–5:60, May 2008.
[DK12] Tal Darom and Yosi Keller. Scale-invariant features for 3-d mesh models. IEEE Transactions on Image Processing, 21(5):2758–2769, 2012.
[DP13] Vincenzo Deufemia and Luca Paolino. Combining unsupervised clustering with a non-linear deformation model for efficient petroglyph recognition. In George Bebis, Richard Boyle, Bahram Parvin, Darko Koracin, Baoxin Li, Fatih Porikli, Victor Zordan, James Klosowski, Sabine Coquillart, Xun Luo, Min Chen, and David Gotz, editors, Advances in Visual Computing, number 8034 in Lecture Notes in Computer Science, pages 128–137. Springer Berlin Heidelberg, January 2013.
[DP14] Vincenzo Deufemia and Luca Paolino. Segmentation and recognition of petroglyphs using generic fourier descriptors. In Abderrahim Elmoataz, Olivier Lezoray, Fathallah Nouboud, and Driss Mammass, editors, Image and Signal Processing, volume 8509 of Lecture Notes in Computer Science, pages 487–494. Springer International Publishing, 2014.
[DPdL12] V. Deufemia, L. Paolino, and H. de Lumley. Petroglyph recognition using self-organizing maps and fuzzy visual language parsing. In 2012 IEEE 24th International Conference on Tools with Artificial Intelligence (ICTAI), volume 1, pages 852–859, 2012.
[DR04] Cecilia Di Ruberto. Recognition of shapes by attributed skeletal graphs. Pattern Recognition, 37(1):21–31, 2004.
[DSS92] W. P. Dong, P. J. Sullivan, and K. J. Stout. Comprehensive study of parameters for characterizing three-dimensional surface topography I: Some inherent properties of parameter variation. Wear, 159(2):161–171, 1992.
[DSS93] W. P. Dong, P. J. Sullivan, and K. J. Stout. Comprehensive study of parameters for characterizing three-dimensional surface topography II: Statistical properties of parameter variation. Wear, 167(1):9–21, 1993.
[DSS94a] W. P. Dong, P. J. Sullivan, and K. J. Stout. Comprehensive study of parameters for characterising three-dimensional surface topography III: Parameters for characterising amplitude and some functional properties. Wear, 178(1):29–43, 1994.
[DSS94b] W. P. Dong, P. J. Sullivan, and K. J. Stout. Comprehensive study of parameters for characterising three-dimensional surface topography IV: Parameters for characterising spatial and hybrid properties. Wear, 178(1):45–60, 1994.
[DT05] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.
[DvGNK99] Kristin J. Dana, Bram van Ginneken, Shree K. Nayar, and Jan J. Koenderink. Reflectance and texture of real-world surfaces. ACM Transactions on Graphics, 18(1):1–34, January 1999.
[DvLV08] M. Fatih Demirci, Reinier H. van Leuken, and Remco C. Veltkamp. Indexing through Laplacian spectra. Computer Vision and Image Understanding, 110(3):312–325, June 2008.
[ET10] Aykut Erdem and Sibel Tari. A similarity-based approach for shape classification using Aslan skeletons. Pattern Recognition Letters, 31(13):2024–2032, 2010.
[FHK+04] Andrea Frome, Daniel Huber, Ravi Kolluri, Thomas Bülow, and Jitendra Malik. Recognizing objects in range data using regional point descriptors. In Computer Vision - ECCV 2004, pages 224–237. Springer, 2004.
[Fis35] Ronald Aylmer Fisher. The Design of Experiments. Oliver & Boyd, 1935.
[FS+96] Yoav Freund, Robert E. Schapire, et al. Experiments with a new boosting algorithm. In Proceedings of the International Conference on Machine Learning, volume 96, pages 148–156, 1996.
[GAMNRGM11] Diego Gonzalez-Aguilera, Angel Muñoz-Nieto, Pablo Rodriguez-Gonzalvez, and Mario Menéndez. New tools for rock art modelling: automated sensor integration in Pindal Cave. Journal of Archaeological Science, 38(1):120–128, January 2011.
[GVB12] Jaume Gibert, Ernest Valveny, and Horst Bunke. Graph embedding in vector spaces by node attribute statistics. Pattern Recognition, 45(9):3072–3083, September 2012.
[HD84] S. B. Ho and C. R. Dyer. Medial-axis based shape smoothing. Technical Report 557, University of Wisconsin-Madison, Department of Computer Sciences, 1984.
[How04] Nicholas R. Howe. Code implementations by Nicholas R. Howe, 2004.
[HR08] T. Hengl and H. I. Reuter, editors. Geomorphometry: Concepts, Software, Applications, volume 33. Elsevier, Amsterdam, 2008.
[HSD73] Robert M. Haralick, K. Shanmugam, and Its'Hak Dinstein. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, 3(6):610–621, November 1973.
[II02] ISO-IEC. Information Technology - Multimedia Content Description Interface. ISO/IEC 15938, Moving Picture Experts Group, 1st edition, 2002.
[JH99] Andrew E. Johnson and Martial Hebert. Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5):433–449, 1999.
[JSD+14] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[KAH+13] Walter G. Kropatsch, Nicole M. Artner, Yll Haxhimusa, and Xiaoyi Jiang, editors. Graph-Based Representations in Pattern Recognition, volume 7877 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.
[KC09] Stelios Krinidis and V. Chatzis. A skeleton family generator via physics-based deformable models. IEEE Transactions on Image Processing, 18(1):1–11, 2009.
[KCRU58] R. A. Kirsch, L. Cahn, C. Ray, and G. H. Urban. Experiments in processing pictorial information with a digital computer. In Papers and Discussions Presented at the December 9-13, 1957, Eastern Joint Computer Conference: Computers with Deadlines to Meet, IRE-ACM-AIEE '57 (Eastern), pages 221–229, New York, USA, 1958. ACM.
[Kir79] David Kirkpatrick. Efficient computation of continuous skeletons. In Proceedings of the 20th Annual Symposium on Foundations of Computer Science, 1979, pages 18–27, San Juan, Puerto Rico, 1979. IEEE.
[KJPK02] Kwang In Kim, Keechul Jung, Se Hyun Park, and Hang Joon Kim. Support vector machines for texture classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:1542–1550, 2002.
[KK13] Stelios Krinidis and Michail Krinidis. Empirical mode decomposition on skeletonization pruning. Image and Vision Computing, 31(8):533–541, 2013.
[KOW08] Hee-Kooi Khoo, Hong-Choon Ong, and Ya-Ping Wong. Image texture classification using combined grey level co-occurrence probabilities and support vector machines. In Proceedings of the 2008 Fifth International Conference on Computer Graphics, Imaging and Visualisation, pages 180–184, Washington, DC, USA, 2008. IEEE Computer Society.
[KS08] Karthik Krish and Wesley Snyder. A new accumulator-based approach to shape recognition. In Proceedings of the 4th International Symposium on Advances in Visual Computing, Part II, ISVC '08, pages 157–169, Berlin, Heidelberg, 2008. Springer-Verlag.
[KSK01] Philip N. Klein, Thomas B. Sebastian, and Benjamin B. Kimia. Shape matching using edit-distance: An implementation. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '01, pages 781–790, Philadelphia, PA, USA, 2001. Society for Industrial and Applied Mathematics.
[Kun04] Ludmila I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.
[Kva13] Kenneth Kvamme. An examination of automated archaeological feature recognition in remotely sensed imagery. Computational Approaches to Archaeological Spaces, 60:53, 2013.
[Lea11] Richard Leach. Optical Measurement of Surface Topography. Springer, 2011.
[LJ07] Haibin Ling and David W. Jacobs. Shape classification using the inner-distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2):286–299, 2007.
[LL99] Longin Jan Latecki and Rolf Lakämper. Polygon evolution by vertex deletion. In Proceedings of the Second International Conference on Scale-Space Theories in Computer Vision, pages 398–409. Springer, 1999.
[LL00] Longin Jan Latecki and Rolf Lakämper. Shape similarity measure based on correspondence of visual parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1185–1190, 2000.
[LLE00] Longin Jan Latecki, Rolf Lakämper, and T. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, volume 1, pages 424–429. IEEE, 2000.
[LM99] Thomas Leung and Jitendra Malik. Recognizing surfaces using three-dimensional textons. In Proceedings of the International Conference on Computer Vision - Volume 2, ICCV '99, pages 1010–1025, Washington, DC, USA, 1999. IEEE Computer Society.
[LNCV10] José Luis Lerma, Santiago Navarro, Miriam Cabrelles, and Valentín Villaverde. Terrestrial laser scanning and close range photogrammetry for 3d archaeological documentation: the Upper Palaeolithic Cave of Parpalló as a case study. Journal of Archaeological Science, 37(3):499–507, March 2010.
[Lon98] Sven Loncaric. A survey of shape analysis techniques. Pattern Recognition, 31(8):983–1001, 1998.
[Low04] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, November 2004.
[LPC+00] Marc Levoy, Kari Pulli, Brian Curless, Szymon Rusinkiewicz, David Koller, Lucas Pereira, Matt Ginzton, Sean Anderson, James Davis, Jeremy Ginsberg, and others. The digital Michelangelo project: 3d scanning of large statues. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 131–144. ACM Press/Addison-Wesley Publishing Co., 2000.
[LS06] George V. Landon and W. Brent Seales. Petroglyph digitization: enabling cultural heritage scholarship. Machine Vision and Applications, 17(6):361–371, October 2006.
[LSYZ11] Geng Li, Murat Semerci, Bülent Yener, and Mohammed J. Zaki. Graph classification via topological and label attributes. In 9th Workshop on Mining and Learning with Graphs (with SIGKDD), August 2011.
[LWH+12] Hongzhi Liu, Zhonghai Wu, D. Frank Hsu, Bradley S. Peterson, and Dongrong Xu. On the generation and pruning of skeletons using generalized Voronoi diagrams. Pattern Recognition Letters, 33(16):2113–2119, December 2012.
[LWZH13] Hongzhi Liu, Zhong-Hai Wu, Xing Zhang, and D. Frank Hsu. A skeleton pruning algorithm based on information fusion. Pattern Recognition Letters, 34(10):1138–1145, 2013.
[MAK+97] Farzin Mokhtarian, Sadegh Abbasi, Josef Kittler, et al. Efficient and robust retrieval by shape content through curvature scale space. Series on Software Engineering and Knowledge Engineering, 8:51–58, 1997.
[Mar11] Alberto Marretta. Valcamonica Rock Art: An Extraordinary Archaeological Source. In Alberto Marretta and Tiziana Cittadini, editors, Valcamonica rock-art parks: guide to visiting routes, pages 8–29. Edizioni del Centro, Capo di Ponte, 1st edition, 2011.
[MCH10] Fei Mai, C. Q. Chang, and Y. S. Hung. Affine-invariant shape matching and recognition under partial occlusion. In 17th IEEE International Conference on Image Processing (ICIP), pages 4605–4608. IEEE, 2010.
[MM86] Farzin Mokhtarian and Alan Mackworth. Scale-based description and recognition of planar curves and two-dimensional shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):34–43, 1986.
[Mok95] Farzin Mokhtarian. Silhouette-based isolated object recognition through curvature scale space. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):539–544, 1995.
[Mon68] U. Montanari. A method for obtaining skeletons using a quasi-euclidean distance. Journal of the ACM (JACM), 15(4):600–624, October 1968.
[Mon69] Ugo Montanari. Continuous skeletons from digitized images. Journal of the ACM (JACM), 16(4):534–549, October 1969.
[OI92] Robert L. Ogniewicz and M. Ilg. Voronoi skeletons: Theory and applications. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 63–69, 1992.
[OPH96] Timo Ojala, Matti Pietikäinen, and David Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, 1996.
[OPM02] Timo Ojala, Matti Pietikäinen, and Topi Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.
[Par11] J. R. Parker. Algorithms for Image Processing and Computer Vision. Wiley Publishing, Inc., Indianapolis, USA, 2nd edition, 2011.
[Pav78] Theodosios Pavlidis. A review of algorithms for shape analysis. Computer Graphics and Image Processing, 7(2):243–258, 1978.
[PCGV02] Mari Partio, Bogdan Cramariuc, Moncef Gabbouj, and Ari Visa. Rock texture retrieval using gray level co-occurrence matrix. In Proceedings of the 5th Nordic Signal Processing Symposium, Oslo, 2002.
[PDM02] Euripides G. M. Petrakis, Aristeidis Diplaros, and Evangelos Milios. Matching and retrieval of distorted and occluded shapes using dynamic programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(11):1501–1516, 2002.
[PHGD+12] A. W. G. Pike, D. L. Hoffmann, M. Garcia-Diez, P. B. Pettitt, J. Alcolea, R. De Balbin, C. Gonzalez-Sainz, C. de las Heras, J. A. Lasheras, R. Montes, and J. Zilhao. U-Series Dating of Paleolithic Art in 11 Caves in Spain. Science, 336(6087):1409–1413, June 2012.
[Rao05] C. Rao. Linear model selection by cross-validation. Journal of Statistical Planning and Inference, 128(1):231–240, 2005.
[RBB09] Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. Fast point feature histograms (FPFH) for 3d registration. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '09), pages 3212–3217. IEEE, 2009.
[RC11] Radu Bogdan Rusu and Steve Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011.
[REB13] Kaspar Riesen, Sandro Emmenegger, and Horst Bunke. A novel software toolkit for graph edit distance computation. In Graph-Based Representations in Pattern Recognition, pages 142–151. Springer, 2013.
[RMBB08] Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, and Michael Beetz. Persistent point feature histograms for 3d point clouds. In Proceedings of the 10th International Conference on Intelligent Autonomous Systems (IAS-10), Baden-Baden, Germany, pages 119–128, 2008.
[RRPOGP11] Edgar Roman-Rangel, Carlos Pallan, Jean-Marc Odobez, and Daniel Gatica-Perez. Analyzing ancient maya glyph collections with contextual shape descriptors. International Journal of Computer Vision, 94(1):101–117, 2011.
[SAC07] Mark D. Smucker, James Allan, and Ben Carterette. A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pages 623–632. ACM, 2007.
[SB98] Doron Shaked and Alfred M. Bruckstein. Pruning medial axes. Computer Vision and Image Understanding, 69(2):156–169, February 1998.
[SB11] Markus Seidl and Christian Breiteneder. Detection and classification of petroglyphs in gigapixel images - preliminary results. VAST11: The 12th International Symposium on Virtual Reality, Archaeology and Intelligent Cultural Heritage - Short Papers, pages 45–48, 2011.
[SB12] Markus Seidl and Christian Breiteneder. Automated petroglyph image segmentation with interactive classifier fusion. In Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP '12), 2012.
[SBH+11] Wei Shen, Xiang Bai, Rong Hu, Hongyuan Wang, and Longin Latecki. Skeleton growing and pruning with bending potential ratio. Pattern Recognition, 44(2):196–209, February 2011.
[SBYL13] Wei Shen, Xiang Bai, XingWei Yang, and Longin Jan Latecki. Skeleton pruning as trade-off between skeleton simplicity and reconstruction error. Science China Information Sciences, 56(4):1–14, January 2013.
[SCC+11] Roberto Scopigno, Marco Callieri, Paolo Cignoni, Massimiliano Corsini, Matteo Dellepiane, Federico Ponchio, and Guido Ranzuglia. 3d models for cultural heritage: Beyond plain visualization. IEEE Computer, 44(7):48–55, 2011.
[Sco12] Roberto Scopigno. Sampled 3d models for cultural heritage: which uses beyond visualization? Virtual Archaeology Review, 3(5):109–115, 2012.
[SDR+10] Juan Ortiz Sanz, Maria de la Luz Gil Docampo, Santiago Martínez Rodríguez, María Teresa Rego Sanmartín, and Gonzalo Meijide Cameselle. A simple methodology for recording petroglyphs using low-cost digital image correlation photogrammetry and consumer-grade digital cameras. Journal of Archaeological Science, 37(12):3158–3169, December 2010.
[SG09] U. Sansoni and S. Gavaldo. Lucus rupestris: sei millenni d'arte rupestre a Campanine di Cimbergo. Edizioni del Centro, Capo di Ponte (BS), Italy, 2009.
[Sil09] Roland Silva. Rock Art Sites on the UNESCO World Heritage List. ICOMOS, International Council on Monuments and Sites, Paris, 2009.
[SK96] K. Siddiqi and B. B. Kimia. A shock grammar for recognition. In Proceedings of the 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '96), pages 507–513, 1996.
[SKVHN08] Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. RUSBoost: Improving classification performance when training data is skewed. In 19th International Conference on Pattern Recognition (ICPR 2008), pages 1–4. IEEE, 2008.
[SKVHN10] Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 40(1):185–197, 2010.
[SRKB11] Bastian Steder, Radu Bogdan Rusu, Kurt Konolige, and Wolfram Burgard. Point feature extraction on 3d range scans taking into account object boundaries. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 2601–2608. IEEE, 2011.
[SS05] Kang B. Sun and Boaz J. Super. Classification of contour shapes using class segment sets. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 727–733. IEEE, 2005.
[SSDZ99] Kaleem Siddiqi, Ali Shokoufandeh, Sven J. Dickinson, and Steven W. Zucker. Shock graphs and shape matching. International Journal of Computer Vision, 35(1):13–32, 1999.
[ST99] L.-K. Soh and Costas Tsatsoulis. Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Transactions on Geoscience and Remote Sensing, 37(2):780–795, 1999.
[STDS14] Samuele Salti, Federico Tombari, and Luigi Di Stefano. SHOT: Unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding, 125:251–264, August 2014.
[SWA15] Markus Seidl, Ewald Wieser, and Craig Alexander. Automated classification of petroglyphs. Digital Applications in Archaeology and Cultural Heritage, 2(2–3):196–212, 2015.
[SWZ+15] Markus Seidl, Ewald Wieser, Matthias Zeppelzauer, Axel Pinz, and Christian Breiteneder. Graph-Based Shape Similarity of Petroglyphs. In Lourdes Agapito, Michael M. Bronstein, and Carsten Rother, editors, Computer Vision - ECCV 2014 Workshops, volume 8925 of Lecture Notes in Computer Science, pages 133–148. Springer International Publishing, 2015.
[Ted00] Laura Anne Tedesco. Lascaux (ca. 15,000 B.C.). In Heilbrunn Timeline of Art History. The Metropolitan Museum of Art, New York, October 2000.
[Tel12] Alexandru Telea. Feature preserving smoothing of shapes using saliency skeletons. In Visualization in Medicine and Life Sciences II, pages 153–170. Springer, 2012.
[TJ98] Mihran Tuceryan and Anil K. Jain. Texture analysis. The Handbook of Pattern Recognition and Computer Vision, 2:207–248, 1998.
[TLS09] Øivind Due Trier, Siri Øyen Larsen, and Rune Solberg. Automatic detection of circular structures in high-resolution satellite images of agricultural land. Archaeological Prospection, 16(1):1–15, 2009.
[TSDS10] Federico Tombari, Samuele Salti, and Luigi Di Stefano. Unique signatures of histograms for local surface description. In Computer Vision - ECCV 2010, pages 356–369. Springer, 2010.
[TTMI06] R. Takaki, J. Toriwaki, S. Mizuno, and R. Izuhara. Shape analysis of petroglyphs in Central Asia. Forma, 21:243–258, 2006.
[TvW02] Alexandru Telea and Jarke J. van Wijk. An augmented fast marching method for computing skeletons and centerlines. In Proceedings of the Symposium on Data Visualisation 2002 (VISSYM '02), pages 251–ff, Aire-la-Ville, Switzerland, 2002. Eurographics Association.
[VF08] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms, 2008.
[Vin94] Luc Vincent. Morphological area openings and closings for grey-scale images. In Shape in Picture, pages 197–208. Springer, 1994.
[VZ03] M. Varma and A. Zisserman. Texture classification: are filter banks necessary? In 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Proceedings, pages II-691–8, Madison, WI, USA, 2003.
[VZ05] Manik Varma and Andrew Zisserman. A statistical approach to texture classification from single images. International Journal of Computer Vision, 62:61–81, April 2005.
[WFFD11] Marcus Wallenberg, Michael Felsberg, Per-Erik Forssén, and Babette Dellen. Channel coding for joint colour and depth segmentation. Pattern Recognition, 6835:306–315, 2011.
[WHH03] Eric Wahl, Ulrich Hillenbrand, and Gerd Hirzinger. Surflet-pair-relation histograms: a statistical 3d-shape representation for rapid classification. In Proceedings of the Fourth International Conference on 3-D Digital Imaging and Modeling (3DIM 2003), pages 474–481. IEEE, 2003.
[Whi05] David S. Whitley. Introduction to Rock Art Research. Left Coast Press, Walnut Creek, CA, 2005.
[WSZ16] Ewald Wieser, Markus Seidl, and Matthias Zeppelzauer. A study on skeletonization of complex petroglyph shapes. Multimedia Tools and Applications, pages 1–19, 2016.
[XWLB10] Yao Xu, Bo Wang, Wenyu Liu, and Xiang Bai. Skeleton graph matching based on critical points using path similarity. In Computer Vision - ACCV 2009, pages 456–465. Springer, 2010.
[YBYZ09] Xiaojun Yang, Xiang Bai, Xingwei Yang, and Luan Zeng. An efficient quick algorithm for computing stable skeletons. In Proceedings of the 2nd International Congress on Image and Signal Processing (CISP'09), pages 1–5, October 2009.
[YKR08] Mingqiang Yang, Kidiyo Kpalma, and Joseph Ronsin. A survey of shape feature extraction techniques. Pattern Recognition, pages 43–90, 2008.
[YLH+09] Xu-Cheng Yin, Qian Liu, Hong-Wei Hao, Zhi-Bin Wang, and Kaizhu Huang. A rock structure recognition system using FMI images. In Proceedings of the 16th International Conference on Neural Information Processing: Part I, ICONIP '09, pages 838–845, Berlin, Heidelberg, 2009. Springer-Verlag.
[ZBVH09] Andrei Zaharescu, Edmond Boyer, Kiran Varanasi, and Radu Horaud. Surface feature detection and description with applications to mesh matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pages 373–380. IEEE, 2009.
[ZL02] Dengsheng Zhang and Guojun Lu. Generic fourier descriptor for shape-based image retrieval. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '02), volume 1, pages 425–428, 2002.
[ZL04] Dengsheng Zhang and Guojun Lu. Review of shape representation and description techniques. Pattern Recognition, 37(1):1–19, January 2004.
[ZMLS07] Jianguo Zhang, Marcin Marszałek, Svetlana Lazebnik, and Cordelia Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2):213–238, 2007.
[ZPS+15] Matthias Zeppelzauer, Georg Poier, Markus Seidl, Christian Reinbacher, Christian Breiteneder, Horst Bischof, and Samuel Schulter. Interactive segmentation of rock-art in high-resolution 3d reconstructions. In 2015 Digital Heritage, volume 2, pages 37–44, September 2015.
[ZPS+ss] Matthias Zeppelzauer, Georg Poier, Markus Seidl, Christian Reinbacher, Samuel Schulter, Christian Breiteneder, and Horst Bischof. Interactive 3d segmentation of rock-art by enhanced depth maps and gradient preserving regularization. ACM Journal on Computing and Cultural Heritage, in press.
[ZS15] Matthias Zeppelzauer and Markus Seidl. Efficient image-space extraction and representation of 3d surface topography. In IEEE International Conference on Image Processing (ICIP), pages 2845–2849, September 2015.
[ZTKP14] L. V. Zotkina, A. S. Tekhterekov, V. M. Kharevich, and H. Plisson. An Experimental Study of Percussion Technologies in the Minusinsk Basin: Percussion and Tool Types. Archaeology, Ethnology and Anthropology of Eurasia, 42(1):55–65, March 2014.
[ZTW+09] Zhiping Zeng, Anthony K. H. Tung, Jianyong Wang, Jianhua Feng, and Lizhu Zhou. Comparing stars: On approximating graph edit distance. Proceedings of the VLDB Endowment, 2(1):25–36, 2009.
[ZWKL09] Qiang Zhu, Xiaoyue Wang, Eamonn Keogh, and Sang-Hee Lee. Augmenting the generalized Hough transform to enable the mining of petroglyphs. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pages 1057–1066, New York, NY, USA, 2009. ACM.
[ZWKL10] Qiang Zhu, Xiaoyue Wang, Eamonn Keogh, and Sang-Hee Lee. An efficient and effective similarity measure to enable data mining of petroglyphs. Data Mining and Knowledge Discovery, 23(1):91–127, September 2010.
[ZWKL11] Qiang Zhu, Xiaoyue Wang, Eamonn Keogh, and Sang-Hee Lee. An efficient and effective similarity measure to enable data mining of petroglyphs. Data Mining and Knowledge Discovery, 23(1):91–127, July 2011.
[ZWS15] Matthias Zeppelzauer, Ewald Wieser, and Markus Seidl. A novel annotation tool for complex petroglyph shapes. In The Future of Datasets in Vision Workshop (in conjunction with CVPR 2015), Boston, MA, USA, June 2015.
[ZZJS16] Matthias Zeppelzauer, Bartosz Zieliński, Mateusz Juda, and Markus Seidl. Topological descriptors for 3d surface analysis. In Alexandra Bac and Jean-Luc Mari, editors, Computational Topology in Image Context: 6th International Workshop, CTIC 2016, Marseille, France, June 15-17, 2016, Proceedings, pages 77–87. Springer International Publishing, 2016.

Appendices

3D dataset

In this Appendix, we present an overview of our 3D dataset. Table A.1 contains a summary of all 3D scans. Table A.2 lists basic measures of the scans. Figures A.1 to A.26 show orthophotos of all scans. The dataset has been acquired by the German 3D surveying specialist ArcTron 3D (http://www.arctron.de) for the 3D-Pitoti project.

Table A.1: Summary of the reconstructed surfaces in the dataset. For each surface we provide an ID, a description, its location (in terms of site, rock, and area) and the reconstruction method.

ID | Description | Site | Rock | Area | Reconstr. Method
1 | Rosa Camuna | Foppe di Nadro | 24 | 7 | SLS
2 | Warrior | Foppe di Nadro | 24 | 7 | SLS
3 | Deer | Foppe di Nadro | 24 | 9 | SLS
4 | Stele (part 1) | Naquane | Ossimo 8 | 11 | SLS
5 | Stele (part 2) | Naquane | Ossimo 8 | 11 | SLS
6 | Stele (part 3) | Naquane | Ossimo 8 | 11 | SLS
7 | Stele (part 4) | Naquane | Ossimo 8 | 11 | SLS
8 | Standing Rider (part 1) | Naquane | 50 | 14 | SFM
9 | Standing Rider (part 2) | Naquane | 50 | 14 | SFM
10 | Sun-Shape Superimposition | Naquane | 50 | 15 | SFM
11 | Warrior Scene (part 1-1) | Seradina | 12C | 1 | SLS
12 | Warrior Scene (part 1-2) | Seradina | 12C | 1 | SLS
13 | Warrior Scene (part 1-3) | Seradina | 12C | 1 | SLS
14 | Warrior Scene (part 1-4) | Seradina | 12C | 1 | SLS
15 | Warrior Scene (part 2-2) | Seradina | 12C | 1 | SLS
16 | Warrior Scene (part 2-3) | Seradina | 12C | 1 | SLS
17 | Warrior Scene (part 2-4) | Seradina | 12C | 1 | SLS
18 | Warrior Scene (part 3-1) | Seradina | 12C | 1 | SLS
19 | Warrior Scene (part 3-2) | Seradina | 12C | 1 | SLS
20 | Warrior Scene (part 3-3) | Seradina | 12C | 1 | SLS
21 | Hunter With Bow | Seradina | 12C | 4 | SLS
22 | Hunter With Spear (part 1) | Seradina | 12C | 5 | SLS
23 | Hunter With Spear (part 2) | Seradina | 12C | 5 | SLS
24 | Hunting Scene (part 1) | Seradina | 12C | 10 | SLS
25 | Hunting Scene (part 2) | Seradina | 12C | 10 | SLS
26 | Hunting Scene (part 3) | Seradina | 12C | 10 | SLS

Table A.2: Overview of basic measures of the digitized surfaces: the covered area (in pixels at 300 dpi and in cm2), the number of 3D points in the point cloud, the percentage of pecked regions, the number of disconnected pecked regions and the range of depth values.

ID | Covered Area (px) | Covered Area (cm2) | Num. 3D Pts. | Pecked (%) | Num. Seg. | Depth Range (mm)
1 | 5 143 296 | 368.69 | 3 264 005 | 14.61 | 48 | 2.89
2 | 15 638 394 | 1 121.03 | 10 280 976 | 10.56 | 21 | 4.83
3 | 8 846 214 | 634.14 | 5 503 742 | 47.63 | 18 | 9.11
4 | 15 507 622 | 1 111.66 | 3 782 381 | 14.96 | 17 | 62.52
5 | 16 994 561 | 1 218.25 | 2 658 330 | 17.27 | 44 | 70.60
6 | 13 102 254 | 939.23 | 1 260 401 | 12.67 | 13 | 49.32
7 | 12 035 386 | 862.75 | 810 312 | 34.02 | 26 | 15.17
8 | 12 834 446 | 920.03 | 8 677 163 | 26.17 | 45 | 6.74
9 | 12 835 586 | 920.11 | 8 386 259 | 32.83 | 29 | 3.82
10 | 5 901 454 | 423.04 | 2 096 476 | 21.59 | 9 | 5.41
11 | 5 632 144 | 403.74 | 3 541 799 | 9.26 | 23 | 10.23
12 | 7 103 936 | 509.24 | 4 432 013 | 5.09 | 6 | 10.22
13 | 6 155 628 | 441.26 | 3 810 000 | 8.26 | 63 | 19.85
14 | 5 855 280 | 419.73 | 4 417 779 | 6.47 | 17 | 10.50
15 | 4 855 764 | 348.08 | 2 981 570 | 4.44 | 24 | 9.39
16 | 4 029 231 | 288.83 | 2 523 543 | 6.58 | 29 | 4.27
17 | 4 838 487 | 346.84 | 3 022 433 | 3.15 | 27 | 21.75
18 | 6 396 152 | 458.50 | 4 007 232 | 19.41 | 25 | 9.45
19 | 7 141 253 | 511.92 | 4 472 845 | 18.20 | 32 | 17.32
20 | 6 864 476 | 492.08 | 4 238 990 | 12.02 | 15 | 21.39
21 | 3 909 579 | 280.26 | 2 255 030 | 20.40 | 61 | 5.32
22 | 4 073 804 | 292.03 | 2 395 125 | 16.34 | 65 | 3.99
23 | 3 612 131 | 258.93 | 2 113 670 | 24.23 | 54 | 5.33
24 | 19 104 798 | 1 369.52 | 10 685 564 | 26.61 | 152 | 27.35
25 | 14 920 005 | 1 069.53 | 8 188 025 | 15.55 | 63 | 17.49
26 | 8 921 684 | 639.55 | 5 515 973 | 15.59 | 99 | 16.62
Overall | 232 253 565 | 16 648.97 | 115 321 636 | 18.68 | 1 025 | 70.60
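The two area columns of Table A.2 are redundant by construction: at 300 dpi, one pixel edge corresponds to 2.54/300 cm, so the area in cm2 follows directly from the pixel count. The following minimal Python sketch reproduces this conversion and cross-checks it against the first row of the table; the helper name px_to_cm2 is ours for illustration and is not part of the thesis software.

    def px_to_cm2(num_pixels: int, dpi: float = 300.0) -> float:
        # One pixel edge in cm (1 inch = 2.54 cm), assuming square pixels.
        pixel_edge_cm = 2.54 / dpi
        return num_pixels * pixel_edge_cm ** 2

    # Cross-check against scan ID 1 in Table A.2:
    # 5 143 296 px at 300 dpi -> approximately 368.69 cm2.
    print(f"{px_to_cm2(5_143_296):.2f} cm2")

Applied to the overall pixel count (232 253 565 px), the same conversion yields roughly 16 649 cm2, matching the reported total of 16 648.97 cm2.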
Figures A.1 to A.26: Orthophotos of scan IDs 1 to 26, one orthophoto per scan (the images themselves are not reproduced in this text version).

Peer-reviewed Publications

The following peer-reviewed publications contain results of the thesis:

Journal Articles

Markus Seidl, Ewald Wieser, and Craig Alexander. Automated classification of petroglyphs. Digital Applications in Archaeology and Cultural Heritage, 2(2–3):196–212, 2015.

Ewald Wieser, Markus Seidl, and Matthias Zeppelzauer. A study on skeletonization of complex petroglyph shapes. Multimedia Tools and Applications, pages 1–19, 2016.

Matthias Zeppelzauer, Georg Poier, Markus Seidl, Christian Reinbacher, Samuel Schulter, Christian Breiteneder, and Horst Bischof. Interactive 3d segmentation of rock-art by enhanced depth maps and gradient preserving regularization. ACM Journal on Computing and Cultural Heritage, in press.

Conference Papers

Markus Seidl and Christian Breiteneder. Detection and classification of petroglyphs in gigapixel images - preliminary results. VAST11: The 12th International Symposium on Virtual Reality, Archaeology and Intelligent Cultural Heritage - Short Papers, pages 45–48, 2011.

Markus Seidl and Christian Breiteneder. Automated petroglyph image segmentation with interactive classifier fusion. In Proceedings of the Eighth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP '12), 2012.

Matthias Zeppelzauer and Markus Seidl. Efficient image-space extraction and representation of 3d surface topography. In IEEE International Conference on Image Processing (ICIP), pages 2845–2849, September 2015.

Matthias Zeppelzauer, Georg Poier, Markus Seidl, Christian Reinbacher, Christian Breiteneder, Horst Bischof, and Samuel Schulter. Interactive segmentation of rock-art in high-resolution 3d reconstructions. In 2015 Digital Heritage, volume 2, pages 37–44, September 2015. Best Paper Award.

Workshop Papers

Markus Seidl, Ewald Wieser, Matthias Zeppelzauer, Axel Pinz, and Christian Breiteneder. Graph-Based Shape Similarity of Petroglyphs. In Lourdes Agapito, Michael M. Bronstein, and Carsten Rother, editors, Computer Vision - ECCV 2014 Workshops, volume 8925 of Lecture Notes in Computer Science, pages 133–148. Springer International Publishing, 2015.

Matthias Zeppelzauer, Bartosz Zieliński, Mateusz Juda, and Markus Seidl. Topological descriptors for 3d surface analysis. In Alexandra Bac and Jean-Luc Mari, editors, Computational Topology in Image Context: 6th International Workshop, CTIC 2016, Marseille, France, June 15-17, 2016, Proceedings, pages 77–87. Springer International Publishing, 2016.

Poster

Matthias Zeppelzauer, Ewald Wieser, and Markus Seidl. A novel annotation tool for complex petroglyph shapes. In The Future of Datasets in Vision Workshop (in conjunction with CVPR 2015), Boston, MA, USA, June 2015.