Graph-Based Checkerboard Indexing and its Applications in Camera Calibration and Multi-View Processing

DIPLOMARBEIT
zur Erlangung des akademischen Grades Diplom-Ingenieurin im Rahmen des Studiums Visual Computing

eingereicht von Huber Hanna, BSc, Matrikelnummer 00925230,
an der Fakultät für Informatik der Technischen Universität Wien

Betreuung: O.Univ.Prof. Dr. Walter G. Kropatsch
Mitwirkung: Dipl. Ing. Ines Janusch, BSc

Wien, 27. Februar 2018

Technische Universität Wien, A-1040 Wien, Karlsplatz 13, Tel. +43-1-58801-0, www.tuwien.ac.at

Graph-Based Checkerboard Indexing and its Applications in Camera Calibration and Multi-View Processing

DIPLOMA THESIS
submitted in partial fulfillment of the requirements for the degree of Diplom-Ingenieurin in Visual Computing

by Huber Hanna, BSc, Registration Number 00925230,
to the Faculty of Informatics at the TU Wien

Advisor: O.Univ.Prof. Dr. Walter G. Kropatsch
Assistance: Dipl. Ing. Ines Janusch, BSc

Vienna, 27th February, 2018

Erklärung zur Verfassung der Arbeit

Huber Hanna, BSc
hhuber@prip.tuwien.ac.at

Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwendeten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit, einschließlich Tabellen, Karten und Abbildungen, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe.

Wien, 27. Februar 2018
Huber Hanna

Danksagung

Vielen Dank an Indiecam (www.indiecam.com) für die Bereitstellung einer indiePOV HDSDI Kamera und für die Finanzierung des Technologie-Transfers (ID: 1360780) gemeinsam mit der Wirtschaftsagentur Wien (www.wirtschaftsagenturwien.at).

Acknowledgements

Many thanks to Indiecam (www.indiecam.com) for providing an indiePOV HDSDI camera and for funding the technology transfer (ID: 1360780) together with the Vienna Business Agency (www.viennabusinessagency.at).

Kurzfassung

In dieser Arbeit wird eine neue Methode zur Indizierung eines Schachbrettmusters präsentiert. Es wird ein graphenbasierter Ansatz gewählt, um die Topologie des Musters zu erfassen. Die Regionen und Kreuzungspunkte werden mittels Bildsegmentierung identifiziert und durch einen Graphen repräsentiert. Dabei wird kein explizites mathematisches Modell für die im Bild vorhandene Verzerrung angenommen. Ein speziell entworfenes Kalibrierungsmuster ermöglicht die Identifizierung eines eindeutigen Referenzpunktes im Schachbrettmuster. Dadurch können die Kreuzungspunkte als Gitterpunkte eines Zielkoordinatensystems interpretiert werden. Mögliche Anwendungen im Bereich der Kamerakalibrierung und Mehrkamera-Verfahren werden diskutiert.

Abstract

In this thesis, a new method for checkerboard indexing is presented. A graph-based approach is used to capture the topology of the checkerboard pattern. The regions and crosspoints of the pattern are identified using image segmentation and represented by a graph. At this point, no explicit mathematical model of the distortion in the checkerboard image is assumed. A special calibration pattern design allows the identification of a unique reference point in the pattern. This way, the crosspoints can be interpreted as the grid points of a target coordinate system. Possible applications in the fields of camera calibration and multi-view processing are discussed.
Contents

Kurzfassung xi
Abstract xiii
Contents xv
1 Introduction 1
2 Related Work 3
  2.1 Checkerboard Indexing and Camera Calibration 3
  2.2 Multi-View Problems 4
  2.3 Graph-Based Representations 4
3 Checkerboards as CS Representations 7
  3.1 Coordinate Transformation Problem 7
  3.2 Graph-Based Representation 11
  3.3 Calibration Pattern Design 11
4 Crosspoint Detection 15
  4.1 LBP Saddles 15
  4.2 Circular Boundaries 17
  4.3 Corner Likelihood Map and Gradient Statistics 18
5 Checkerboard Segmentation 21
  5.1 SCIS Algorithm 21
  5.2 Segmentation based on Gradient Magnitude 23
  5.3 Connected Component Labelling of Binary Images 24
  5.4 Refined Segmentation based on Intensity Values 25
6 Graph-Based Checkerboard Indexing Algorithm 27
  6.1 Crosspoint Detection 29
  6.2 Pattern Detection 30
  6.3 Distorted Checkerboard Segmentation 30
  6.4 Crosspoint Identification 33
  6.5 Origin and Orientation Detection 35
  6.6 Orientation Correction 36
  6.7 Target Coordinates Assignment 37
7 Applications of Graph-Based Checkerboard Indexing 39
  7.1 Camera Calibration 40
  7.2 Distortion and Undistortion 40
  7.3 Multi-View Processing 41
8 Evaluation 43
  8.1 Metrics and Ground Truth 44
  8.2 Discussion 48
  8.3 Applications and Future Work 59
9 Conclusion 61
List of Figures 63
List of Tables 67
List of Algorithms 69
Bibliography 71

CHAPTER 1
Introduction

Checkerboard patterns play an important part in various camera calibration techniques. Their images reveal the properties of the respective cameras as regular structures in the original pattern are transformed by the camera lenses. This way, the projection from (regular) world to (distorted) image space can be measured [Sze10].

In this thesis, a new graph-based checkerboard indexing method is proposed. It captures the structure of a (possibly distorted) checkerboard pattern by identifying its crosspoints and their topology. The suggested approach has several advantages over existing indexing methods which are useful in various application contexts including camera calibration as well as multi-view processing. The aim of this work is to analyse the proposed method and its applicability in these contexts.

Common calibration techniques assume a specific projection model which describes the distortion in the checkerboard pattern. During the calibration procedure, they perform parameter fitting for this model [Sze10]. This way, a technique is limited to particular types of distortion and fails if the actual distortion deviates from the mathematical model. This is particularly relevant in the case of fisheye lenses: They cause severe distortions which can only be approximated [KB06].
This work suggests an alternative approach to overcome these limitations: Instead of fitting detected structures to an a priori defined model, the distortion of the pattern is captured implicitly. The checkerboard pattern represents a target coordinate system (CS) with axes defined by the pattern lines and grid points defined by the crosspoints between the patches. Each crosspoint is given a target coordinate. Two consecutive image segmentation steps and a special pattern design including unique reference structures are used to identify the corresponding target coordinate for each crosspoint. This way, a low-resolution mapping from image to target space is created. As it is defined pointwise for each crosspoint pair, it is able to capture global as well as non-global transformations.

Apart from distortions caused by the camera, multi-view applications such as image stitching also need to consider image transformations between different views. In order to align images accordingly, corresponding points in the respective images have to be found. Common alignment methods perform a pairwise comparison between the individual images for this purpose. To create image mosaics, the individual images also have to be mapped to the final mosaic space, which involves further transformations such as image warping. For multi-view setups with a fixed relative position between the different views, efficiency could be increased by directly mapping the individual images to the mosaic in parallel [BHKB16]. This motivates the use of a calibration procedure during which the mapping from the individual image space to the common mosaic space can be determined for each camera independently. After calibration, this mapping can be applied to new images taken by the cameras. In addition to our graph-based indexing method, we want to present a checkerboard pattern design that contains unique reference structures so that it can serve as a global target CS.
In this context, we want to provide a low-resolution pointwise mapping with our indexing algorithm that can be used to calibrate such multi-view setups. Following a basic concept [BHKB16] that already addressed these motivations of a model-free graph-based calibration method for multi-view applications, the contributions of this thesis are:

• The design of a calibration pattern: It contains unique reference structures so it can be used as a target CS in which origin and orientation can be detected automatically.

• A new graph-based checkerboard indexing method: It uses image segmentation to identify the patches and crosspoints in the checkerboard. Their topology is captured without assuming a mathematical model. For each crosspoint, a mapping from image to target space is calculated.

• An evaluation of the proposed method with focus on its application for camera calibration and multi-view processing.

This thesis is structured as follows: In Chapter 2, related work in research areas that are tackled in this thesis is discussed. Next, we describe the coordinate transformations we want to address and the usage of checkerboards as CS representations in Chapter 3. In Chapters 4 and 5, we discuss fundamental work that can be used to solve the problems of crosspoint detection and checkerboard segmentation, respectively. Then, our checkerboard indexing algorithm is described in detail in Chapter 6. In Chapter 7, we discuss how it could be useful for camera calibration and multi-view applications. An analysis of the method including a quantitative evaluation of the results is presented in Chapter 8. Finally, we conclude this thesis in Chapter 9.

CHAPTER 2
Related Work

In this chapter, previous work in research areas related to this thesis is discussed. It includes different approaches in the fields of camera calibration, multi-view problems and graph-based representations.
2.1 Checkerboard Indexing and Camera Calibration

Common camera calibration methods use a parametric model which describes the mapping from world to image space. In a next step, suitable parameter values are found which define the actual mapping. For fisheye lenses, however, the rules of perspective projection do not apply due to their large field of view (FOV). Hence, mathematical models only predict the behavior of ideal lenses and the true mapping can merely be approximated [KB06]. Different models applicable to fisheye lenses are provided by Schwalbe [Sch05] and Kannala and Brandt [KB06], who include a distortion term based on polynomial functions. Alternatively, Aleman-Flores et al. [FLDC14] use division models to describe severe distortion, which reduces the number of necessary parameters [Fit01]. Various methods for the determination of the parameter values exist [YHZ06, KF08, SS14, WLaZ+15, ULH15]. They can be divided into techniques which require prior knowledge about the position of feature points in world space [HGJD08] and self-calibrating methods [BBV01, FLDC14]. Abraham and Förstner [AF05] perform calibration and epipolar rectification for stereo processing using fisheye lenses.

Geiger et al. [GMCS12] propose a method for automatic camera and range sensor calibration. They localize checkerboard crosspoints using a corner likelihood map. Outliers are omitted and the checkerboard structure is recovered based on an energy function. It includes a local linearity requirement which also allows distorted patterns, but not with arbitrary distortion. They assume a rectangular pattern and determine its orientation by distinguishing between the short and long side.

Alternatively, Bok et al. [BHK16] perform checkerboard indexing using circular boundaries. They investigate the vector of image values sorted clockwise at given radii around candidate pixels.
Based on their differences to the mean of the largest and smallest value, crosspoint candidates are established. The checkerboard indexing is performed with a technique similar to the RANSAC algorithm [FB81], assuming similar displacements between crosspoints. Banaeyan et al. [BHKB16] present a concept for segmentation-based checkerboard identification based on the Structurally Correct Image Segmentation (SCIS) algorithm [Cer15]. Mucciolo et al. [MKS+17] use color-coded checkerboard patterns for calibration in a multi-view video setup.

2.2 Multi-View Problems

Images taken from different views and/or cameras are commonly aligned using feature points. SIFT features [BL03, BL07] and lower dimensional FAST features [KS04] capture gradient changes and their orientation at different scales. Li et al. [LQS+15] increase the amount of matched point pairs for multispectral images with high intensity differences by introducing a non-linear scale space and an adapted feature matrix. SURF features presented by Bay et al. [BTVG06] are based on Haar wavelets. All these features can then be used to find corresponding points in images from different views. A homography transformation which is calculated from matching point pairs can be used to align images automatically [BL07]. However, global homographies do not consider parallax errors which occur when images are taken from different viewpoints. They can be accounted for by including local warping [ZL14, PSHZ+15].

For stereo reconstruction, images of a scene from multiple views are used to reconstruct 3D information. With camera position and parameters available, the images can be rectified so that the correspondence problem is reduced to a 1D search along epipolar lines [Sze10].

All these approaches for solving the correspondence problem require image content that provides image points that can be well-distinguished based on the respective discriminative feature.
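The homography-based alignment mentioned in Section 2.2 can be estimated with the direct linear transform (DLT); the following is a minimal NumPy sketch without the RANSAC-style outlier handling used in practice, and the point sets are purely illustrative:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: estimate H with dst ~ H * src
    (homogeneous coordinates) from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the right singular vector of the smallest singular value,
    # reshaped to 3x3 and normalized so that H[2, 2] = 1.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# A pure translation by (2, 3) is recovered (up to numerical precision).
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 2, y + 3) for x, y in src]
H = fit_homography(src, dst)
```

With noisy real matches, such a global estimate is exactly the step that, as noted above, cannot account for parallax without additional local warping.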
2.3 Graph-Based Representations

Graph-based representations are desirable when topological information is important. Applications include plant phenotyping [JK16], fruit quality control [LGAGVGAV13] and image segmentation. For the latter, pixels are considered as separate regions in the beginning and subsequently merged together. For this purpose, different criteria can be applied. Felzenszwalb and Huttenlocher [FH04] consider the similarity within and between regions. Their method is further improved by Magnusson and Olsson [MO14] using automatic programming based on evolutionary principles. Alternatively, Cerman [Cer15] preserves the image structure by performing image segmentation based on Local Binary Patterns. Topologically accurate segmentation methods are particularly important in medical applications [OS14].

CHAPTER 3
Checkerboards as CS Representations

In this chapter, we formulate the problem we want to address in this thesis from a formal point of view and discuss how it can be solved by representing the involved CSs by checkerboard patterns.

3.1 Coordinate Transformation Problem

In this section, a formal problem formulation is presented. Subsequently, the following terms and definitions will be used:

• Coordinate spaces are denoted by $X$ with a suitable subscript. Important spaces include:
  – Image space $X_I \subseteq \mathbb{R}^2$
  – World space $X_W \subseteq \mathbb{R}^3$

• Subsets of these spaces are denoted by $Y$ with suitable subscripts.

• Coordinate mappings between these spaces and subsets are denoted by $f$ with suitable subscripts.

• Images are functions which are defined on these subsets. We differentiate between
  – RGB images $I_{RGB} : Y \subset X_I \to [0,1]^3 \subset \mathbb{R}^3$ and
  – Grayscale images $I_{Gray} : Y \subset X_I \to [0,1] \subset \mathbb{R}$
• In particular, we define a checkerboard image $C_I$ as
\[
C_I : Y_I \subset X_I \to \{0,1\}, \qquad (x, y) \mapsto (i + j) \bmod 2, \tag{3.1}
\]
with $a_i \le x < a_{i+1}$, $c_j \le y < c_{j+1}$, $Y_I = (a_I, b_I) \times (c_I, d_I)$ for $a_I, b_I, c_I, d_I \in \mathbb{R}$, $a_i < a_{i+1}$, $c_j < c_{j+1}$, $a_i \in (a_I, b_I)$, $i \in \{1, \dots, m\}$, and $c_j \in (c_I, d_I)$, $j \in \{1, \dots, n\}$, where the $a_i$ and $c_j$ are defined by the sizes of the patches in the checkerboard.

An overview of the subsequently described coordinate spaces and mappings is illustrated in Figure 3.1. Our indexing method is motivated by different image processing applications which have in common that a coordinate transformation has to be determined. In general, this problem can be formulated as follows (for application-specific details see Chapter 7): For a given input image $I$ defined on $Y_I \subset X_I$, there exists a transformed target image $I'$ with coordinates in $Y_{I'} \subset X_I$. Then, our aim is to find a mapping $f : Y_I \to Y_{I'}$ such that
\[
I(x, y) = I'(f(x, y)) \tag{3.2}
\]
for $(x, y) \in Y_I$.

The target image $I'$ is defined by its geometric characteristics (e.g. straight lines in the target image where there are curves in the input image). Based on these characteristics, a target CS $X_T \subseteq \mathbb{R}^2$ with the desired geometry can be defined. To represent $X_T$ in image space $X_I$, a checkerboard pattern $C_T$ can be used. Its structure is well-suited to represent the axes of a CS. We already defined a checkerboard image in image space in Equation 3.1. Equivalently, we can define the same checkerboard in target space, where each crosspoint in the pattern corresponds to a grid point in $X_T$:
\[
C_T : Y_T \subset X_T \to \{0,1\}, \qquad (x, y) \mapsto (\lfloor x \rfloor + \lfloor y \rfloor) \bmod 2, \tag{3.3}
\]
with $Y_T = (a, b) \times (c, d)$, $a, b, c, d \in \mathbb{R}$.

Depending on the characteristics of $X_T$, $C_T$ may have a regular or irregular structure and the horizontal and vertical lines defining the patch edges may not be straight. The crosspoints between the patches represent the grid points of $X_T$ in $X_I$.

Figure 3.1: An example checkerboard pattern in target, image and world space: The red dot marks the origin of the pattern. The green solid arrows represent the coordinate axes in each space. The mappings between these spaces are depicted by dotted blue arrows.

Formally, this representation can be described as a coordinate mapping $f_I$:
\[
f_I : Y_T \to X_I, \qquad (x_T, y_T) \mapsto (x_I, y_I), \qquad f_I(Y_T) = Y_I \tag{3.4}
\]
It assigns each grid point with coordinates in $X_T$ its corresponding image coordinates in $X_I$. Intuitively, this mapping is defined by the sizes and shapes of the checkerboard patches which, in turn, are defined by the target geometry. Formally, the checkerboard pattern $C_T$ can now be considered as a checkerboard image $C_I$ defined in $X_I$. This way, we have constructed a target space with a desired geometry which can be mapped to image space and represented by a checkerboard image. Figure 3.1 shows this representation for a simple example checkerboard.

During a calibration procedure, the checkerboard pattern is used as follows: At first, the original 2D image $C_T$ is printed and placed into 3D world space $X_W$ by a mapping $f_W$:
\[
f_W : Y_I \to X_W, \qquad (x_I, y_I) \mapsto (x_W, y_W), \qquad f_W(Y_I) = Y_W \tag{3.5}
\]
Once transformed into $X_W$, deformations are possibly applied to the pattern, in particular if the checkerboard is no longer planar in $X_W$:
\[
f_{DW} : Y_W \to X_W, \qquad (x_W, y_W) \mapsto (x_{DW}, y_{DW}), \qquad f_{DW}(Y_W) = Y_{DW} \tag{3.6}
\]
Now, the camera takes an image of the pattern. Distortions are applied to the structure by the camera lens as the checkerboard points are projected from 3D $X_W$ back to 2D $X_I$ by a projection mapping $f_P$:
\[
f_P : Y_{DW} \to X_I, \qquad (x_{DW}, y_{DW}) \mapsto (x_P, y_P), \tag{3.7}
\]
\[
f_P(Y_{DW}) = Y_P \tag{3.8}
\]
The result is a distorted checkerboard image $C_D$ defined on $Y_P$. If no deformations are applied, i.e. $f_{DW}$ is the identity function, the original pattern structure is only distorted by the camera lens, and camera properties can be determined by analyzing the distortions in $C_D$.
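For intuition, the checkerboard image of Equation 3.1 can be rendered synthetically. A minimal NumPy sketch, with patch boundaries $a_i$ and $c_j$ chosen arbitrarily for illustration:

```python
import numpy as np

def checkerboard_image(a, c):
    """Render C_I from Eq. 3.1: a and c are the sorted patch boundaries
    a_i (along x) and c_j (along y), in pixel units."""
    h, w = int(c[-1]), int(a[-1])
    ys, xs = np.mgrid[0:h, 0:w]
    # Patch index i with a_i <= x < a_{i+1} (analogous for j and y).
    i = np.searchsorted(a, xs, side="right") - 1
    j = np.searchsorted(c, ys, side="right") - 1
    return ((i + j) % 2).astype(np.uint8)

board = checkerboard_image(a=[0, 40, 80, 120], c=[0, 40, 80])
```

Choosing non-uniform boundary spacings yields irregular patch sizes, which is how an irregular target CS would manifest in the rendered pattern.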
In a checkerboard pattern, the applied distortions are clearly visible [BHKB16]. Otherwise, $C_D$ will show the combined distortions caused by $f_{DW}$ and $f_P$. $C_D$ is the input of our indexing algorithm.

On the one hand, we can describe the mapping $f$ we want to find in Equation 3.2 to transform the input image $I$ into our target image $I'$ as the inverse of the mapping that transformed $C_T$ into $C_D$:
\[
C_D(f^{-1}(x, y)) = C_I(x, y) \tag{3.9}
\]
\[
f = (f_P \circ f_{DW} \circ f_W)^{-1}, \tag{3.10}
\]
where $\circ$ is the function composition [Kal09]. On the other hand, the locations of the crosspoints in $C_D$ can be mapped back to their corresponding grid points in target space by a mapping $f_T$:
\[
f_T : Y_P \to X_T, \qquad (x_P, y_P) \mapsto (x_T, y_T), \qquad f_T(Y_P) = Y'_T \subseteq Y_T \tag{3.11}
\]
Depending on the FOV of the camera and possible occlusions, $Y'_T$ can be a proper subset of $Y_T$. Together with the coordinate transformation $f_I$ defined in Equation 3.4, we can consider an alternative description of $f$ given by
\[
f = f_I \circ f_T \tag{3.12}
\]
As $f_I$ is already given by the checkerboard, it is sufficient to determine $f_T$ in order to find the mapping $f$. This is the aim of our indexing method.

3.2 Graph-Based Representation

A graph-based representation of the checkerboard image was proposed by Banaeyan et al. [BHKB16]: Graphs are well-suited to capture the pattern's topology. In the original pattern image, each pixel corresponds to a node in the graph. The nodes of neighboring pixels are linked with edges. Using image segmentation, the pixels corresponding to a particular patch are merged into a single region. In the graph representation, this corresponds to the contraction of the edges between these nodes. This way, a region adjacency graph (RAG) is created where each patch is represented by a single node. Again, the nodes of neighboring patches are linked with edges. In a checkerboard pattern, the RAG is a 4-connected graph.

Alternatively, the pattern structure can be represented by its crosspoints.
This way, the graph representation contains a node for each crosspoint and an edge for each edge between two crosspoints in the pattern. The resulting graph corresponds to the dual graph of the (primal) RAG [Cer15]. Both representations are well-suited to encode the topology of the pattern. They are illustrated in Figure 3.2. While the RAG representation is favourable for segmentation, the dual graph representation is chosen for the final indexing, where we want to focus on the crosspoints. The RAG can be derived easily from the dual graph if necessary.

Figure 3.2: Graph-based representations of a checkerboard: Primal RAG (red dot nodes and dashed edges) and dual graph (green cross nodes and solid edges).

3.3 Calibration Pattern Design

In this section, we present the calibration pattern design that is used with our graph-based checkerboard indexing method. Our aim is to create a pattern with the following properties:

i) It is able to represent the target CS and its axes.

ii) The grid points of the CS can be detected in an image of the pattern with high accuracy.

iii) It contains unique structures that serve as reference points. They make it possible to automatically identify the origin and orientation of the CS represented by the pattern. Subsequently, individual grid points can be identified based on their relative position to the origin.

iv) The properties i)-iii) are still fulfilled if a limited distortion is applied to the pattern.

v) The characteristics of the distortion are visible in the projected pattern.

Traditional checkerboards consist of alternating black and white square patches. Figure 3.3 (a) shows an example image of such a checkerboard pattern used by Geiger et al. [GMCS12]. It shows a visible distortion. However, alternative patterns can also be used to estimate distortion properties. A color-coded example used by Aleman-Flores et al. [FLDC14] is given in Figure 3.3 (b).
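Returning briefly to the graph representation of Section 3.2: given a labeled patch segmentation, the RAG edges can be collected by scanning 4-connected label transitions. A minimal sketch, where the label image is a toy example:

```python
import numpy as np

def region_adjacency_graph(labels):
    """Collect RAG edges from a label image: two regions are adjacent
    if some pair of 4-connected pixels carries their two labels."""
    edges = set()
    pairs = [(labels[:, :-1], labels[:, 1:]),   # horizontal neighbors
             (labels[:-1, :], labels[1:, :])]   # vertical neighbors
    for p, q in pairs:
        mask = p != q
        for u, v in zip(p[mask], q[mask]):
            edges.add((int(min(u, v)), int(max(u, v))))
    return edges

# A 2x2-patch board: the RAG is a 4-cycle. There is no diagonal edge
# between patches 0 and 3, which only meet at the crosspoint.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
edges = region_adjacency_graph(labels)
```

The 4-connectivity of this RAG is exactly the regularity that the checkerboard pattern guarantees and that the indexing later exploits.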
Traditional checkerboard patterns already fulfill properties i), ii), iv) and v) [GMCS12]. Without property iii), the individual crosspoints between the patches cannot be distinguished because of their similarity. In order to solve our mapping problem (3.2), corresponding points have to be identified in the respective checkerboard images. However, common approaches for solving the correspondence problem, such as discriminative features or searching along epipolar lines, require distinguishable image points. Hence, they cannot be applied in this case. Instead, property iii) allows the distinction of the individual crosspoints based on their position in the target CS.

A possible way to add unique structures to the pattern would be to modify the regular square geometry in parts of the pattern. An example is given in Figure 3.4. Some square patches have been replaced with triangles, yielding a diamond-shaped structure. This modification is motivated by the use of a RAG representation of the pattern structure. Whereas the regular square structure yields a regular 4-connected RAG, a different connectivity can be detected at the nodes representing the triangles. Additionally, several reference points can be included by varying the size of the diamond shape created by the triangulated structures in the pattern. This can be useful to encode information about the orientation of the pattern or for multi-view applications (see Chapter 7).

Figure 3.3: Examples of calibration patterns: (a) Distorted traditional checkerboard pattern. Image taken from Geiger et al. [GMCS12]. (b) Alternative color-coded calibration pattern. Image taken from Aleman-Flores et al. [FLDC14].

Figure 3.4: Examples of geometrical reference structures captured by the RAG.

Alternatively, we have seen that for RGB input images, the use of color is a suitable choice to add spatial information to the pattern [MKS+17, FLDC14].
Specific parts of the pattern can be emphasized by color-coding selected patches. At this point, we distinguish between information about the origin and about the orientation of the pattern. The origin can be identified by highlighting two diagonally adjacent patches. The crosspoint between them defines a unique origin. Additionally, information about the orientation can be added by distinguishing between the two color-coded patches. As one of them is defined as the upper left patch relative to the origin crosspoint, the orientation of the pattern can be determined. At this point, we choose red and green as colors for the upper left and lower right patch, respectively. A similar combination is used by Mucciolo et al. [MKS+17]. It considers the distances of the involved colors on the unit RGB cube.

Figure 3.5: Target CS represented by a checkerboard pattern with a color-coded origin between the red (upper left) and green (lower right) patch.

Figure 3.5 shows an example pattern and the corresponding target CS. The crosspoints and the horizontal and vertical lines in the pattern define the grid points and coordinate axes in the CS, respectively. This example pattern represents a regular grid CS. However, other target CSs with two coordinate axes, such as cylindrical or spherical CSs, can be used instead depending on the application (more in Chapter 7).

Compared to the geometrical approach, the color-coded reference structures maintain the regular square structure of the pattern. The simple 4-connectivity is a favourable property which we can profit from when we calculate the checkerboard indexing (Chapter 6). Moreover, with simple square patches, distortions become more distinct compared to the original pattern. At the same time, the finer details of the geometric approach require a higher image resolution to be captured properly. They also may be more prone to errors due to degenerate triangles caused by strong distortions.
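The color-coded origin patches described above could be located, for instance, by nearest-reference-color classification on the RGB cube. The reference set and helper below are illustrative assumptions for this sketch, not the thesis' implementation:

```python
import numpy as np

# Reference colors on the unit RGB cube; pure red/green origin patches
# are an assumption of this sketch.
REFERENCE = {
    "black": np.array([0.0, 0.0, 0.0]),
    "white": np.array([1.0, 1.0, 1.0]),
    "red":   np.array([1.0, 0.0, 0.0]),
    "green": np.array([0.0, 1.0, 0.0]),
}

def classify_patch(mean_rgb):
    """Label a segmented patch by its nearest reference color in RGB,
    where mean_rgb is the patch's mean color in [0, 1]^3."""
    m = np.asarray(mean_rgb, dtype=float)
    return min(REFERENCE, key=lambda name: np.linalg.norm(m - REFERENCE[name]))
```

The patches classified as red and green then fix the origin crosspoint between them and, via their upper-left/lower-right roles, the orientation of the pattern.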
With the color-coded patches visible in the pattern image, property iii) is also fulfilled and we have created a calibration pattern with which we can perform automatic checkerboard indexing. The automatic detection of origin and orientation is part of the indexing algorithm and will be described in Chapter 6.

CHAPTER 4
Crosspoint Detection

A core part of checkerboard indexing is the detection of the crosspoints in the image as they mark the grid points of the represented CS. In real images of checkerboards, blur and distortion can cause indistinct transitions between neighboring patches and thus also indistinct crosspoint locations. At this point, by considering coordinates with subpixel precision, the exact crosspoints can be located more accurately. In this chapter, different approaches for crosspoint detection are compared. In particular, their suitability for our model-free indexing algorithm is discussed.

4.1 LBP Saddles

In a checkerboard, the crosspoints are surrounded by four alternating black and white patches. This structural property can be used to define a suitable detection method. Local Binary Patterns (LBP) [OPH96] describe an image point by the structure of its local neighborhood: Its pixel value is compared to the neighboring values at a given radius. Let $p$ be such a pixel in an image $I$ and $p'$ a particular neighbor pixel. Then, the relation between $p$ and $p'$ is described by a 1-bit code. Its value can be determined by a sign function $s$ as
\[
s(p, p') = \begin{cases} 1, & \text{if } I(p') \ge I(p), \\ 0, & \text{otherwise.} \end{cases} \tag{4.1}
\]
The neighbor pixels are sorted in clockwise or counter-clockwise order and their values of $s$ are then combined to define an N-bit LBP code
\[
LBP(p) = \sum_{i=0}^{N-1} s(p, p'_i)\, 2^i, \tag{4.2}
\]
with $N$ denoting the number of neighbors $p'_i$. This code describes the local structure at $p$ in the image $I$.

Figure 4.1: Saddle point and face in primal and dual graph.
The arrows represent pixel relations and are color-coded according to their $s_{strict}$ values (black for 0, green for 1). Image taken from Cerman [Cer15].

Cerman [Cer15] uses LBPs to perform image segmentation that preserves structural correctness in an image (see Chapter 5). For this purpose, he calculates the LBP code of each image point by considering the four adjacent pixels in a 4-connected neighborhood. This 4-bit code can be used to identify image points belonging to particular structural classes. In particular, pixels with an LBP code of $LBP = 1010$ or $LBP = 0101$ are classified as saddle points. At this point, Cerman encodes the relation between two pixels using a strict inequality:
\[
s_{strict}(p, p') = \begin{cases} 1, & \text{if } I(p') > I(p), \\ 0, & \text{otherwise.} \end{cases} \tag{4.3}
\]
In a graph-based representation (see Chapter 3), saddles can be identified both in the primal as well as in the dual graph (see Figure 4.1). In the primal graph, each pixel represents a region. In a 4-connected graph, a face is formed by 2-by-2 pixels. Faces are the elements of the dual graph. To determine the LBP code of a face, $s_{strict}$ is evaluated for two adjacent pixels, respectively. A saddle face is created by alternating $s_{strict}$ values between these regions.

It can be observed that crosspoints in a checkerboard image are often located at such saddle faces. This motivates the idea of using the LBP class of a face as an indicator for our crosspoint detection task. This could be achieved with the following approach: In the dual graph representation of the input image, we determine the LBP class of each face. If a face is classified as saddle, we assume that it represents a crosspoint. This approach is particularly favourable if the checkerboard indexing is performed using the SCIS algorithm as proposed by Banaeyan et al. [BHKB16]. In this case, the saddle faces are already identified in a preprocessing step preparing image segmentation. Thus, no additional calculations would be required.
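The 4-neighbor LBP classification of single pixels described above can be sketched as follows; the clockwise bit ordering is an arbitrary choice for this sketch:

```python
import numpy as np

def lbp4(img, y, x):
    """4-bit LBP code of pixel (y, x) from its 4-connected neighbors,
    visited clockwise (up, right, down, left), using the non-strict
    sign function s of Eq. 4.1."""
    p = img[y, x]
    nbrs = [img[y - 1, x], img[y, x + 1], img[y + 1, x], img[y, x - 1]]
    return sum(1 << i for i, q in enumerate(nbrs) if q >= p)

def is_saddle(code):
    """Saddle points carry the alternating codes 0101 or 1010."""
    return code in (0b0101, 0b1010)

# Bright patches above/below and dark patches left/right of the center
# pixel form a saddle, as at an ideal crosspoint.
img = np.array([[  0, 200,   0],
                [ 50, 100,  50],
                [  0, 200,   0]])
```

In a real image, the same test would be applied with $s_{strict}$ on the 2-by-2 faces of the dual graph rather than on single pixels.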
However, the LBP class of a face in a 4-connected graph is not a reliable feature for crosspoint detection in real images. On the one hand, saddle faces also occur within the homogeneous parts of a checkerboard patch due to noise. If the checkerboard pattern only covers a part of the image, LBP saddles can also occur at the border of the pattern and in the environment. Thus, further processing is needed to distinguish crosspoint saddles from noise and environment saddles. On the other hand, depending on the amount of blur and distortion in the input image, a crosspoint does not always form a 2-by-2 saddle face. In these cases, further processing is needed to detect these crosspoints.

4.2 Circular Boundaries

Along with their indexing technique (see Chapter 2), Bok et al. [BHK16] propose an alternative feature for the detection of crosspoints in the image of a possibly distorted checkerboard: they use a vector representing the circular boundary (CB) of an image point to describe its local structure. Similar to the LBP approach, they consider the pixels in a circular neighborhood at a given radius. However, they do not compare their image values to the pixel in the center, but subtract the mean of the largest and smallest value in the neighborhood. This way, a CB vector is created in which the sign of an entry indicates its relation to this mean value. Based on this sign, Bok et al. deduce criteria to identify crosspoint candidates, which are depicted in Figure 4.2:

i) In the CB vector, the sign changes exactly four times.

ii) The difference between the indices corresponding to opposite sign changes is approximately equal to half the length of the vector.

iii) The indices corresponding to the opposite sign changes are approximately equal to those in a CB vector with a smaller radius, but the same center.

Among uniformly distributed seed points, initial crosspoint candidates are selected based on characteristic i).
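Criterion i), the requirement of exactly four sign changes in the CB vector, can be sketched in a few lines. The sampling routine, the sample count of 16 and the helper names (cb_vector, sign_changes, is_crosspoint_candidate) are illustrative assumptions, not the implementation of Bok et al.:

```python
import math

def cb_vector(img, cx, cy, radius, n_samples=16):
    """Sample a circular boundary around (cx, cy) and subtract the mean
    of the largest and smallest sampled value, as proposed by Bok et al."""
    vals = []
    for i in range(n_samples):
        a = 2.0 * math.pi * i / n_samples
        x = int(round(cx + radius * math.cos(a)))
        y = int(round(cy + radius * math.sin(a)))
        vals.append(img[y][x])
    mid = (max(vals) + min(vals)) / 2.0
    return [v - mid for v in vals]

def sign_changes(cb):
    """Count sign changes along the cyclic CB vector."""
    signs = [1 if v >= 0 else -1 for v in cb]
    return sum(1 for i in range(len(signs))
               if signs[i] != signs[(i + 1) % len(signs)])

def is_crosspoint_candidate(img, cx, cy, radius):
    # criterion i): the sign changes exactly four times
    return sign_changes(cb_vector(img, cx, cy, radius)) == 4
```

On a synthetic 10-by-10 checkerboard crossing, the center of the crossing satisfies criterion i), while a point inside a homogeneous patch does not.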
After a window-based iterative refinement step using image gradients in x and y direction, more outliers and duplicates are discarded using ii). Outliers at checkerboard edges are specifically discarded by comparing the slopes of the surrounding edges. Finally, remaining outliers are eliminated during the indexing step (see Chapter 2).

Figure 4.2: CB vector (outer green circle) with characteristics at a crosspoint (red point): four sign-changing indices (points with red circles), differences between opposite sign-changing indices corresponding to the vector half-length (black arrows) and similar sign-changing indices to the smaller CB vector (blue inner circle). Image taken from Bok et al. [BHK16].

The CB approach improves the crosspoint detection compared to the LBP saddle approach, as it is able to capture the alternating neighborhood structure at a larger scale and is not limited to 2-by-2 saddle points. To avoid the displacement assumptions made during the indexing step, the preceding steps can be performed for crosspoint detection only, which is the aim of this chapter. However, the discarding criteria are not well-suited to allow arbitrary distortions. While outliers still remain, the slope criterion, which aims at eliminating outliers close to edges, also discards crosspoints in strongly tilted checkerboards.

4.3 Corner Likelihood Map and Gradient Statistics

Geiger et al. [GMCS12] also distinguish between crosspoint detection and checkerboard recovery. They propose an alternative corner detector which combines gradient statistics with a corner likelihood map. For two corner prototypes (ideal and rotated by 45 degrees, see Figure 4.3), the responses of four filter kernels as well as the mean filter response are considered to calculate the corner likelihood map C. Corner candidates are determined by subsequent non-maxima suppression on C.
Additionally, gradient statistics are calculated as depicted in Figure 4.3: an orientation histogram with 32 bins is created from the grayscale input image I. The two dominant modes are used to create an expected gradient magnitude template T. The final corner score is calculated as

    R = (T ⋆ ‖∇I‖₂) · C,   (4.4)

where ⋆ denotes the normalized cross-correlation (NCC). Thresholding yields the final corner candidates. For corner candidates with a score above a threshold τ, subpixel locations c are determined by expecting the gradients g_p at points p in a local neighborhood N to be orthogonal to p − c. The solution of this optimization problem can be calculated in closed form, providing the final corner location c:

    c = ( ∑_{p∈N} g_p g_p^T )^{−1} ∑_{p∈N} ( g_p g_p^T ) p   (4.5)

Figure 4.3: Corner score based on (a) ideal and (b) rotated corner prototypes and (c) gradient statistics. Image taken from Geiger et al. [GMCS12].

Again, the subsequent recovery step involving a local linearity condition is able to eliminate the last remaining outliers. To be more flexible with regard to the geometry of the checkerboard, we do not consider this recovery step at this point. Still, combining a likelihood map with gradient statistics already yields a robust detector compared to the previous methods using LBP saddles and CBs. Above that, the localization of subpixel coordinates is very accurate, which is favourable for our task.

CHAPTER 5 Checkerboard Segmentation

We choose a segmentation-based approach for our indexing method, in order to capture the structural information of the checkerboard pattern at region level. In this chapter, fundamental work in the field of image segmentation is discussed. In particular, the suitability for checkerboard segmentation is considered for each method.

5.1 SCIS Algorithm

The SCIS algorithm proposed by Cerman [Cer15] uses a graph-based segmentation approach.
It creates a graph pyramid providing the image at different segmentation stages along with the segmentation history. At the bottom level of the pyramid, the nodes correspond to pixels and their values to the respective RGB or gray values. At higher levels, the nodes correspond to regions and their values to the mean of the merged pixel values.

The SCIS algorithm focuses on the structural correctness of the segmented image. Structural relations in the image are encoded using LBPs (see Chapter 4) and combinatorial maps. Each edge e between two nodes n_i, n_j in the graph consists of two half-edges called darts d_i and d_j which are incident to the respective nodes. A combinatorial map C is defined as the triplet constituted by the set of darts D, an involution α and a permutation σ:

    C = (D, α, σ)   (5.1)

α is a mapping on D which assigns each dart d_i its counterpart d_j for edges e = (d_i, d_j). σ is also a mapping on D which assigns each dart d_i the subsequent dart (in clockwise or counter-clockwise order) incident to the same node n_i. The sequences created by these mappings are called orbits. A combinatorial map encodes both the primal as well as the dual graph. For the dual graph, a respective orbit is created by the composition of the mappings in the primal graph:

    φ = α ◦ σ (or σ ◦ α)   (5.2)

These orbits are useful to encode the orientation of nodes around a vertex or within a face. Let v_i and v_j be the values of n_i, n_j. Then, each dart is assigned a weight w according to the relation between v_i and v_j:

    w(d_i) = 1 if v_i ≥ v_j, and 0 otherwise.   (5.3)

The weights of all darts incident to a node constitute its LBP code. It can be used to categorize nodes by dividing them into LBP classes such as maxima, minima, slopes or saddles.
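The dart weights of Equation 5.3 and the resulting LBP classes can be illustrated with a minimal sketch. The neighbor order (right, up, left, down), the strict inequality of Equation 4.3 and the class names are assumptions for illustration, not Cerman's implementation:

```python
def lbp4(neighbor_values, center):
    """4-bit LBP code from four neighbor values (cyclic order assumed),
    using the strict inequality s_strict of Equation 4.3."""
    return sum((1 << i) for i, v in enumerate(neighbor_values) if v > center)

def lbp_class(code):
    """Coarse structural classes: codes 1010 and 0101 mark saddles."""
    if code in (0b1010, 0b0101):
        return "saddle"
    if code == 0b0000:
        return "maximum"   # no neighbor is strictly larger
    if code == 0b1111:
        return "minimum"   # all neighbors are strictly larger
    return "slope"
```

For a crosspoint-like neighborhood with alternating black and white values, the code is 1010 or 0101 and the point is classified as a saddle.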
A structurally correct image representation is guaranteed by specific conditions:

• To create an image graph in which each face can be classified as a slope, additional nodes are inserted between nodes that form a saddle.

• The definition of the weight function w given in Equation 5.3 yields ambiguous information due to the ≥ relation. To achieve a strict inequality, neighboring nodes with equal values, called plateaus, are merged.

• When the pyramid is created, two nodes can only be merged if the LBP code of their neighbors remains the same after the merging operation.

• Also, two nodes are only merged if the value of the resulting node is sufficiently different from its neighbor nodes.

This LBP approach has several advantages. It is able to preserve the texture in an image during segmentation. Above that, LBP encoding is well-suited to handle illumination changes [HRM06]. The construction of a graph pyramid has the advantage that the resolution of the graph representation can be adjusted continuously.

Banaeyan et al. [BHKB16] propose a checkerboard indexing concept which uses the SCIS algorithm for checkerboard segmentation. However, while structurally correct segmentation works well with many input images, problems arise with the specific task of checkerboard segmentation: On the one hand, saddle faces are often present in checkerboard images close to crosspoint locations. Thus, an additional node is added to turn each of them into a slope face (see Figure 5.1). This way, merging of patches across a crosspoint is encouraged even for ideal checkerboard images (with perfect square patches containing only pixels with value 1 or 0, respectively). On the other hand, the structural correctness criteria often prevent the merging of small regions enclosed within patches or around crosspoints and edges. These criteria are useful
to preserve texture in highly textured images. Within the homogeneous patches of a checkerboard, however, it over-emphasizes noise-related differences between individual pixels. Therefore, the SCIS algorithm is not suited to create a 4-connected RAG representing the checkerboard structure.

Figure 5.1: Additional node inserted in the graph representation by the SCIS algorithm at a saddle face.

5.2 Segmentation based on Gradient Magnitude

The aim of this chapter is to create a checkerboard segmentation in which each segmented region corresponds to a patch. The limits of these regions are formed by the checkerboard lines. Intuitively, detecting these lines can be a first step in order to separate the individual patches. In a checkerboard, we observe that the image gradient has high values along these lines, as they mark the transition from white to black patches and vice versa. Thus, the gradient magnitude can be used to identify the separating checkerboard lines¹: For a grayscale image I, the gradients ∇_x I and ∇_y I in x and y direction can be determined by applying the respective Sobel filters:

    ∇_y I = [ 1  2  1; 0  0  0; −1 −2 −1 ] ∗ I   (5.4)

    ∇_x I = [ 1  0 −1; 2  0 −2; 1  0 −1 ] ∗ I,   (5.5)

where ∗ denotes the convolution operator. Then, the gradient magnitude is given by

    ‖∇I‖ = √((∇_x I)² + (∇_y I)²)   (5.6)

By thresholding ‖∇I‖, a binary image I_BW can be created, in which pixels corresponding to checkerboard edges are set to 1 and others to 0. Morphological operations can be performed to adjust the thickness and topology of these edges in I_BW. This way, the checkerboard structure is outlined in a binary image.

¹MATLAB example on Marker-Controlled Watershed Segmentation: https://de.mathworks.com/help/images/examples/marker-controlled-watershed-segmentation.html?prodcode=IP&language=en [accessed January 2018]
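The Sobel filtering and thresholding of Equations 5.4 to 5.6 can be sketched as follows; the pure-Python convolution and the threshold value are illustrative choices, whereas the kernels match the equations above:

```python
import math

# Sobel kernels of Equations 5.5 and 5.4
SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def convolve3(img, k):
    """3x3 convolution; border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(k[j][i] * img[y + 1 - j][x + 1 - i]
                            for j in range(3) for i in range(3))
    return out

def edge_mask(img, tau):
    """Binary image I_BW: 1 where the gradient magnitude exceeds tau."""
    gx, gy = convolve3(img, SOBEL_X), convolve3(img, SOBEL_Y)
    h, w = len(img), len(img[0])
    return [[1 if math.hypot(gx[y][x], gy[y][x]) > tau else 0
             for x in range(w)] for y in range(h)]
```

On a vertical black-to-white step, only the pixels along the transition are marked as edge pixels.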
Consequently, I_BW can be helpful during the segmentation process, in order to separate individual checkerboard patches.

5.3 Connected Component Labelling of Binary Images

In a binary image, segmentation can be performed by finding and labelling connected components² as described in Algorithm 5.1:

Algorithm 5.1: Find Connected Components
Input: A binary image I
Output: Labelled connected components CC in I
1 l := 0;
2 while ∃ unlabelled pixel p do
3     Find next unlabelled pixel p;
4     l += 1;
5     cc_l := labelConnectedComponent(p, l);
6     Add cc_l to CC;
7 end
8 return CC;

In the method labelConnectedComponent, all pixels belonging to the connected component containing the current pixel p are labelled using a flood fill algorithm [Pur14]. The connected component is obtained based on a given connectivity, which can be 4 or 8 in a 2D image. If the patches in a checkerboard image have already been separated in a preprocessing step, this method can be useful to identify all the pixels belonging to the same patch and to access them using the respective label.

²MATLAB algorithm for finding connected components in a binary image: https://de.mathworks.com/help/images/ref/bwconncomp.html?s_tid=doc_ta [accessed January 2018]

5.4 Refined Segmentation based on Intensity Values

For connected component labelling as described in the previous section, a binary image containing separated region objects is required. In this mask, the limits of each region are encoded. If no such explicit information is available, segmentation can be performed using the implicit information provided by the intensity values in an image. Kroon proposes a region growing algorithm³ that uses pixel intensities to define a similarity measure. Based on this measure, new pixels are added to a segmented region.
For a segmented region R in a grayscale image I, the similarity distance d_grow between an unsegmented pixel p_new and R is defined as:

    d_grow(p_new, R) = | I(p_new) − Ī(R) |   (5.7)

    Ī(R) = (1 / |R|) ∑_{p∈R} I(p)   (5.8)

Initially, a given seed pixel p is added to R. Subsequently, d_grow is calculated for all neighbors of p. The most similar neighbor p′, corresponding to the smallest value of d_grow, is selected to be added to R next. In the next iteration, all neighbors of p′ are examined. At this point, only unsegmented pixels within the image boundary are considered. Region growing stops once the distance threshold τ_grow has been exceeded.

This similarity approach is useful to segment the homogeneous region of a checkerboard patch. Without explicit information about the region limits, it is segmented until the distance between intensity values becomes too large. The threshold parameter τ_grow can be used to control the expansion across desired limits. Preprocessing is needed to provide suitable seed points.

Alternatively, k-means clustering [AV07] can also be used to segment images based on intensity values. With this method, each image point is represented by a vector in feature space. These vectors are separated into clusters based on a distance metric (e.g. the Euclidean metric). It requires the number of clusters beforehand, which would also involve preprocessing, as would providing seed points. The feature space can be chosen accordingly. One possibility is to use 3D vectors containing the RGB values of each pixel. Alternatively, the spatial position can also be included by adding the 2D pixel coordinates, yielding a 5D feature vector.

³Dirk-Jan Kroon. MATLAB code published online: https://de.mathworks.com/matlabcentral/fileexchange/19084-region-growing [accessed January 2018]
For checkerboard segmentation, however, clustering is not a favourable method, because with the feature-space approach it does not consider the topology of the image points: while 5D vectors account for the spatial position of a pixel, there is still no knowledge about local connectivity. Instead, the coordinate information causes problems when segmenting large patches, which tend to be split when 5D vectors are used. Using only the 3D color information, on the other hand, spatial information is completely ignored and disconnected clusters are created.

Region growing, in contrast, is suitable to segment the homogeneous patches of a checkerboard pattern: instead of pure coordinate positions, it accounts for the connectivity between individual pixels by considering the local neighborhood of each newly added pixel.

CHAPTER 6 Graph-Based Checkerboard Indexing Algorithm

In this chapter, the individual steps of the checkerboard indexing algorithm described in Algorithm 6.1 are explained in detail. For a given RGB image of a checkerboard pattern, an indexing of the individual patch crosspoints is calculated. This indexing contains the subpixel locations of the crosspoints in image space as well as the corresponding coordinates in the target CS represented by the pattern structure. In the input image, the pattern can also be incomplete. Figure 6.1 shows the different stages of our method for an example input image. We use the following notation:

• P - the set of crosspoint coordinates. The subscript indicates the coordinate space, whereas the superscript is used to distinguish different stages within the algorithm.

• p - the coordinates of a single crosspoint. The first subscript indicates the coordinate space, whereas the second subscript denotes the index of the crosspoint. The superscript is used to distinguish different stages within the algorithm.

• S - the set of segmented regions.
The subscript indicates the index of a particular region within the set, whereas the superscript is used to distinguish different stages within the algorithm.

• R - the set of region adjacency profiles (see Section 6.4). The subscript indicates the index of a particular profile within the set, whereas the superscript is used to distinguish different stages within the algorithm.

• O - the orientation of the checkerboard pattern within the image.

Figure 6.1: Illustration of our checkerboard identification algorithm using color-coded region labels: (a) original-size grayscale image with detected crosspoints and ROI, (b) coarse segmentation on cropped image, (c) refined segmentation, (d) assigned target coordinates.

Algorithm 6.1: Graph-Based Checkerboard Indexing
Input: An RGB image I of a distorted checkerboard pattern
Output: The subpixel image coordinates P_I and corresponding target coordinates P_T of the detected crosspoints in the checkerboard image I
1  P^D_I := detectCrosspoints(I) ;                        // see Section 6.1
2  M := getROI(P^D_I) ;                                   // see Section 6.2
3  Crop I and transform P^D_I to M ;
4  S^C := coarseSegmentation(I) ;                         // see Section 6.3
5  forall S^C_k ∈ S^C do
6      Compute center c_k;
7      S^R_k := refineSegmentation(I, c_k) ;              // see Section 6.3
8  end
9  S := S^C ∪ S^R;
10 forall p^D_{I,j} ∈ P^D_I do
11     R^D_j := getRegionAdjacencyProfile(p^D_{I,j}, S) ; // see Section 6.4
12 end
13 P^I_I, R^I := identifyCrosspoints(P^D_I, R^D) ;        // see Section 6.4
14 p^I_{I,j0}, O := getOriginAndOrientation(I, S, R^I) ;  // see Section 6.5
15 correctOrientation(R^I, O) ;                           // see Section 6.6
16 expansionDirections := {up, down, right, left};
17 Initialize P^0_T ;
18 Add p^I_{I,j0} to P^0_T ;
19 forall e ∈ expansionDirections do
20     P^e_T := expandGraph(p^I_{I,j0}, R^I, P^{e−1}_T) ; // see Section 6.7
21 end
22 return P_I, P_T ;

6.1 Crosspoint Detection

detectCrosspoints
A core part of our indexing algorithm is the detection of the checkerboard
crosspoints (see Figure 6.2). After discussing possible approaches in Chapter 4, we choose the localization approach proposed by Geiger et al. [GMCS12] described in Section 4.3. It robustly provides the locations P^D_I of the individual crosspoints with subpixel precision. We omit the recovery step of the original method, which includes a local linearity condition and is only able to detect full rectangular patterns (i.e. with a constant number of grid points per row and column). This way, partial views of checkerboards and more general distortions are also allowed by our algorithm. By detecting the crosspoint locations at the beginning of the algorithm, we gain information about the pattern location within the image which is used by subsequent methods.

Figure 6.2: Example crosspoint detection.

The complexity of the crosspoint detection is O(N), where N is the number of pixels in the input image, as the corner likelihood and gradient statistics are calculated for each pixel.

6.2 Pattern Detection

getROI
At this point, we need to determine a mask M for a region of interest (ROI) in which only the pattern is visible. This step is necessary for the subsequent segmentation. We can make use of the previously detected crosspoint locations P^D_I: M is calculated by shrinking the convex hull of the localized crosspoints. This way, the ROI only contains the interior part of the pattern. Parts of patches close to the pattern boundary may get eliminated without affecting the functionality of our algorithm. For further processing, a cropped image is used together with M: by eliminating the image parts which are not relevant to the task, computational cost can be reduced.

6.3 Distorted Checkerboard Segmentation

Image segmentation is used to identify the individual patches in the distorted pattern as described in Chapter 5 (see Figure 6.3).
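The ROI construction of getROI, i.e. the convex hull of the detected crosspoints shrunk toward the interior, can be sketched as follows. The uniform scaling toward the hull centroid and the shrinking factor are illustrative assumptions; the thesis does not prescribe a particular shrinking scheme:

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull, returned in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def shrink(hull, factor):
    """Scale the hull toward its centroid to keep only the pattern interior."""
    cx = sum(p[0] for p in hull) / len(hull)
    cy = sum(p[1] for p in hull) / len(hull)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in hull]

def roi_mask(points, w, h, factor=0.9):
    """Binary mask M: 1 inside the shrunk convex hull of the crosspoints."""
    hull = shrink(convex_hull(points), factor)
    def inside(x, y):  # ray casting point-in-polygon test
        c, n = False, len(hull)
        for i in range(n):
            (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
            if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                c = not c
        return c
    return [[1 if inside(x, y) else 0 for x in range(w)] for y in range(h)]
```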
In this part of the algorithm, each pixel in the input image is assigned a region label, where each region corresponds to a patch in the pattern. At this point, an approximate segmentation is sufficient and remaining pixels are assigned to the background. Additionally, center points are calculated for each segmented region.

Figure 6.3: Example segmentation: Each dot represents a segmented region.

coarseSegmentation
Our aim is to separate the individual patches into individual regions. For this purpose, a coarse segmentation is sufficient. We can use the gradient magnitude of the input image to expose the pattern structure as described in Section 5.2. The edge structure can be exposed by thresholding. After binarization, pixels outside the ROI are set to 1 to limit the segmentation to the pattern.

The current binary image does not show a smooth structure yet: holes appear where the gradient magnitude is low at the center of a blurred crosspoint. Also, thresholding can cause lines to be disconnected. Such problems can be solved by applying morphological operations. We use bridging and closing to guarantee that gaps between curve elements are closed, in order to prevent the merging of neighboring regions.

In the current binary image, the foreground corresponds to the edges. In order to mark the patches as foreground instead, we use the complement of the image, i.e. pixel values of 1 and 0 are switched. This way, we create a binary image with a separated object for each patch. Erosion is performed to reduce fine detail at the borders of the components. This way, the overall rough structure of the pattern is captured, neglecting details close to the border between patches which can be prone to noise. Now, we can identify the corresponding pixels by finding connected components as described in Section 5.3, providing a coarse segmentation S^C.
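The connected component labelling of Algorithm 5.1, used here to obtain the coarse segmentation, can be sketched with a queue-based flood fill; the function name and the queue-based traversal are illustrative choices:

```python
from collections import deque

def label_components(binary, connectivity=4):
    """Label the foreground (value 1) of a binary image via flood fill.
    Returns a label image; 0 marks the background."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    if connectivity == 4:
        nbrs = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    else:  # 8-connectivity
        nbrs = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)]
    next_label = 0
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] == 1 and labels[y0][x0] == 0:
                next_label += 1          # start a new component
                labels[y0][x0] = next_label
                q = deque([(y0, x0)])
                while q:                 # flood fill the component
                    y, x = q.popleft()
                    for dy, dx in nbrs:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
    return labels
```

With 4-connectivity, patches separated by a background line receive distinct labels, as required for the coarse segmentation S^C.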
refineSegmentation
Now that we have separated the individual patches in S^C, these regions are refined. Our goal is to assign the pixels close to the checkerboard edges that have been neglected in the previous step to the corresponding patches. We perform region growing to subsequently approximate each patch more accurately, as described in Section 5.4. This technique requires a seed point for each region. With the coarse segments S^C already provided, we can make use of the included pixel information. Among all the pixels belonging to a coarse segment, we assume that the center pixel is best suited as a seed point for subsequent region growing: it is located in the homogeneous part of the patch, with the largest distance to all four edges. This is a favourable property, because this way, the region can grow within the homogeneous part in all directions. Thus, the final region will also be placed centrally between the surrounding edges. With a seed point closer to a patch edge, pixels within the transition part near the edge may be added at an earlier stage, with a larger impact on the mean value. This way, other pixels from the transition area or other patches could be added falsely. Alternatively, the growing loop could be terminated at an early stage due to a large intensity distance. This way, the final grown region would not be equally close to its neighboring regions.

To prevent a region from growing outside the ROI, we adjust the original algorithm¹ to use the mask M during the expansion: neighbors outside the ROI are not considered. We perform region growing for each coarse segment subsequently. This way, a connected grown region S^R_k can become disconnected if some pixels p ∈ S^R_k get assigned to a different region S^R_k′ at a later iteration, k′ > k.
To guarantee that final regions remain connected, we iterate through all segments provided by the growing procedure and examine their connectivity. For disconnected components, only the component with the largest area is considered as the final grown region. The final labelling l of each pixel p in the input image I is constituted by the union of the labellings from the coarse and the refined segmentation S^C, S^R, respectively:

    l(p) = l_k if ∃k ∈ {1...N} such that p ∈ S^C_k ∪ S^R_k, and 0 otherwise,   (6.1)

where N is the number of segmented regions and l_k > 0 the label of the kth region.

The complexity of the coarse segmentation is O(N), N denoting the number of pixels in the cropped image, due to the per-pixel preprocessing and the segmentation, where each pixel is visited at most once. The complexity of the refinement part depends on the image content: in a worst-case scenario, each region grows to cover the entire image, resulting in a complexity of O(|S| · N) with |S| ≤ N. In a good scenario, however, |S| ≪ N and each region only visits the pixels belonging to its patch. As a result, each pixel is visited once by a single growing iteration, yielding a complexity of O(N).

Compared to the SCIS algorithm [Cer15], which creates a pyramid containing all the segmentation stages, our algorithm only contains two levels: pixel and region level. For the checkerboard indexing task, this is sufficient, because when segmenting a homogeneous patch, the intermediate stages do not contain important additional information. While for textured images different amounts of detail in different segmentation stages may be valuable, details in a homogeneous patch are often noise-related and the order in which pixels are merged is less meaningful.

¹Dirk-Jan Kroon. MATLAB code published online: https://de.mathworks.com/matlabcentral/fileexchange/19084-region-growing [accessed January 2018]
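The masked region growing used in the refinement step can be sketched as follows, assuming a single seed and the similarity distance of Equation 5.7. The frontier bookkeeping and the recomputation of the region mean in every iteration are illustrative simplifications of Kroon's algorithm:

```python
def region_grow(img, seed, tau, mask):
    """Grow a region from `seed` by repeatedly adding the most similar
    unsegmented 4-neighbor (Equation 5.7) until the intensity distance
    exceeds tau; `mask` restricts the growth to the ROI."""
    h, w = len(img), len(img[0])
    region = {seed}
    total = float(img[seed[0]][seed[1]])
    frontier = set()

    def add_neighbors(y, x):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1
                    and (ny, nx) not in region):
                frontier.add((ny, nx))

    add_neighbors(*seed)
    while frontier:
        mean = total / len(region)  # current region mean, Equation 5.8
        best = min(frontier, key=lambda p: abs(img[p[0]][p[1]] - mean))
        if abs(img[best[0]][best[1]] - mean) > tau:
            break                   # distance threshold tau exceeded
        frontier.discard(best)
        region.add(best)
        total += img[best[0]][best[1]]
        add_neighbors(*best)
    return region
```

On a synthetic image with a bright patch next to a dark one, growth stops at the intensity transition and the region covers exactly the homogeneous patch.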
6.4 Crosspoint Identification

getRegionAdjacencyProfile
In this part of the algorithm, the region labels and center points from the previous part are used to identify individual crosspoints, assigning them the labels of the four adjacent regions. In order to find the adjacent regions of each crosspoint, a distance transform is performed to obtain the four regions which are closest to it. The distance d_RAP of a crosspoint p ∈ P^D_I to a segmented region S_k is then calculated as:

    d_RAP(p, S_k) = min_{p_k ∈ S_k} d_2(p, p_k),   (6.2)

where d_2 is the Euclidean distance. This is motivated by the fact that these four regions form a unique region profile for each crosspoint which can be used for identification: considering the graph-based approach described in Section 3.2, each crosspoint represents a face in the RAG which is formed by the nodes corresponding to the four adjacent patches at this crosspoint (see Figure 6.4). There is no other face that is constituted by the same nodes, as there is only one crosspoint in a checkerboard with the same four adjacent patches. Thus, a unique region adjacency profile (RAP) can be defined for each crosspoint. Let G and G′ be the RAG and its dual graph with nodes N and N′, respectively. Then the RAP R is defined as:

    R(n′) = (N_k, φ)   (6.3)
    N_k = {n_1, n_2, n_3, n_4} ⊆ N   (6.4)
    n′ ∈ N′

In R, the elements of the subset N_k are sorted thanks to φ : N_k → N_k, which assigns each node n its neighbor within the face n′ in clockwise order:

    φ(n_1) = n_2   (6.5)
    φ(n_2) = n_3   (6.6)
    φ(n_3) = n_4   (6.7)
    φ(n_4) = n_1   (6.8)

At this point, we adapt the orbit concept and notation used by Cerman [Cer15] for his graph-based image representation. A unique RAP can be defined using multiple definitions of φ. Cerman suggests clockwise and counter-clockwise sorting. However, other permutations on the set {1...4} can be used as well. Clockwise and counter-clockwise sorting are well-suited for graph-based representations.
We choose the former, because it is simple to calculate for the adjacent checkerboard regions: we can sort them by sorting their region center points relative to the crosspoint. For this purpose, we consider the polar coordinates (θ, r) of the relative positions in image space and sort them based on their θ values: Let c_i = (c_{i,x}, c_{i,y}) be the pixel coordinates of the centroid of the region represented by the node n_i and p = (p_x, p_y) be the pixel coordinates of the crosspoint node n′. Then the sorting of the nodes n_i within the RAP of n′ is determined by:

    θ_1 < θ_2 < θ_3 < θ_4,   θ_i = arctan( (c_{i,y} − p_y) / (c_{i,x} − p_x) )   (6.9)

Figure 6.4: Example RAP: Labelled nodes and edges of the RAG (red dots and dotted lines) around a crosspoint (green cross). The order of the labels is defined by φ (blue dashed arrow).

This way, the regions are sorted clockwise, starting with the lower right region relative to the crosspoint. However, any other position could be used as a starting point as well. Equivalently to Equation 6.3, we can write R in terms of the labels l_k of each region node n:

    R(n′) = (L_k, φ)   (6.10)
    L_k = {l_{k1}, l_{k2}, l_{k3}, l_{k4}}   (6.11)
    k_i ∈ {1...N}, i ∈ {1...4},

where l_{ki} denotes the label of the node n_i, i ∈ {1...4}.

identifyCrosspoints
We add an optional procedure which ensures that for each RAP, a single crosspoint is found: depending on the crosspoint detection method, it is possible that multiple crosspoints or corners are found which correspond to the same true checkerboard crosspoint. Consequently, they all share the same RAP, which can be used to identify them. Now, there are multiple locations for the same crosspoint available. A basic way of determining the location of the final crosspoint is to average the individual locations. We choose this approach for our algorithm, because we assume that our crosspoint detection is robust enough and a simple method is sufficient.
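The angular sorting of Equation 6.9 can be sketched as follows; atan2 is used here instead of the plain arctan quotient to obtain the full angular range, so the starting position of the resulting cyclic order is an implementation detail, as noted above:

```python
import math

def sort_rap(crosspoint, centroids):
    """Sort the adjacent region labels around a crosspoint by the polar
    angle of their centroids relative to the crosspoint (Equation 6.9).
    In image coordinates (y pointing down), increasing angle corresponds
    to a clockwise order on screen. `centroids` maps label -> (x, y)."""
    px, py = crosspoint

    def angle(item):
        _, (cx, cy) = item
        return math.atan2(cy - py, cx - px)

    return [label for label, _ in sorted(centroids.items(), key=angle)]
```

For four centroids placed symmetrically around a crosspoint, the resulting order walks around the crosspoint in a single cyclic direction.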
A possible more advanced approach would be to use a weighted average, for instance.

Figure 6.5: Example origin detection: Detect the origin crosspoint between the two regions with the highest mean saturation.

The complexity of calculating the RAPs is governed by determining the four closest regions for each crosspoint. It requires the distance calculation to each region for each crosspoint and a subsequent sorting. This yields a complexity of O(|P| · (|S| + log |P|)).

6.5 Origin and Orientation Detection

Another core part of the indexing algorithm is the detection of the origin of the target space. It is needed to assign each crosspoint its target coordinate relative to this origin.

getOriginAndOrientation
At first, the origin needs to be detected (see Figure 6.5). For this purpose, we want to identify the colored patches in the pattern image (see Section 3.3). As they have higher saturation values compared to black and white patches, the RGB image is converted into the HSV color space. The two color-coded patches are identified by calculating the average saturation for each segmented region and selecting the regions corresponding to the two largest values. This way, the origin is detected as the crosspoint between these two regions. It can be found by considering the crosspoints’ RAPs. However, depending on camera and illumination properties, the mean saturation of the green patch is not always sufficiently high. Consequently, the two detected regions are not always adjacent and thus do not always enclose a unique crosspoint. In this case, the best region candidate among the diagonal neighbors of the red patch, which we assume to be identified robustly, is selected. Now, the origin crosspoint is provided, but the orientation of the pattern in the image still needs to be detected (see Figure 6.6).
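The saturation-based origin selection can be sketched as follows; the dictionary-based bookkeeping of RAPs and mean saturations is an illustrative assumption, and the fallback search among the diagonal neighbors of the red patch is omitted for brevity:

```python
def find_origin(raps, mean_saturation):
    """Pick the two regions with the highest mean saturation and return
    the crosspoint whose RAP contains both, or None if the two regions
    do not enclose a common crosspoint."""
    top2 = sorted(mean_saturation, key=mean_saturation.get, reverse=True)[:2]
    for crosspoint, rap in raps.items():
        if top2[0] in rap and top2[1] in rap:
            return crosspoint
    return None
```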
It is encoded in the pattern by defining the red patch as the upper left region within the RAP of the origin crosspoint. Thus, the orientation can be detected by identifying the red patch among the two origin regions. The colors red and green can be distinguished by their hue values.

Figure 6.6: Example orientation detection: Detect the pattern orientation from the RAP position of the region with the lower hue value.

Thus, the two origin regions are sorted according to their average hue values in order to identify the upper left and lower right patch: For a red-green-coded pattern as in Figure 3.5, the upper left patch is defined as the region with the lower average hue value. The complexity of the color space conversion is O(N). For the calculation of the mean saturation and hue values per region, we consider that each pixel is assigned to at most one region. Thus, this part can be bounded by O(N) as well. Sorting the saturation values yields a complexity of O(|S| · log |S|). Analyzing the origin crosspoint calculation, we consider two different scenarios: If a crosspoint can be found between the two most saturated regions, the complexity of finding the crosspoint with the right RAP is O(|P|). However, if no such crosspoint is found, all the adjacent crosspoints of the most saturated region have to be considered. In a worst-case scenario, it is merged with all diagonally adjacent (black) patches, while all white patches are separate regions. This way, all crosspoints would be adjacent to this large region. As there would not be four adjacent regions around each crosspoint, incorrect RAPs would contain a fourth region adjacent to another crosspoint. Depending on the geometry of the pattern in the image, this non-adjacent region or one of the two not diagonally adjacent regions could falsely be selected as a possible origin-enclosing candidate.
Considering incorrect RAPs and false origin candidates for all crosspoints, each of them could provide a possible origin-enclosing region. Sorting them to determine the highest saturation would yield a complexity of O(|P| · log |P|).

6.6 Orientation Correction

So far, we have considered each crosspoint and its adjacent regions independently. In the next section, we want to determine their topology within the entire dual graph of the RAG. To get information about the connectivity of the individual crosspoints, their RAPs can be used. At this point, however, they are sorted according to the region nodes' positions relative to the respective crosspoint position within the image. In order to determine the points' connectivity within the graph, we want to adjust their RAPs so that their sorting reflects the nodes' positions within the graph.

Figure 6.7: Orientation correction of a RAP: Originally, the first node (blue) in the RAP corresponds to the lower right region relative to the crosspoint. After correction, the first node (blue) corresponds to the patch which is the upper left patch in the target pattern. Around the origin crosspoint, this patch is marked red.

correctOrientation

This adjustment can be realized by incorporating the orientation information gained in the previous section. At this point, we consider that the RAPs are provided in clockwise order, starting with the lower right adjacent region based on image coordinates. Now, we can use the information about the upper left and lower right regions around the origin. We define the upper left region as the first region in the RAP within the target CS. Consequently, the degree of rotation of the pattern in the image can be measured by the shift of the upper left region's position within the RAP. This shift can be calculated for the origin crosspoint, where the information about the upper left and lower right regions is available.
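The shift measurement at the origin and the circular rotation of a profile can be sketched in a few lines (hypothetical Python helpers, assuming a RAP is stored as a tuple of four region labels):

```python
def profile_shift(origin_rap, red_label):
    """Offset of the red (target upper-left) patch within the origin
    crosspoint's image-sorted RAP; this measures the rotation of the
    pattern in the image."""
    return origin_rap.index(red_label)

def correct_orientation(rap, shift):
    """Circularly shift a RAP (tuple of four region labels) so that the
    patch which is upper-left in the target pattern comes first."""
    return rap[shift:] + rap[:shift]
```

The same shift is then applied to every RAP, which is what makes a single origin detection sufficient for the whole pattern.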
Then, orientation correction can be performed for the other crosspoints by shifting their region profiles accordingly. Thus, the profiles are sorted according to the pattern orientation, starting with the upper left region in the target orientation. Figure 6.7 illustrates the node shift within the RAP during orientation correction.

6.7 Target Coordinates Assignment

In this part of the algorithm, target coordinates are assigned to the crosspoints identified in the previous step.

expandGraph

In a final step, target coordinates are iteratively assigned to the crosspoints (see Figure 6.8). Starting from the origin with coordinates (0,0), we follow the crosspoints along the coordinate axes and increment or decrement the coordinates accordingly.

Figure 6.8: Example graph expansion: Expand the graph from the detected origin by comparing RAPs and assign target coordinates.

With positive and negative directions for each axis, we leave each crosspoint with assigned target coordinates in four directions. For each direction, we can find the next crosspoint by finding the corresponding region at the corresponding position in its RAP. For instance, when moving up (i.e., in positive y direction), the current upper left region (i.e., at position 1 in the profile) has to be the lower left region (i.e., at position 4) of the next crosspoint. If we find a crosspoint with a region profile that satisfies this condition, we increment or decrement the target coordinate according to the direction, assign the result to this point and continue the expansion in all four directions. If we do not find a suitable crosspoint, or if the crosspoint has already been assigned a target coordinate (coming from a different direction), we stop the expansion from the current crosspoint in the current direction. This way, the dual graph of the RAG is created, where every node corresponds to a detected crosspoint.
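The expansion can be sketched as a breadth-first traversal over the orientation-corrected RAPs. This is a Python sketch with assumed data structures (the thesis implementation is in MATLAB); indices 0..3 correspond to the profile positions 1..4 in the text, i.e., upper left, upper right, lower right, lower left:

```python
from collections import deque

def expand_graph(origin, raps):
    """Assign integer target coordinates to crosspoints by expanding
    from the origin.

    origin: identifier of the origin crosspoint.
    raps:   dict crosspoint -> orientation-corrected RAP, a 4-tuple of
            region labels in clockwise order starting with the region
            that is upper-left in the target pattern.
    Returns a dict crosspoint -> (x, y) target coordinate.
    """
    # direction -> ((dx, dy), [(index_here, index_next), ...]): region
    # pairs the two RAPs must share across the connecting grid edge,
    # e.g. moving up, the current upper-left region (index 0) must be
    # the next crosspoint's lower-left region (index 3).
    moves = {
        "up":    ((0, +1), [(0, 3), (1, 2)]),
        "down":  ((0, -1), [(3, 0), (2, 1)]),
        "left":  ((-1, 0), [(0, 1), (3, 2)]),
        "right": ((+1, 0), [(1, 0), (2, 3)]),
    }
    coords = {origin: (0, 0)}
    queue = deque([origin])
    while queue:
        cur = queue.popleft()
        x, y = coords[cur]
        for (dx, dy), pairs in moves.values():
            # Find an unassigned crosspoint whose RAP matches across
            # the shared edge; stop in this direction otherwise.
            for cand, rap in raps.items():
                if cand in coords:
                    continue
                if all(raps[cur][a] == rap[b] for a, b in pairs):
                    coords[cand] = (x + dx, y + dy)
                    queue.append(cand)
                    break
    return coords
```

The linear scan over all crosspoints per expansion step mirrors the O(|P|²) bound derived below.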
Its label corresponds to its unique target coordinate within target space. For the complexity analysis of the graph expansion, we consider that for each crosspoint which is successfully added to the graph, starting with the origin, we expand the graph in four directions. While this approach may suggest an exponential recursion, we note that the expansion stops if no crosspoint was added (either because there is no match or because this point has already been added). Thus, the expansion is executed at most |P| times, which corresponds to the case that all crosspoints are added to the graph. In each expansion step, we iterate over all crosspoints to find the suitable next point in the current direction. Thus, the complexity of the graph expansion is O(|P|²).

CHAPTER 7
Applications of Graph-Based Checkerboard Indexing

In this chapter, we discuss how our graph-based checkerboard indexing algorithm can be useful for various applications. In particular, we explain how it addresses open problems in the fields of camera calibration and multi-view processing. Our checkerboard identification algorithm provides a mapping between image space and the target space represented by the pattern. The following properties suggest that it can serve as a basis for different image processing applications: i) For a given image of a checkerboard pattern, the inherent distortion is captured implicitly without assuming a globally defined explicit distortion model. This way, the algorithm is also able to handle deviations from ideal distortion models which occur under real-life conditions. In particular, it is not limited to a specific kind of distortion (such as radial distortion). ii) Due to its unique structure, the target pattern can be used as a global reference system for multiple views. The respective images can be processed independently, allowing an efficient parallel workflow.
iii) Our graph-based approach finds the next crosspoint for each direction and outgoing crosspoint individually. At this point, it does not make any assumptions about the number of crosspoints. It is not required that the target space has a constant number of grid points along each isoline. Consequently, our algorithm allows checkerboards which are only partially visible in the image, without selecting a rectangular subset of grid points. This property can also be useful in a multi-camera setup where a global pattern is used. In this case, the FOV of a single camera may contain only parts of the pattern.

In the following, we explain how specific applications can be described with the formal problem statement given in Equation 3.2. In particular, we propose how the respective target spaces can be defined. Beyond that, we discuss how these applications can profit from the previously mentioned properties. Common to all proposed applications, the goal is to learn the geometric distortions produced by the camera(s) and to apply this information to new images.

7.1 Camera Calibration

During common camera calibration, the projection of points from world to image space by the camera is estimated [Sze10]. This transformation is equal to the mapping fP defined in Equation 3.7. It includes both the transformation from world to image space and the distortion applied by the lens. Depending on further applications, only the distortion information may be relevant for further processing in image space. The inverse distortion information is implicitly contained in f: It is defined pointwise at each crosspoint (see Equation 3.2) after it has been reconstructed from our pointwise indexing mapping fT and the checkerboard function fI using Equation 3.12.
Thus, a distortion function fD : PIT → PI can be derived by

fD = f−1  (7.1)

In order to reconstruct the entire projection mapping fP, including the transformation from world to image space, the embedding fW of the checkerboard into world space has to be known. The respective points can be measured in a calibration procedure. With this information available, fP can be reconstructed as

fP = (fW ◦ f)−1  (7.2)

In both cases, the target space is best defined as a regular grid, in order to show only the distortions caused by the camera in the image. It can be represented by a planar rectangular checkerboard. Due to property i), these steps can be performed for various lens types. In particular, the algorithm can be applied to fisheye lenses, where the projection cannot be modelled exactly but only approximated due to their large FOV.

7.2 Distortion and Undistortion

More generally, the algorithm can be used to undistort or distort images for different purposes, including artistic effects. The problem of undistorting images is closely related to estimating the distortion during camera calibration, which is described in the previous section. Undistortion can be achieved by applying the inverse of the distortion mapping fD defined in Equation 7.1. Hence, f can be used directly. The target space can be defined in the same way as for camera calibration. Conversely, for some applications, the goal may be to distort images in a specified way. At this point, specific distortions can be created by chaining together individual distortion mappings. This is possible because the individual mappings between the different spaces can be decoupled and composed accordingly. A possible example of a chained distortion is to use a regular checkerboard as target space and apply a deformation fDW to it in world space: A deformed checkerboard will be projected to image space. Then, fT will account for both fDW and the distortion caused by fP.
The final mapping f will be the chained distortion mapping

f = (fP ◦ fDW ◦ ιW)−1  (7.3)

Decoupling can be introduced by using different mappings fI, f′I for the initial checkerboard creation and for composing f, respectively: In the previous example, where fI creates a regular checkerboard, an arbitrary mapping can be used for f′I. This way, the final decoupled mapping f is constructed as

f = f′I ◦ (fP ◦ fDW ◦ ιW ◦ fI)−1  (7.4)

Composed and decoupled distortion mappings can be used for various applications, including artistic effects. Various lens distortions fP and deformations fDW are possible due to property i).

7.3 Multi-View Processing

In multi-view applications, additional problems arise apart from distortions. A core issue is to find correspondences in images from different views. For applications where the individual images are combined to form an image mosaic, these correspondences are further used to align the individual images. The aim of the alignment procedure is to map each input image to its corresponding subimage within the image mosaic. Common image alignment methods, such as Brown and Lowe's automatic technique using SIFT features [BL07], align images centrally by taking several overlapping images into account. The camera and video content manufacturer Indiecam, however, aims at creating panoramic images more efficiently by parallelizing image alignment [BHKB16]. They want to realize this for a multi-camera setup with fixed positions between the individual cameras. For such an application, our indexing method can be used to solve the correspondence problem and align the individual images at the same time. As the relative position and orientation between the individual cameras do not change, the alignment mapping can be found during a calibration procedure. In this case, the target space is defined by the image mosaic.
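All of these mappings are only known pointwise at the crosspoints; applying them to arbitrary pixels of an input image requires interpolating between the sampled correspondences. A minimal inverse-distance-weighting sketch (a hypothetical helper, not the thesis implementation, which could equally use any standard scattered-data interpolation):

```python
def interpolate_mapping(samples, query, k=4):
    """Evaluate the pointwise mapping f at an arbitrary image point by
    inverse-distance weighting of the k nearest crosspoint samples.

    samples: list of ((ix, iy), (tx, ty)) pairs: image-space crosspoint
             locations and their assigned target coordinates.
    query:   (x, y) image point at which to evaluate the mapping.
    """
    nearest = sorted(
        samples,
        key=lambda s: (s[0][0] - query[0]) ** 2 + (s[0][1] - query[1]) ** 2
    )[:k]
    weights, tx, ty = [], 0.0, 0.0
    for (ix, iy), (u, v) in nearest:
        d2 = (ix - query[0]) ** 2 + (iy - query[1]) ** 2
        if d2 == 0.0:
            return (float(u), float(v))  # exact sample hit
        w = 1.0 / d2  # closer crosspoints dominate
        weights.append(w)
        tx += w * u
        ty += w * v
    s = sum(weights)
    return (tx / s, ty / s)
```
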
In a 360-degree setup as used by Indiecam, a cylindrical checkerboard can be used to cover the combined FOV of all cameras. It can be created by applying a suitable cylindrical deformation fDW to a regular checkerboard. Similarly, a spherical checkerboard can be constructed. By placing the multi-camera setup inside the cylinder, each camera will see a different part of the checkerboard, with overlapping FOVs between neighboring cameras. As each camera takes an image of the pattern, each will provide an image of a distorted checkerboard. Correspondences between these and the common target checkerboard can be determined independently with our indexing method. Thus, alignment can be performed in parallel instead of pairwise, using the mapping f from Equation 3.12 with a regular checkerboard mapping fI. At this point, we benefit from property ii): The checkerboard can only be used as a global reference system because it contains a unique origin which can be identified by multiple cameras independently. In a 360-degree setup, however, the origin will not be visible in all views. Instead, different sectors of the checkerboard will be visible to different cameras. While the current calibration pattern contains only a single unique origin, it is possible to extend this design to 360-degree patterns. As we already use color coding to mark two distinct patches, more colors can be introduced to mark different sectors, as proposed by Mucciolo [MKS+17]. The number of sectors depends on the FOV of the individual cameras. At least one unique patch pair should be visible to each camera. Among those pairs, one is defined as the origin of the CS, while the others are assigned respective coordinates. For a multi-view application, we also benefit from property iii), as each FOV contains only parts of the entire checkerboard.

CHAPTER 8
Evaluation

The aim of this chapter is to evaluate our indexing method.
This includes the definition of suitable metrics to quantify the results of the different parts of the algorithm, the creation of a representative data set, as well as the analysis of the results. Beyond that, general properties of the method which are independent of the considered data set are discussed. In this discussion, possible improvements of the individual steps and future work regarding possible applications are considered.

Figure 8.1: Different distortions of checkerboard patterns: fisheye lens with (a) planar and (b) cylindrical pattern and regular lens with (c) planar and (d) arbitrarily deformed pattern.

8.1 Metrics and Ground Truth

In this section, the metrics used for the quantitative evaluation of our indexing method are described. During this evaluation, the results of the individual steps of the algorithm as well as the final indexing are examined. Thus, different metrics are used to measure the quality of crosspoint detection, checkerboard segmentation and target coordinate assignment. In each case, the metric relies on a suitable ground truth definition. Sokolova et al. [SL09] present different metrics for classification tasks. In a binary classification problem, the data can be divided into positive and negative samples. The ground truth yields the true class label for each data point. Let Gp and Gn be the sets of positive and negative samples, respectively, according to the ground truth. Similarly, let Cp and Cn be the sets of points which are classified as positive and negative, respectively.
With |.| denoting the cardinality of a set, the numbers of true positives (tp), true negatives (tn), false positives (fp) and false negatives (fn) are defined as follows:

tp = |Gp ∩ Cp|  (8.1)
tn = |Gn ∩ Cn|  (8.2)
fp = |Gn ∩ Cp|  (8.3)
fn = |Gp ∩ Cn|  (8.4)

Based on these numbers, the metrics precision and recall can be calculated to evaluate a classifier:

precision = tp / (tp + fp)  (8.5)
recall = tp / (tp + fn)  (8.6)

8.1.1 Metrics for Crosspoint Detection

These metrics are also useful for object detection in an image, which can likewise be considered a classification problem [Kle14]. In this context, precision measures the correctness of the detections, whereas recall evaluates the detector's ability to detect actual objects. Note that neither of them considers the value of tn. It is less suited for such tasks, because it is difficult to determine the number of all possible non-objects. Consequently, only precision and recall are used to evaluate the crosspoint detection part of our algorithm. For clarity, we denote them precisioncp and recallcp. For each input image, ground truth information is provided as a mask image. At this point, we define that a crosspoint is required to have four crossing edges with a length of at least three pixels. This way, crosspoints between partially occluded or out-of-view patches at the border of the image are excluded. Also, the edges may be blurred and several pixels thick, but the gradient strength must be sufficient to identify the edge and the crosspoint location within a local neighborhood with a radius of four pixels. At this point, sufficiency is defined based on a subjective estimate during manual annotation. Due to the gradient requirement, crosspoints in shadowed areas, such as the border of the FOV in fisheye images, may be excluded. In the mask image, each crosspoint location is represented by a local neighborhood with a radius of four pixels.
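Under this ground truth definition, the matching of detections against the neighborhood mask and the resulting precisioncp and recallcp can be sketched as follows (a Python sketch with a hypothetical helper; a circular neighborhood of the given radius is assumed, and each ground truth point is matched at most once):

```python
def crosspoint_metrics(gt_points, detections, radius=4):
    """Match detected crosspoints to ground-truth locations and compute
    precision_cp / recall_cp. A detection counts as a true positive if
    it lies within the neighborhood of a not-yet-matched ground-truth
    crosspoint; unmatched detections are false positives, unmatched
    ground-truth points are false negatives.
    """
    r2 = radius * radius
    matched = set()
    tp = 0
    for (dx, dy) in detections:
        hit = next((i for i, (gx, gy) in enumerate(gt_points)
                    if i not in matched
                    and (gx - dx) ** 2 + (gy - dy) ** 2 <= r2), None)
        if hit is None:
            continue
        matched.add(hit)
        tp += 1
    fp = len(detections) - tp
    fn = len(gt_points) - tp
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if gt_points else 0.0
    return precision, recall
```
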
A detected crosspoint location is considered correct if it lies within such a neighborhood. At this point, it is assumed that the localization is sufficiently accurate if a crosspoint is detected correctly and at most one crosspoint is found within each neighborhood. Therefore, binary classification is sufficient at this point.

8.1.2 Metrics for Checkerboard Segmentation

In contrast, multiple labels have to be considered for the checkerboard segmentation part. Sokolova et al. adjust precision and recall for general multi-class classification tasks. Alternatively, Unnikrishnan et al. [UPH07] propose similarity measures to evaluate image segmentation algorithms. They account for the fact that manual ground truth segmentations are often subjective and do not provide a unique solution. However, the segmentation part of our algorithm differs from traditional segmentation tasks. Its goal is to determine the four regions surrounding a crosspoint. For this purpose, it is sufficient for each segmented region to approximate the corresponding checkerboard patch. Information about the labels is needed sufficiently close to, but not necessarily along, the borders, which are not distinct in real images. Those pixels remain unclassified. Thus, the ground truth labelling is not only not unique, but can also be undefined for some pixels. Also, the exact size and shape of a segmented region are not important. If two segmentations of the same checkerboard patch exist, their difference is not relevant as long as they do not contain parts of other patches. These properties lead to an evaluation objective that differs from traditional segmentation tasks. Consequently, new metrics are derived to measure the results of the two segmentation steps.
Their goal is to capture

• the ability to detect a checkerboard patch,
• whether several patches are merged within the same segmented region,
• whether several segmented regions cover the same patch.

For this purpose, a suitable definition of the ground truth segmentation has to be established: Each patch is represented by a rough region containing only interior pixels. Pixels for which ambiguous or missing labels are allowed are omitted. At this point, only patches surrounding a ground truth crosspoint are considered. Based on these criteria, we adapt the definitions for tp, fp, and fn given in Equations 8.1, 8.3 and 8.4 as follows.

Figure 8.2: Measured subsets for segmentation evaluation: Ground truth regions including incorrectly segmented regions with indices in Imissed (circle), Isplit (triangle) and Imerged (square) and segmented regions including incorrect segments with indices in Joutlier (circle), Jsplitting (triangle) and Jmerging (square).

Let Gi, i ∈ I, I = {0...M}, be the segmented regions according to the ground truth and Sj, j ∈ J, J = {0...N}, the segmented regions of the examined segmentation S. Further, let \ denote the set difference.
Then, we define

tpseg := |I \ (Imissed ∪ Imerged ∪ Isplit)|  (8.7)
fpseg := |Joutlier ∪ Jmerging ∪ Jsplitting|  (8.8)
fnseg := |Imissed ∪ Imerged ∪ Isplit|,  (8.9)

where

Imissed := {i ∈ I : ∄ j ∈ J : Gi ∩ Sj ≠ ∅}  (8.10)
Imerged := {i ∈ I : ∃ i′ ∈ I, i ≠ i′, j ∈ J : Gi ∩ Sj ≠ ∅ ∧ Gi′ ∩ Sj ≠ ∅}  (8.11)
Isplit := {i ∈ I : ∃ j, j′ ∈ J, j ≠ j′ : Gi ∩ Sj ≠ ∅ ∧ Gi ∩ Sj′ ≠ ∅}  (8.12)
Joutlier := {j ∈ J : ∄ i ∈ I : Gi ∩ Sj ≠ ∅}  (8.13)
Jmerging := {j ∈ J : ∃ i, i′ ∈ I, i ≠ i′ : Gi ∩ Sj ≠ ∅ ∧ Gi′ ∩ Sj ≠ ∅}  (8.14)
Jsplitting := {j ∈ J : ∃ j′ ∈ J, j ≠ j′, i ∈ I : Gi ∩ Sj ≠ ∅ ∧ Gi ∩ Sj′ ≠ ∅}  (8.15)

This way, fnseg denotes the number of ground truth regions which do not have an exact match in the segmentation S, including regions which are not detected at all (with indices in Imissed), those which are split into more than one segmented region (with indices in Isplit) and those which are merged by the same segmented region with at least one other ground truth region (with indices in Imerged). In the same manner, fpseg counts the regions in S which do not correctly segment a corresponding ground truth region. Analogous to Imerged and Isplit, they include segmented regions covering more than one ground truth region (Jmerging) or splitting the same one (Jsplitting), but also outlier regions which do not correspond to any checkerboard patch (with indices in Joutlier). Finally, tpseg denotes all correctly segmented regions with a one-to-one mapping to the ground truth. Using these numbers, the following metrics for the evaluation of the segmentation S can be derived:

precisionseg = tpseg / (tpseg + fpseg)  (8.16)
recallseg = tpseg / (tpseg + fnseg)  (8.17)

The different subsets are illustrated in Figure 8.2.

8.1.3 Metrics for Indexing with Target Coordinates

For the evaluation of the checkerboard indexing, we manually assign the ground truth target coordinates to each crosspoint defined in the ground truth mask image used in Section 8.1.1.
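Before turning to the indexing metrics, the counts of Equations 8.7-8.15 can be computed directly from region overlaps. This is a Python sketch with a hypothetical helper, assuming regions are given as sets of pixel coordinates:

```python
def segmentation_metrics(gt_regions, seg_regions):
    """Compute tp_seg, fp_seg, fn_seg (Equations 8.7-8.15) from ground
    truth regions G_i and segmented regions S_j, each given as a dict
    mapping an index to a set of pixel coordinates.
    """
    # All (i, j) pairs with a non-empty intersection G_i ∩ S_j.
    overlaps = {(i, j) for i in gt_regions for j in seg_regions
                if gt_regions[i] & seg_regions[j]}
    i_missed = {i for i in gt_regions
                if not any((i, j) in overlaps for j in seg_regions)}
    i_merged = {i for (i, j) in overlaps
                if any((i2, j) in overlaps for i2 in gt_regions if i2 != i)}
    i_split = {i for (i, j) in overlaps
               if any((i, j2) in overlaps for j2 in seg_regions if j2 != j)}
    j_outlier = {j for j in seg_regions
                 if not any((i, j) in overlaps for i in gt_regions)}
    j_merging = {j for (i, j) in overlaps
                 if any((i2, j) in overlaps for i2 in gt_regions if i2 != i)}
    j_splitting = {j for (i, j) in overlaps
                   if any((i, j2) in overlaps for j2 in seg_regions if j2 != j)}
    fn_bad = i_missed | i_merged | i_split
    tp_seg = len(set(gt_regions) - fn_bad)
    fp_seg = len(j_outlier | j_merging | j_splitting)
    fn_seg = len(fn_bad)
    return tp_seg, fp_seg, fn_seg
```
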
We consider the following sets:

• G - the set of ground truth crosspoints defined by the mask image (see Section 8.1.1)
• D - the set of detected crosspoints calculated in Section 6.4
• DA ⊆ D - the set of detected crosspoints with assigned target coordinates
• DC ⊆ DA ∩ G - the set of crosspoints with correctly assigned target coordinates according to the ground truth

This way, we distinguish between the indexing algorithm's ability to detect a crosspoint (using DA) and its ability to assign the correct target coordinate (using DC). Consequently, we adjust the definitions for tp, fp and fn accordingly:

tpass := |G ∩ DA|  (8.18)
fpass := |DA \ G|  (8.19)
fnass := |G \ DA|  (8.20)

Additionally, we define the number of correct positives (cpass) as

cpass := |DC| ≤ tpass  (8.21)

Based on these numbers, we define the metrics for the assignment of target coordinates as

precisionass = tpass / (tpass + fpass)  (8.22)
recallass = tpass / (tpass + fnass)  (8.23)
precisionc,ass = cpass / (tpass + fpass)  (8.24)
recallc,ass = cpass / (tpass + fnass)  (8.25)

While the first two can be used to evaluate the existence of a node in the graph, the last two are stricter and evaluate the labels (i.e., the assigned coordinates). They can be adjusted to focus on the expansion of the graph and the assignment of coordinates without considering the orientation of the pattern. This way, different aspects of the indexing can be evaluated. Thus, we further define

precisionc̃,ass = precisionass if topology and origin are correct for DA, and precisionc,ass otherwise  (8.26)
recallc̃,ass = recallass if topology and origin are correct for DA, and recallc,ass otherwise  (8.27)

Finally, the following two metrics are introduced to allow the comparison with common checkerboard indexing methods for traditional checkerboards that detect neither orientation nor origin.
For this purpose, we can adjust Equations 8.26 and 8.27 further to measure only the method's ability to capture the checkerboard topology, neglecting both orientation and origin detection:

precisionĉ,ass = precisionass if the topology is correct for DA, and precisionc,ass otherwise  (8.28)
recallĉ,ass = recallass if the topology is correct for DA, and recallc,ass otherwise  (8.29)

8.2 Discussion

In this section, we use the previously defined metrics to evaluate our indexing method. For our experiments, we created a data set of 49 images taken with two different cameras (an indiePOV HDSDI camera with a fisheye lens with a 185° FOV and a Zeiss optics camera). Apart from the distortions caused by the camera lenses, the images show different kinds of distortion depending on the deformation of the pattern (planar, cylindrical and arbitrary, see Figure 8.1). We implemented our indexing algorithm in MATLAB on a 64-bit Windows machine. The functionality of our algorithm is controlled by different parameters. To examine our method's sensitivity to their values and to find possible dependencies between them, we perform a parameter analysis. We consider the most important parameters listed in Table 8.1. For comparison, we also add the option to perform only a coarse segmentation without refinement (indicated by a τ value of 0). These are the most relevant parameters, as they control how the pattern structure is exposed and the degree of patch approximation. Thus, they have the highest impact on the resulting segmentation and, consequently, on the RAPs for crosspoint identification.
Table 8.1: Parameter Analysis

  sclose  (values 6, 7, 8): the size of the structural element which is used to close the bridged binary edge image during the coarse segmentation
  serode  (values 2, 3, 4): the size of the structural element which is used to erode the closed binary edge image during the coarse segmentation
  τ       (values 0, 0.07, 0.1, 0.13): the threshold distance between a new pixel's intensity and the region mean during refined segmentation

We calculate our metrics for all parameter combinations. The individual metric results are discussed in detail in the subsequent subsections. Their mean values over all images are illustrated in Figures 8.3–8.8. The parameter analysis can also be used to optimize the individual values and to find a suitable parameter combination which can be set as the default configuration. Yet, different combinations may be optimal for different metrics. Some parameter values are good choices for all metrics, whereas in other cases, different values are preferable according to different metrics. Table 8.2 shows the parameter combinations corresponding to the best results for each metric. The metrics precision and recall which we use for the evaluation of the crosspoint detection are not listed, as they are not influenced by the chosen parameters. We can see that for all metrics, the best results are achieved without refined segmentation. Naturally, the region growing metrics are an exception, because a value τ > 0 is needed to evaluate the refinement step itself. For the morphological parameters, the results are less distinct. They vary between different steps of the algorithm, but also between precision and recall. Thus, it is more difficult to select "best" values which are suitable as a default configuration. However, we decide that the results of the metrics measuring the effectiveness of the entire algorithm should be emphasized compared to metrics measuring intermediate results.
Therefore, we focus on the six assignment metrics. Among these, we consider the c,ass and c̃,ass metrics the most important, as they consider the correctness of the assigned target coordinates, at least apart from orientation errors. We can see that they share the same value configuration for serode and τ. As the c,ass metrics are more important, we select the closing value corresponding to their best results. Eventually, we use sclose = 7, serode = 3, τ = 0 for our subsequent evaluation. The corresponding results are displayed in Figure 8.9. The following subsections discuss the results of the final configuration as well as the parameter analysis.

Figure 8.3: Parameter analysis for coarse segmentation
Figure 8.4: Parameter analysis for region growing
Figure 8.5: Parameter analysis for final segmentation
Figure 8.6: Parameter analysis for target coordinate assignment
Figure 8.7: Parameter analysis for correct target coordinate assignment
Figure 8.8: Parameter analysis for correct target coordinate assignment without orientation

8.2.1 Crosspoint Detection

The adapted crosspoint detection approach works well on the data set, with a mean precision and recall of 0.9907 and 0.9903 and respective median values of 1 and 1; the individual values are depicted in Figure 8.9.
Table 8.2: Best configuration for each metric

  Metric            sclose  serode  τ
  precisionCseg        8      3     -
  recallCseg           8      4     -
  precisionRseg        7      3     0.007
  recallRseg           6      4     0.007
  precisionseg         8      3     0
  recallseg            8      4     0
  precisionass         7      2     0
  recallass            6      4     0
  precisionc,ass       7      3     0
  recallc,ass          7      3     0
  precisionc̃,ass       6      3     0
  recallc̃,ass          6      3     0

Table 8.3: Mean and median metric values for sclose = 7, serode = 3 and τ = 0

  Metric            Mean     Median
  precisioncp       0.9907   1
  recallcp          0.9903   1
  precisionseg      0.9845   0.9895
  recallseg         0.9553   0.9727
  precisionass      0.9959   1
  recallass         0.9029   0.9546
  precisionc,ass    0.4786   0.0128
  recallc,ass       0.4109   0.0115
  precisionc̃,ass    0.6198   0.9865
  recallc̃,ass       0.5489   0.7342

Figure 8.9: Metric values for final parameter configuration

False negatives occur close to the border of the image or FOV (in the case of a fisheye lens), where large parts of adjacent regions are occluded and edges vanish. As expected, our method is more likely to allow outliers than the original method [GMCS12], with the benefit of handling a greater variety of distortions and also incomplete checkerboards (see Figure 8.10): Geiger et al. achieve a mean precision and recall of 0.9968 and 0.8738 and median values of 1 and 0.9091.

8.2.2 Segmentation

Both segmentation steps depend on a correct crosspoint localization, as outlier crosspoints lead to false positive regions outside the ROI and merged regions close to the border of the pattern. Apart from a correct pattern ROI, the coarse segmentation is also influenced by illumination. However, in a controlled calibration environment, appropriate conditions can be provided. While region growing is sometimes able to remove false positives from the coarse segmentation, it also tends to merge correctly segmented regions. Figures 8.4 and 8.5 clearly show that precision and recall decrease with increasing τ. Altogether, the disadvantage is larger than the benefit, and better results can be achieved without region growing.
This shows that our indexing algorithm indeed only requires an approximate segmentation.

Figures 8.3 and 8.5 illustrate that precision and recall can be increased by enlarging the structuring element for closing. This is expected, as false positives due to holes in the edge image are removed and gaps are closed, which avoids merging with neighboring regions. Moreover, increased erosion yields larger precision values, while recall benefits from decreasing it. This tendency reflects the fact that stronger erosion is more likely to remove false positive regions, but also small true positive regions.

The mean and median precision and recall values for sclose = 7 and serode = 3 show that segmenting false positive regions is a smaller issue than missing or merging regions. Furthermore, the higher median values show that the mean values are strongly influenced by individual image results. This can be verified by inspecting the individual metric values depicted in Figure 8.9.

Figure 8.10: Comparing our adapted crosspoint detection method (∗) to the original (◦): improved detection for (a) incomplete checkerboards and (b) arbitrary distortions, at the expense of (d) outliers due to background features in the calibration environment. Both fail to detect (c) crosspoints close to borders or occluders.

8.2.3 Crosspoint Identification

Generally, RAPs are well suited to describe a crosspoint. However, they rely strongly on a correct segmentation. We assumed that an approximate segmentation of a patch is sufficient. A RAP of a crosspoint p is formed by the four regions Sk, k = 1, ..., 4, with minimal values of d_RAP.
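The RAP construction just defined, taking the four regions with minimal d_RAP in a fixed cyclic order around the crosspoint, can be sketched as follows. Using the Euclidean centroid distance as d_RAP and the atan2 angle as the cyclic ordering φ is a simplification of this sketch; the thesis defines both more carefully.

```python
import math

def build_rap(crosspoint, regions):
    """Sketch of RAP construction: pick the four regions whose centroids
    are closest to the crosspoint and return their labels in cyclic
    order around the crosspoint.

    regions: dict mapping region label -> centroid (x, y)."""
    cx, cy = crosspoint
    # four regions with minimal (here: Euclidean) distance to the crosspoint
    closest = sorted(regions.items(),
                     key=lambda kv: math.hypot(kv[1][0] - cx,
                                               kv[1][1] - cy))[:4]
    # order them cyclically by the angle of the centroid around the crosspoint
    closest.sort(key=lambda kv: math.atan2(kv[1][1] - cy, kv[1][0] - cx))
    return [label for label, _ in closest]
```

For a crosspoint surrounded by four region centroids, a fifth, more distant region does not enter the RAP, which is precisely why the accuracy of the segmentation only matters up to the distance bound discussed next.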
Thus, lowering these values by a more accurate approximation of the patch (i.e., a larger segmentation including more pixels belonging to the patch, with narrower gaps between the individual segments) does not change the RAP, unless it allows d_RAP to fall below the value of at least one of the four closest regions for at least one region that was ranked fifth or higher. The maximum distance which still yields a correct RAP is determined by the geometry of the distorted pattern: it has to be less than the distance to further regions adjacent to neighboring crosspoints. If a coarse segmentation can guarantee this distance for each crosspoint, refinement is not only unnecessary for a correct crosspoint identification, but also provides no additional information for the RAP. On the contrary, incorrectly merged regions around a crosspoint will distort the RAP and lead to incorrect results for at least the crosspoint in question.

Figure 8.11: Correct origin detection and graph expansion, but failed orientation correction due to the high hue value of the red patch.

In case multiple detected crosspoints have the same RAP, the location of the final crosspoint is currently obtained by averaging their pixel coordinates.
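Merging multiple detections that share a RAP by coordinate averaging, as described above, can be sketched as follows; representing a RAP as a tuple of region labels is an assumption of this sketch.

```python
from collections import defaultdict

def merge_by_rap(crosspoints):
    """Group detected crosspoints by their RAP (here: a sequence of region
    labels) and merge duplicates by averaging their pixel coordinates.

    crosspoints: iterable of ((x, y), rap) pairs.
    Returns: dict mapping each RAP tuple to a single averaged location."""
    groups = defaultdict(list)
    for point, rap in crosspoints:
        groups[tuple(rap)].append(point)
    merged = {}
    for rap, pts in groups.items():
        n = len(pts)
        merged[rap] = (sum(p[0] for p in pts) / n,
                       sum(p[1] for p in pts) / n)
    return merged
```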
This approach is useful for other crosspoint detectors which localize corners rather than crosspoints, such as the Harris corner detector [HS88]. However, our detector based on Geiger et al. [GMCS12] already localizes each crosspoint with high accuracy. With this method, multiple crosspoints with the same RAP rather suggest an incorrect segmentation and/or a false positive crosspoint. Hence, their average location is unlikely to correspond to a true crosspoint location. Under these circumstances, taking any single one of the multiple candidates will probably already improve the chances of correct indexing.

8.2.4 Color-Coded Origin and Orientation Detection

The origin of the target coordinate system is determined by assuming that the two segmented regions with the highest mean saturation correspond to the colored patches. Our evaluation shows that this is only true for 51.02% of the images. For the rest, the mean saturation of the segmented green patch is not sufficiently high. In these cases, we assume that the red patch is found correctly and choose the best candidate for the green patch (i.e., the region with the highest mean saturation) among its diagonal neighbor regions. This way, the origin detection is correct in 73.47% of the cases, but still fails if other diagonal neighbor regions have a higher mean saturation than the green patch (see Figure 8.13). To make detection more robust, the color of the second (currently green) patch should be adjusted accordingly. In addition, color calibration can be applied to the printing of the pattern as well as to its imaging by the camera; illumination has to be taken into account as well.

The orientation of the target coordinate system is determined by assuming that among
the two regions marking the origin, the one corresponding to the red patch has the lower mean hue value. However, this assumption does not take the cyclic hue spectrum into account: depending on camera settings and illumination, the color of the red patch can vary between red and pink, corresponding to very low or very high hue values. Therefore, a more precise orientation detection can be performed by adding thresholds to the < comparison. An example of a failed orientation detection is shown in Figure 8.11.

Figure 8.12: Correct target coordinate assignment with cylindrical deformation and fisheye distortion.

Figure 8.13: Orientation is sometimes detected incorrectly at the origin due to an insufficiently large mean saturation of the green patch and corrected incorrectly due to locally varying rotation angles: (a) original-size RGB image, (b) incomplete assignment of target coordinates (NaN, short for Not a Number, indicates a missing target coordinate).

Figure 8.14: Results of different assignment metrics

8.2.5 Orientation Correction and Assignment of Target Coordinates

The quality of the coordinate assignment strongly depends on the considered metric, as illustrated in Figures 8.9 and 8.14. Similar to crosspoint detection and segmentation, precision is also higher than recall for the coordinate assignment. The assignment alone, i.e.,
the addition of a crosspoint to the graph, achieves very good results. However, compared to all other metrics, precision and recall for correct assignment achieve very low values. While the median values are higher than the mean values for the remaining metrics, the median is much lower than the mean for precision_c,ass and recall_c,ass. This is due to the fact that for many images, only zero or one crosspoint among all true positives is assigned correctly, which can be explained as follows: if the orientation detection fails, all crosspoints except for the origin are assigned incorrect target coordinates; if the origin detection fails as well, no crosspoint is labelled correctly. For these cases, the last two assignment metrics, which do not consider the orientation, are more expressive: after all, for the evaluation of our graph-based approach, it is important to distinguish cases where the indexing fails completely from those where the orientation is incorrect but the topology of the checkerboard is captured correctly (see Figure 8.11). Figure 8.14 shows that the latter is the case for some of the images with very low values of precision_c,ass and recall_c,ass (about 14% of all images). The reasons for the failed orientation detection have been discussed in the previous subsection.

Still, the values of precision_c̃,ass and recall_c̃,ass are lower than the respective crosspoint detection and segmentation metrics. On the one hand, errors are caused by false positive crosspoints and/or segmented regions which yield incorrect RAPs. They can be reduced by stricter calibration conditions where no environment features are visible in the background. On the other hand, the current approach for the graph expansion is not suited for some distortions: the current orientation correction is performed globally. This scheme relies on a constant global rotation of the pattern.
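The idea behind the orientation-agnostic metrics, counting an assignment as correct if it agrees with the ground truth under the best global 90° rotation of the target grid, can be sketched as follows; this conveys the concept only and is not the exact metric definition used in our evaluation.

```python
def correct_up_to_orientation(assigned, ground_truth):
    """Count assignments that agree with the ground truth under the best
    of the four global 90-degree rotations of the target grid.

    assigned, ground_truth: dicts mapping a crosspoint id to its
    target coordinate (x, y)."""
    rotations = [
        lambda x, y: (x, y),      # 0 degrees
        lambda x, y: (y, -x),     # 90 degrees
        lambda x, y: (-x, -y),    # 180 degrees
        lambda x, y: (-y, x),     # 270 degrees
    ]
    best = 0
    for rot in rotations:
        hits = sum(1 for p, (x, y) in assigned.items()
                   if p in ground_truth and ground_truth[p] == rot(x, y))
        best = max(best, hits)
    return best
```

An image whose entire coordinate grid is rotated by 90° relative to the ground truth, as happens when only the orientation detection fails, still scores fully under this comparison.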
Consequently, our approach works well for images with similar angles of rotation in all parts of the pattern. Under this condition, different kinds of distortion and deformation can be handled (see Figure 8.12). In cases where the distortion leads to different angles in different parts of the image, the assignment of target coordinates fails, because region profiles are not matched correctly due to a different position of the upper left region within the profile vector (see Figure 8.13). This limitation can be overcome by only considering the order of the regions within the vector, without assuming a fixed starting point.

For comparison with the indexing provided by Geiger et al. [GMCS12], we consider precision_ĉ,ass and recall_ĉ,ass, as their method does not detect the color-coded origin and orientation. As for the crosspoint detection, our method performs better with partial and arbitrarily deformed checkerboards, leading to higher metric scores (0.7216 vs. 0.6528 for mean precision_ĉ,ass and 0.6456 vs. 0.5889 for mean recall_ĉ,ass). Yet, fisheye distortion is handled well by Geiger et al. Their orientation detection also works robustly, even though it is limited to complete, unoccluded rectangular patterns.

8.3 Applications and Future Work

In this section, we discuss the suitability of our indexing algorithm for the applications suggested in Chapter 7.

At this point, the evaluation results presented in the previous section need to be considered. They reveal that the current indexing approach generally supports different kinds of distortions and deformations, so property i) presented in Chapter 7 is satisfied. For particular distortions such as strong fisheye distortion, however, support is currently limited due to the constant orientation assumption. We also showed that the designed calibration pattern allows the automatic detection of origin and orientation, as stated in property ii).
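The fix suggested above for locally varying rotation, comparing region profile vectors only up to a cyclic shift instead of assuming a fixed upper-left starting point, can be sketched as:

```python
def profiles_match_cyclically(profile_a, profile_b):
    """Compare two region profile vectors under all cyclic shifts,
    dropping the fixed starting point assumed by the global
    orientation correction."""
    n = len(profile_a)
    if n != len(profile_b):
        return False
    # try every rotation of profile_a against profile_b
    return any(profile_a[i:] + profile_a[:i] == profile_b for i in range(n))
```

With this comparison, a profile that is rotated by two positions because of a locally different pattern rotation still matches, at the cost of potentially more ambiguous matches for symmetric profiles.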
At this point, detection only works under limited conditions and is not robust against certain camera and illumination variations. However, we aim at applying our method in the controlled environment of a calibration procedure, where these restrictions can be taken into account. Regarding the current limitations of both properties i) and ii), the alterations suggested in the previous section can be implemented easily to improve the technique. As for property iii), partial checkerboard patterns are well handled by our method.

Assuming a correct graph-based indexing, the target coordinates are provided at the checkerboard crosspoints. For the proposed applications such as image alignment, however, the mapping needs to be defined for each pixel in the image. So far, it is only provided at a low resolution defined by the pattern resolution (see Figure 8.15). A basic mapping for the remaining image points can be derived by interpolation. However, this approach is prone to errors if the pattern resolution is too coarse to capture the image content. Alternatively, as proposed by Banaeyan et al. [BHKB16], the accuracy can be increased by repeating the identification procedure for multiple views: as the camera is shifted relative to the pattern, the mapping to target space is defined for different pixel positions in each iteration.

Figure 8.15: Target coordinates provide mapping only at crosspoint resolution.

Apart from the resolution of the mapping, future work also includes the handling of depth and parallax. Our aim is to apply the geometric information gained from the indexing during the calibration procedure to new images. The objects captured in these images will possibly have a different position relative to the cameras than the pattern object used for calibration. This can cause these objects to be mapped incorrectly with our approach.
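The interpolation mentioned above can be sketched for the forward direction: mapping a fractional target coordinate to an image position by bilinear interpolation between the four surrounding crosspoints. A per-pixel inverse mapping would additionally require locating the cell containing each pixel; this sketch assumes a complete grid of indexed crosspoints.

```python
import math

def target_to_image(tx, ty, grid):
    """Map a (possibly fractional) target coordinate (tx, ty) to an image
    position by bilinear interpolation between the four surrounding
    crosspoints.

    grid: dict mapping integer target coordinates (i, j) to the image
    position (x, y) of the corresponding crosspoint."""
    i0, j0 = math.floor(tx), math.floor(ty)
    u, v = tx - i0, ty - j0  # fractional position within the cell
    p00 = grid[(i0, j0)]
    p10 = grid[(i0 + 1, j0)]
    p01 = grid[(i0, j0 + 1)]
    p11 = grid[(i0 + 1, j0 + 1)]
    x = (1-u)*(1-v)*p00[0] + u*(1-v)*p10[0] + (1-u)*v*p01[0] + u*v*p11[0]
    y = (1-u)*(1-v)*p00[1] + u*(1-v)*p10[1] + (1-u)*v*p01[1] + u*v*p11[1]
    return x, y
```

As noted in the text, this linear densification is only accurate where the pattern resolution is fine enough to capture the local distortion within a cell.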
Moreover, the problem of parallax, which occurs during image alignment, has not been addressed yet.

9 Conclusion

In this thesis, a new technique for checkerboard indexing has been presented. It uses a novel model-free approach combining graph-based representations and basic connected component segmentation to identify checkerboard crosspoints using RAPs. These RAPs can be used to create a dual RAG in which each crosspoint represents a node in the graph. This way, the structure of the pattern is captured. The algorithm can be applied to images of a possibly distorted or occluded checkerboard pattern which have been taken during a calibration procedure. For this purpose, a suitable pattern design has been developed.

The proposed algorithm works well under suitable camera and illumination conditions and allows different kinds of distortions. However, robustness has to be improved for origin and orientation detection as well as for handling particular distortions with strongly varying rotation within the pattern, such as strong fisheye distortion. Some of the existing problems can be solved by adjusting the calibration setup. For the others, possible solutions have been proposed. Compared to a state-of-the-art checkerboard detection and indexing method, our approach is less robust against outliers and fisheye distortion, but it is better suited to handle partial or occluded patterns and limited arbitrary deformation.

Furthermore, possible applications of the presented indexing algorithm have been discussed in this thesis. As a model-free method designed for use in a multi-view setup, it is an interesting alternative to common approaches for various applications. At this point, a basic concept is provided. Due to the limitations mentioned above, the problems of these applications which motivate this approach have not yet been entirely solved.
Apart from the open problems of the stand-alone indexing, further research is necessary to make this technique applicable in the context of camera calibration and multi-view processing. For these applications in particular, open problems include increasing the resolution of the calculated mapping to the image resolution and handling different depth values or parallax.

List of Figures

3.1 An example checkerboard pattern in target, image and world space: The red dot marks the origin of the pattern. The green solid arrows represent the coordinate axes in each space. The mappings between these spaces are depicted by dotted blue arrows.
3.2 Graph-based representations of a checkerboard: Primal RAG (red dot nodes and dashed edges) and dual graph (green cross nodes and solid edges)
3.3 Examples of calibration patterns: (a) Distorted traditional checkerboard pattern. Image taken from Geiger et al. [GMCS12]. (b) Alternative color-coded calibration pattern. Image taken from Alemán et al. [FLDC14].
3.4 Examples of geometrical reference structures captured by the RAG
3.5 Target CS represented by a checkerboard pattern with a color-coded origin between the red (upper left) and green (lower right) patch.
4.1 Saddle point and face in primal and dual graph. The arrows represent pixel relations and are color coded according to their sstrict values (black for 0, green for 1). Image taken from Cerman [Cer15].
4.2 CB vector (outer green circle) with characteristics at a crosspoint (red point): four sign-changing indices (points with red circles), differences between opposite sign-changing indices corresponding to vector half-length (black arrows) and similar sign-changing indices to smaller CB vector (blue inner circle). Image taken from Bok et al. [BHK16].
4.3 Corner score based on (a) ideal and (b) rotated corner prototypes and (c) gradient statistics. Image taken from Geiger et al. [GMCS12].
5.1 Additional node inserted in the graph representation by the SCIS algorithm at a saddle face
6.1 Illustration of our checkerboard identification algorithm using color-coded region labels: (a) original-size grayscale image with detected crosspoints and ROI, (b) coarse segmentation on cropped image, (c) refined segmentation, (d) assigned target coordinates.
6.2 Example crosspoint detection.
6.3 Example segmentation: Each dot represents a segmented region.
6.4 Example RAP: Labelled nodes and edges of the RAG (red dots and dotted lines) around a crosspoint (green cross). The order of the labels is defined by φ (blue dashed arrow).
6.5 Example origin detection: Detect origin crosspoint between two regions with highest mean saturation.
6.6 Example orientation detection: Detect pattern orientation from RAP position of region with lower hue value.
6.7 Orientation correction of a RAP: Originally, the first node (blue) in the RAP corresponds to the lower right region relative to the crosspoint. After correction, the first node (blue) corresponds to the patch which is the upper left patch in the target pattern. Around the origin crosspoint, this patch is marked red.
6.8 Example graph expansion: Expand graph from detected origin by comparing RAPs and assign target coordinates.
8.1 Different distortions of checkerboard patterns: fisheye lens with (a) planar and (b) cylindrical pattern and regular lens with (c) planar and (d) arbitrarily deformed pattern
8.2 Measured subsets for segmentation evaluation: Ground truth regions including incorrectly segmented regions with indices in Imissed (circle), Isplit (triangle) and Imerged (square) and segmented regions including incorrect segments with indices in Joutlier (circle), Jsplitting (triangle) and Jmerging (square).
8.3 Parameter analysis for coarse segmentation
8.4 Parameter analysis for region growing
8.5 Parameter analysis for final segmentation
8.6 Parameter analysis for target coordinate assignment
8.7 Parameter analysis for correct target coordinate assignment
8.8 Parameter analysis for correct target coordinate assignment without orientation
8.9 Metric values for final parameter configuration
8.10 Comparing our adapted crosspoint detection method (∗) to the original (◦): Improved detection for (a) incomplete checkerboards and (b) arbitrary distortions, at the expense of (d) outliers due to background features in the calibration environment. Both fail to detect (c) crosspoints close to borders or occluders.
8.11 Correct origin detection and graph expansion, but failed orientation correction due to high hue value of the red patch.
8.12 Correct target coordinate assignment with cylindrical deformation and fisheye distortion.
8.13 Orientation is sometimes detected incorrectly at the origin due to a not sufficiently large mean saturation of the green patch and corrected incorrectly due to locally varying rotation angles: (a) original-size RGB image, (b) incomplete assignment of target coordinates (NaN, short for Not a Number, indicates a missing target coordinate).
8.14 Results of different assignment metrics
8.15 Target coordinates provide mapping only at crosspoint resolution.

List of Tables

8.1 Parameter Analysis
8.2 Best configuration for each metric
8.3 Mean and median metric values for sclose = 7, serode = 3 and τ = 0

List of Algorithms

5.1 Find Connected Components
6.1 Graph-Based Checkerboard Indexing

Bibliography

[AF05] Steffen Abraham and Wolfgang Förstner. Fish-eye-stereo calibration and epipolar rectification. ISPRS Journal of Photogrammetry and Remote Sensing, 59(5):278–288, 2005.

[AV07] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia, 2007.

[BBV01] C. Bräuer-Burchardt and K. Voss. A new algorithm to correct fish-eye- and strong wide-angle-lens-distortion from single images. In Proceedings of the 2001 International Conference on Image Processing, volume 1, pages 225–228, 2001.

[BHK16] Yunsu Bok, Hyowon Ha, and In So Kweon. Automated checkerboard detection and indexing using circular boundaries. Pattern Recognition Letters, 71(C):66–72, February 2016.

[BHKB16] Majid Banaeyan, Hanna Huber, Walter Kropatsch, and Raphael Barth.
A novel concept for smart camera image stitching. In Proceedings of the 21st Computer Vision Winter Workshop, 2016.

[BL03] M. Brown and D. G. Lowe. Recognising panoramas. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, ICCV '03, pages 1218–1225, Washington, DC, USA, 2003. IEEE Computer Society.

[BL07] Matthew Brown and David G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, August 2007.

[BTVG06] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In Proceedings of the 9th European Conference on Computer Vision - Volume Part I, ECCV '06, pages 404–417, Berlin, Heidelberg, 2006. Springer-Verlag.

[Cer15] Martin Cerman. Structurally correct image segmentation using local binary patterns and the combinatorial pyramid. Master's thesis, PRIP, TU Wien, 2015.

[FB81] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[FH04] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, September 2004.

[Fit01] A. W. Fitzgibbon. Simultaneous linear estimation of multiple view geometry and lens distortion. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, 2001.

[FLDC14] Miguel Alemán Flores, Luis Álvarez León, Luis Gómez Déniz, and Daniel Elías Santana Cedrés. Automatic lens distortion correction using one-parameter division models. IPOL Image Processing On Line (Special Issue on Lens Distortion Models), 4:327–343, 2014.

[GMCS12] Andreas Geiger, Frank Moosmann, Omer Car, and Bernhard Schuster. Automatic camera and range sensor calibration using a single shot. In International Conference on Robotics and Automation (ICRA), pages 3936–3943, St. Paul, USA, May 2012. IEEE.
[HGJD08] Ciarán Hughes, Martin Glavin, Edward Jones, and Patrick Denny. Review of geometric distortion compensation in fish-eye cameras. In IET Irish Signals and Systems Conference 2008 (ISSC 2008), pages 162–167, 2008.

[HRM06] G. Heusch, Y. Rodriguez, and S. Marcel. Local binary patterns as an image preprocessing for face authentication. In 7th International Conference on Automatic Face and Gesture Recognition 2006, pages 6–14, USA, 2006. IEEE.

[HS88] Chris Harris and Mike Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, volume 15, pages 147–151, 1988.

[JK16] Ines Janusch and Walter G. Kropatsch. Shape classification according to LBP persistence of critical points. In Discrete Geometry for Computer Imagery - 19th IAPR International Conference, DGCI 2016, Nantes, France, April 18-20, 2016, Proceedings, volume 9647 of LNCS, pages 166–177. Springer, 2016.

[Kal09] Michael Kaltenbäck. Analysis 1, 2009. TU Wien, Lecture Notes, p. 6.

[KB06] Juho Kannala and Sami S. Brandt. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:1335–1340, 2006.

[KF08] Michal Kedzierski and Anna Fryskowska. Precise method of fisheye lens calibration. In Proceedings of the ISPRS-Congress, pages 765–768. International Society for Photogrammetry and Remote Sensing, 2008.

[Kle14] Reinhard Klette. Concise Computer Vision - An Introduction into Theory and Algorithms. Undergraduate Topics in Computer Science. Springer, 2014.

[KS04] Yan Ke and Rahul Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR '04, pages 506–513, Washington, DC, USA, 2004. IEEE Computer Society.

[LGAGVGAV13] Fernando López-García, Gabriela Andreu-García, José-Miguel Valiente-Gonzalez, and Vicente Atienza-Vanacloig.
Detection of visual defects in citrus fruits: Multivariate image analysis vs graph image segmentation. In Richard Wilson, Edwin Hancock, Adrian Bors, and William Smith, editors, Computer Analysis of Images and Patterns, volume 8047 of Lecture Notes in Computer Science, pages 237–244, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.

[LQS+15] Qiaoliang Li, Suwen Qi, Yuanyuan Shen, Dong Ni, Huisheng Zhang, and Tianfu Wang. Multispectral image alignment with nonlinear scale-invariant keypoint and enhanced local feature matrix. IEEE Geoscience and Remote Sensing Letters, 12(7):1551–1555, 2015.

[MKS+17] Gianluigi Mucciolo, Barbara Koneczny, Alessia Saggese, Nicole M. Artner, Walter G. Kropatsch, and Mario Vento. Analysis and calibration of a mirror setup producing mirror-reflected, multi-view videos. In Proceedings of the 22nd Computer Vision Winter Workshop. Pattern Recognition and Image Processing Group (PRIP) and PRIP club, 2017.

[MO14] Lars Vidar Magnusson and Roland Olsson. Improving graph-based image segmentation using automatic programming. In Applications of Evolutionary Computation, volume 8602 of Lecture Notes in Computer Science, pages 464–475, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.

[OPH96] Timo Ojala, Matti Pietikäinen, and David Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, January 1996.

[OS14] Ipek Oguz and Milan Sonka. LOGISMOS-B: Layered optimal graph image segmentation of multiple objects and surfaces for the brain. IEEE Transactions on Medical Imaging, 33(6):1220–1235, June 2014.

[PSHZ+15] F. Perazzi, A. Sorkine-Hornung, H. Zimmer, P. Kaufmann, O. Wang, S. Watson, and M. Gross. Panoramic video from unstructured camera arrays.
In Computer Graphics Forum, volume 34, pages 57–68. The Eurographics Association & John Wiley & Sons, Ltd., 2015.

[Pur14] Werner Purgathofer. Introduction to visual computing, 2014. TU Wien, Lecture Notes, pp. 31-32.

[Sch05] Ellen Schwalbe. Geometric modelling and calibration of fisheye lens camera systems. In Proceedings of the 2nd Panoramic Photogrammetry Workshop, Int. Archives of Photogrammetry, Remote Sensing and Spatial Information, volume 36, pages 5–8. ISPRS, 2005.

[SL09] Marina Sokolova and Guy Lapalme. A systematic analysis of performance measures for classification tasks. Information Processing and Management, 45:427–437, July 2009.

[SS14] P. Srestasathiern and N. Soontranon. A novel camera calibration method for fish-eye lenses using line features. ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 40(3):327–332, August 2014.

[Sze10] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer-Verlag New York, Inc., New York, NY, USA, 1st edition, 2010.

[ULH15] Steffen Urban, Jens Leitloff, and Stefan Hinz. Improved wide-angle, fisheye and omnidirectional camera calibration. ISPRS Journal of Photogrammetry and Remote Sensing, 108:72–79, 2015.

[UPH07] Ranjith Unnikrishnan, Caroline Pantofaru, and Martial Hebert. Toward objective evaluation of image segmentation algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):929–944, June 2007.

[WLaZ+15] Zhongli Wang, Hedong Liang, Xian Wu, Yipeng Zhao, Baigen Cai, Chuanqi Tao, Zhiyi Zhang, Yinling Wang, Shanwen Li, Fengtian Huang, Shuangfu Fu, and Feng Zhang. A practical distortion correcting method from fisheye image to perspective projection image. In Information and Automation, 2015 IEEE International Conference on, pages 1178–1183, 2015.

[YHZ06] Xianghua Ying, Zhanyi Hu, and Hongbin Zha. Fisheye lenses calibration using straight-line spherical perspective projection constraint. In P. J.
Narayanan, Shree K. Nayar, and Heung-Yeung Shum, editors, Asian Conference on Computer Vision, volume 3852 of Lecture Notes in Computer Science, pages 61–70. Springer, 2006.

[ZL14] Fan Zhang and Feng Liu. Parallax-tolerant image stitching. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '14, pages 3262–3269, Washington, DC, USA, 2014. IEEE Computer Society.