Graph-Based Checkerboard Indexing and its Applications in Camera Calibration and Multi-View Processing

DIPLOMARBEIT
zur Erlangung des akademischen Grades Diplom-Ingenieurin im Rahmen des Studiums Visual Computing

eingereicht von Huber Hanna, BSc, Matrikelnummer 00925230,
an der Fakultät für Informatik der Technischen Universität Wien

Betreuung: O.Univ.Prof. Dr. Walter G. Kropatsch
Mitwirkung: Dipl. Ing. Ines Janusch, BSc

Wien, 27. Februar 2018

Technische Universität Wien, A-1040 Wien, Karlsplatz 13, Tel. +43-1-58801-0, www.tuwien.ac.at

Graph-Based Checkerboard Indexing and its Applications in Camera Calibration and Multi-View Processing

DIPLOMA THESIS
submitted in partial fulfillment of the requirements for the degree of Diplom-Ingenieurin in Visual Computing

by Huber Hanna, BSc, Registration Number 00925230,
to the Faculty of Informatics at the TU Wien

Advisor: O.Univ.Prof. Dr. Walter G. Kropatsch
Assistance: Dipl. Ing. Ines Janusch, BSc

Vienna, 27th February, 2018

Erklärung zur Verfassung der Arbeit

Huber Hanna, BSc
hhuber@prip.tuwien.ac.at

Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwendeten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit, einschließlich Tabellen, Karten und Abbildungen, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe.

Wien, 27. Februar 2018
Huber Hanna

Danksagung

Vielen Dank an Indiecam (www.indiecam.com) für die Bereitstellung einer indiePOV HDSDI Kamera und für die Finanzierung des Technologie-Transfers (ID: 1360780) gemeinsam mit der Wirtschaftsagentur Wien (www.wirtschaftsagenturwien.at).

Acknowledgements

Many thanks to Indiecam (www.indiecam.com) for providing an indiePOV HDSDI camera and for funding the technology transfer (ID: 1360780) together with the Vienna Business Agency (www.viennabusinessagency.at).

Kurzfassung

In dieser Arbeit wird eine neue Methode zur Indizierung eines Schachbrettmusters präsentiert. Es wird ein graphenbasierter Ansatz gewählt, um die Topologie des Musters zu erfassen. Die Regionen und Kreuzungspunkte werden mittels Bildsegmentierung identifiziert und durch einen Graphen repräsentiert. Dabei wird kein explizites mathematisches Modell für die im Bild vorhandene Verzerrung angenommen. Ein speziell entworfenes Kalibrierungsmuster ermöglicht die Identifizierung eines eindeutigen Referenzpunktes im Schachbrettmuster. Dadurch können die Kreuzungspunkte als Gitterpunkte eines Zielkoordinatensystems interpretiert werden. Mögliche Anwendungen im Bereich der Kamerakalibrierung und Mehrkamera-Verfahren werden diskutiert.

Abstract

In this thesis, a new method for checkerboard indexing is presented. A graph-based approach is used to capture the topology of the checkerboard pattern. The regions and crosspoints of the pattern are identified using image segmentation and represented by a graph. At this point, no explicit mathematical model of the distortion in the checkerboard image is assumed. A special calibration pattern design allows the identification of a unique reference point in the pattern. This way, the crosspoints can be interpreted as the grid points of a target coordinate system. Possible applications in the fields of camera calibration and multi-view processing are discussed.
Contents

Kurzfassung xi
Abstract xiii
Contents xv
1 Introduction 1
2 Related Work 3
  2.1 Checkerboard Indexing and Camera Calibration 3
  2.2 Multi-View Problems 4
  2.3 Graph-Based Representations 4
3 Checkerboards as CS Representations 7
  3.1 Coordinate Transformation Problem 7
  3.2 Graph-Based Representation 11
  3.3 Calibration Pattern Design 11
4 Crosspoint Detection 15
  4.1 LBP Saddles 15
  4.2 Circular Boundaries 17
  4.3 Corner Likelihood Map and Gradient Statistics 18
5 Checkerboard Segmentation 21
  5.1 SCIS Algorithm 21
  5.2 Segmentation based on Gradient Magnitude 23
  5.3 Connected Component Labelling of Binary Images 24
  5.4 Refined Segmentation based on Intensity Values 25
6 Graph-Based Checkerboard Indexing Algorithm 27
  6.1 Crosspoint Detection 29
  6.2 Pattern Detection 30
  6.3 Distorted Checkerboard Segmentation 30
  6.4 Crosspoint Identification 33
  6.5 Origin and Orientation Detection 35
  6.6 Orientation Correction 36
  6.7 Target Coordinates Assignment 37
7 Applications of Graph-Based Checkerboard Indexing 39
  7.1 Camera Calibration 40
  7.2 Distortion and Undistortion 40
  7.3 Multi-View Processing 41
8 Evaluation 43
  8.1 Metrics and Ground Truth 44
  8.2 Discussion 48
  8.3 Applications and Future Work 59
9 Conclusion 61
List of Figures 63
List of Tables 67
List of Algorithms 69
Bibliography 71

CHAPTER 1
Introduction

Checkerboard patterns play an important part in various camera calibration techniques. Their images reveal the properties of the respective cameras as regular structures in the original pattern are transformed by the camera lenses. This way, the projection from (regular) world to (distorted) image space can be measured [Sze10].

In this thesis, a new graph-based checkerboard indexing method is proposed. It captures the structure of a (possibly distorted) checkerboard pattern by identifying its crosspoints and their topology. The suggested approach has several advantages over existing indexing methods which are useful in various application contexts including camera calibration as well as multi-view processing. The aim of this work is to analyse the proposed method and its applicability in these contexts.

Common calibration techniques assume a specific projection model which describes the distortion in the checkerboard pattern. During the calibration procedure, they perform parameter fitting for this model [Sze10]. This way, a technique is limited to particular types of distortion and fails if the actual distortion deviates from the mathematical model. This is particularly relevant in the case of fisheye lenses: They cause severe distortions which can only be approximated [KB06].
This work suggests an alternative approach to overcome these limitations: Instead of fitting detected structures to an a priori defined model, the distortion of the pattern is captured implicitly. The checkerboard pattern represents a target coordinate system (CS) with axes defined by the pattern lines and grid points defined by the crosspoints between the patches. Each crosspoint is given a target coordinate. Two consecutive image segmentation steps and a special pattern design including unique reference structures are used to identify the corresponding target coordinate for each crosspoint. This way, a low-resolution mapping from image to target space is created. As it is defined pointwise for each crosspoint pair, it is able to capture global as well as non-global transformations.

Apart from distortions caused by the camera, multi-view applications such as image stitching also need to consider image transformations between different views. In order to align images accordingly, corresponding points in the respective images have to be found. Common alignment methods perform a pairwise comparison between the individual images for this purpose. To create image mosaics, the individual images also have to be mapped to the final mosaic space, which involves further transformations such as image warping. For multi-view setups with a fixed relative position between the different views, efficiency could be increased by directly mapping the individual images to the mosaic in parallel [BHKB16]. This motivates the use of a calibration procedure during which the mapping from the individual image space to the common mosaic space can be determined for each camera independently. After calibration, this mapping can be applied to new images taken by the cameras. In addition to our graph-based indexing method, we want to present a checkerboard pattern design that contains unique reference structures so that it can serve as a global target CS.
In this context, we want to provide a low-resolution pointwise mapping with our indexing algorithm that can be used to calibrate such multi-view setups. Following a basic concept [BHKB16] that already addressed these motivations of a model-free graph-based calibration method for multi-view applications, the contributions of this thesis are:

• The design of a calibration pattern: It contains unique reference structures so it can be used as a target CS in which origin and orientation can be detected automatically.

• A new graph-based checkerboard indexing method: It uses image segmentation to identify the patches and crosspoints in the checkerboard. Their topology is captured without assuming a mathematical model. For each crosspoint, a mapping from image to target space is calculated.

• An evaluation of the proposed method with focus on its application for camera calibration and multi-view processing.

This thesis is structured as follows: In Chapter 2, related work in research areas that are tackled in this thesis is discussed. Next, we describe the coordinate transformations we want to address and the usage of checkerboards as CS representations in Chapter 3. In Chapters 4 and 5, we discuss fundamental work that can be used to solve the problems of crosspoint detection and checkerboard segmentation, respectively. Then, our checkerboard indexing algorithm is described in detail in Chapter 6. In Chapter 7, we discuss how it could be useful for camera calibration and multi-view applications. An analysis of the method including a quantitative evaluation of the results is presented in Chapter 8. Finally, we conclude this thesis in Chapter 9.

CHAPTER 2
Related Work

In this chapter, previous work in research areas related to this thesis is discussed. It includes different approaches in the fields of camera calibration, multi-view problems and graph-based representations.
2.1 Checkerboard Indexing and Camera Calibration

Common camera calibration methods use a parametric model which describes the mapping from world to image space. In a next step, suitable parameter values are found which define the actual mapping. For fisheye lenses, however, the rules of perspective projection do not apply due to their large field of view (FOV). Hence, mathematical models only predict the behavior of ideal lenses and the true mapping can merely be approximated [KB06]. Different models applicable to fisheye lenses are provided by Schwalbe [Sch05] and Kannala and Brandt [KB06], who include a distortion term based on polynomial functions. Alternatively, Aleman-Flores et al. [FLDC14] use division models to describe severe distortion, which reduces the number of necessary parameters [Fit01]. Various methods for the determination of the parameter values exist [YHZ06, KF08, SS14, WLaZ+15, ULH15]. They can be divided into techniques which require prior knowledge about the position of feature points in world space [HGJD08] and self-calibrating methods [BBV01, FLDC14]. Abraham and Förstner [AF05] perform calibration and epipolar rectification for stereo processing using fisheye lenses.

Geiger et al. [GMCS12] propose a method for automatic camera and range sensor calibration. They localize checkerboard crosspoints using a corner likelihood map. Outliers are omitted and the checkerboard structure is recovered based on an energy function. It includes a local linearity requirement which also allows distorted patterns, but not with arbitrary distortion. They assume a rectangular pattern and determine its orientation by distinguishing between the short and long side.

Alternatively, Bok et al. [BHK16] perform checkerboard indexing using circular boundaries. They investigate the vector of image values sorted clockwise at given radii around candidate pixels.
Based on their differences to the mean of the largest and smallest value, crosspoint candidates are established. The checkerboard indexing is performed with a technique similar to the RANSAC algorithm [FB81], assuming similar displacements between crosspoints. Banaeyan et al. [BHKB16] present a concept for segmentation-based checkerboard identification based on the Structurally Correct Image Segmentation (SCIS) algorithm [Cer15]. Mucciolo et al. [MKS+17] use color-coded checkerboard patterns for calibration in a multi-view video setup.

2.2 Multi-View Problems

Images taken from different views and/or cameras are commonly aligned using feature points. SIFT features [BL03, BL07] and lower dimensional FAST features [KS04] capture gradient changes and their orientation at different scales. Li et al. [LQS+15] increase the amount of matched point pairs for multispectral images with high intensity differences by introducing a non-linear scale space and an adapted feature matrix. SURF features presented by Bay et al. [BTVG06] are based on Haar wavelets. All these features can then be used to find corresponding points in images from different views. A homography transformation which is calculated from matching point pairs can be used to align images automatically [BL07]. However, global homographies do not consider parallax errors which occur when images are taken from different viewpoints. They can be accounted for by including local warping [ZL14, PSHZ+15].

For stereo reconstruction, images of a scene from multiple views are used to reconstruct 3D information. With camera position and parameters available, the images can be rectified so that the correspondence problem is reduced to a 1D search along epipolar lines [Sze10].

All these approaches for solving the correspondence problem require image content that provides image points that can be well-distinguished based on the respective discriminative feature.
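The homography-based alignment mentioned in Section 2.2 can be estimated with the direct linear transform (DLT); the following is a minimal NumPy sketch without the RANSAC-style outlier handling used in practice, and the point sets are purely illustrative:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: estimate H with dst ~ H * src
    (homogeneous coordinates) from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the right singular vector of the smallest singular value,
    # reshaped to 3x3 and normalized so that H[2, 2] = 1.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# A pure translation by (2, 3) is recovered (up to numerical precision).
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(x + 2, y + 3) for x, y in src]
H = fit_homography(src, dst)
```

With noisy real matches, such a global estimate is exactly the step that, as noted above, cannot account for parallax without additional local warping.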
2.3 Graph-Based Representations

Graph-based representations are desirable when topological information is important. Applications include plant phenotyping [JK16], fruit quality control [LGAGVGAV13] and image segmentation. For the latter, pixels are considered as separate regions in the beginning and subsequently merged together. For this purpose, different criteria can be applied. Felzenszwalb and Huttenlocher [FH04] consider the similarity within and between regions. Their method is further improved by Magnusson and Olsson [MO14] using automatic programming based on evolutionary principles. Alternatively, Cerman [Cer15] preserves the image structure by performing image segmentation based on Local Binary Patterns. Topologically accurate segmentation methods are particularly important in medical applications [OS14].

CHAPTER 3
Checkerboards as CS Representations

In this chapter, we formulate the problem we want to address in this thesis from a formal point of view and discuss how it can be solved by representing the involved CSs by checkerboard patterns.

3.1 Coordinate Transformation Problem

In this section, a formal problem formulation is presented. Subsequently, the following terms and definitions will be used:

• Coordinate spaces are denoted by $X$ with a suitable subscript. Important spaces include:
  – Image space $X_I \subseteq \mathbb{R}^2$
  – World space $X_W \subseteq \mathbb{R}^3$

• Subsets of these spaces are denoted by $Y$ with suitable subscripts.

• Coordinate mappings between these spaces and subsets are denoted by $f$ with suitable subscripts.

• Images are functions which are defined on these subsets. We differentiate between
  – RGB images $I_{RGB} : Y \subset X_I \to [0,1]^3 \subset \mathbb{R}^3$ and
  – Grayscale images $I_{Gray} : Y \subset X_I \to [0,1] \subset \mathbb{R}$
• In particular, we define a checkerboard image $C_I$ as
\[
C_I : Y_I \subset X_I \to \{0,1\}, \qquad (x, y) \mapsto (i + j) \bmod 2, \tag{3.1}
\]
with $a_i \le x < a_{i+1}$, $c_j \le y < c_{j+1}$, $Y_I = (a_I, b_I) \times (c_I, d_I)$ for $a_I, b_I, c_I, d_I \in \mathbb{R}$, $a_i < a_{i+1}$, $c_j < c_{j+1}$, $a_i \in (a_I, b_I)$, $i \in \{1, \dots, m\}$, and $c_j \in (c_I, d_I)$, $j \in \{1, \dots, n\}$, where the $a_i$ and $c_j$ are defined by the sizes of the patches in the checkerboard.

An overview of the subsequently described coordinate spaces and mappings is illustrated in Figure 3.1. Our indexing method is motivated by different image processing applications which have in common that a coordinate transformation has to be determined. In general, this problem can be formulated as follows (for application-specific details see Chapter 7): For a given input image $I$ defined on $Y_I \subset X_I$, there exists a transformed target image $I'$ with coordinates in $Y_{I'} \subset X_I$. Then, our aim is to find a mapping $f : Y_I \to Y_{I'}$ such that
\[
I(x, y) = I'(f(x, y)) \tag{3.2}
\]
for $(x, y) \in Y_I$.

The target image $I'$ is defined by its geometric characteristics (e.g. straight lines in the target image where there are curves in the input image). Based on these characteristics, a target CS $X_T \subseteq \mathbb{R}^2$ with the desired geometry can be defined. To represent $X_T$ in image space $X_I$, a checkerboard pattern $C_T$ can be used. Its structure is well-suited to represent the axes of a CS. We already defined a checkerboard image in image space in Equation 3.1. Equivalently, we can define the same checkerboard in target space, where each crosspoint in the pattern corresponds to a grid point in $X_T$:
\[
C_T : Y_T \subset X_T \to \{0,1\}, \qquad (x, y) \mapsto (\lfloor x \rfloor + \lfloor y \rfloor) \bmod 2, \tag{3.3}
\]
with $Y_T = (a, b) \times (c, d)$, $a, b, c, d \in \mathbb{R}$.

Depending on the characteristics of $X_T$, $C_T$ may have a regular or irregular structure and the horizontal and vertical lines defining the patch edges may not be straight. The crosspoints between the patches represent the grid points of $X_T$ in $X_I$.

Figure 3.1: An example checkerboard pattern in target, image and world space: The red dot marks the origin of the pattern. The green solid arrows represent the coordinate axes in each space. The mappings between these spaces are depicted by dotted blue arrows.

Formally, this representation can be described as a coordinate mapping $f_I$:
\[
f_I : Y_T \to X_I, \qquad (x_T, y_T) \mapsto (x_I, y_I), \qquad f_I(Y_T) = Y_I \tag{3.4}
\]
It assigns each grid point with coordinates in $X_T$ its corresponding image coordinates in $X_I$. Intuitively, this mapping is defined by the sizes and shapes of the checkerboard patches which, in turn, are defined by the target geometry. Formally, the checkerboard pattern $C_T$ can now be considered as a checkerboard image $C_I$ defined in $X_I$. This way, we have constructed a target space with a desired geometry which can be mapped to image space and represented by a checkerboard image. Figure 3.1 shows this representation for a simple example checkerboard.

During a calibration procedure, the checkerboard pattern is used as follows: At first, the original 2D image $C_T$ is printed and placed into 3D world space $X_W$ by a mapping $f_W$:
\[
f_W : Y_I \to X_W, \qquad (x_I, y_I) \mapsto (x_W, y_W), \qquad f_W(Y_I) = Y_W \tag{3.5}
\]
Once transformed into $X_W$, deformations are possibly applied to the pattern, in particular if the checkerboard is no longer planar in $X_W$:
\[
f_{DW} : Y_W \to X_W, \qquad (x_W, y_W) \mapsto (x_{DW}, y_{DW}), \qquad f_{DW}(Y_W) = Y_{DW} \tag{3.6}
\]
Now, the camera takes an image of the pattern. Distortions are applied to the structure by the camera lens as the checkerboard points are projected from 3D $X_W$ back to 2D $X_I$ by a projection mapping $f_P$:
\[
f_P : Y_{DW} \to X_I, \qquad (x_{DW}, y_{DW}) \mapsto (x_P, y_P), \tag{3.7}
\]
\[
f_P(Y_{DW}) = Y_P \tag{3.8}
\]
The result is a distorted checkerboard image $C_D$ defined on $Y_P$. If no deformations are applied, i.e. $f_{DW}$ is the identity function, the original pattern structure is only distorted by the camera lens, and camera properties can be determined by analyzing the distortions in $C_D$.
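For intuition, the checkerboard image of Equation 3.1 can be rendered synthetically. A minimal NumPy sketch, with patch boundaries $a_i$ and $c_j$ chosen arbitrarily for illustration:

```python
import numpy as np

def checkerboard_image(a, c):
    """Render C_I from Eq. 3.1: a and c are the sorted patch boundaries
    a_i (along x) and c_j (along y), in pixel units."""
    h, w = int(c[-1]), int(a[-1])
    ys, xs = np.mgrid[0:h, 0:w]
    # Patch index i with a_i <= x < a_{i+1} (analogous for j and y).
    i = np.searchsorted(a, xs, side="right") - 1
    j = np.searchsorted(c, ys, side="right") - 1
    return ((i + j) % 2).astype(np.uint8)

board = checkerboard_image(a=[0, 40, 80, 120], c=[0, 40, 80])
```

Choosing non-uniform boundary spacings yields irregular patch sizes, which is how an irregular target CS would manifest in the rendered pattern.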
In a checkerboard pattern, the applied distortions are clearly visible [BHKB16]. Otherwise, $C_D$ will show the combined distortions caused by $f_{DW}$ and $f_P$. $C_D$ is the input of our indexing algorithm.

On the one hand, we can describe the mapping $f$ we want to find in Equation 3.2 to transform the input image $I$ into our target image $I'$ as the inverse of the mapping that transformed $C_T$ into $C_D$:
\[
C_D(f^{-1}(x, y)) = C_I(x, y) \tag{3.9}
\]
\[
f = (f_P \circ f_{DW} \circ f_W)^{-1}, \tag{3.10}
\]
where $\circ$ is the function composition [Kal09]. On the other hand, the locations of the crosspoints in $C_D$ can be mapped back to their corresponding grid points in target space by a mapping $f_T$:
\[
f_T : Y_P \to X_T, \qquad (x_P, y_P) \mapsto (x_T, y_T), \qquad f_T(Y_P) = Y'_T \subseteq Y_T \tag{3.11}
\]
Depending on the FOV of the camera and possible occlusions, $Y'_T$ can be a proper subset of $Y_T$. Together with the coordinate transformation $f_I$ defined in Equation 3.4, we can consider an alternative description of $f$ given by
\[
f = f_I \circ f_T \tag{3.12}
\]
As $f_I$ is already given by the checkerboard, it is sufficient to determine $f_T$ in order to find the mapping $f$. This is the aim of our indexing method.

3.2 Graph-Based Representation

A graph-based representation of the checkerboard image was proposed by Banaeyan et al. [BHKB16]: Graphs are well-suited to capture the pattern's topology. In the original pattern image, each pixel corresponds to a node in the graph. The nodes of neighboring pixels are linked with edges. Using image segmentation, the pixels corresponding to a particular patch are merged into a single region. In the graph representation, this corresponds to the contraction of the edges between these nodes. This way, a region adjacency graph (RAG) is created where each patch is represented by a single node. Again, the nodes of neighboring patches are linked with edges. In a checkerboard pattern, the RAG is a 4-connected graph.

Alternatively, the pattern structure can be represented by its crosspoints.
This way, the graph representation contains a node for each crosspoint and an edge for each edge between two crosspoints in the pattern. The resulting graph corresponds to the dual graph of the (primal) RAG [Cer15]. Both representations are well-suited to encode the topology of the pattern. They are illustrated in Figure 3.2. While the RAG representation is favourable for segmentation, the dual graph representation is chosen for the final indexing, where we want to focus on the crosspoints. The RAG can be derived easily from the dual graph if necessary.

Figure 3.2: Graph-based representations of a checkerboard: Primal RAG (red dot nodes and dashed edges) and dual graph (green cross nodes and solid edges).

3.3 Calibration Pattern Design

In this section, we present the calibration pattern design that is used with our graph-based checkerboard indexing method. Our aim is to create a pattern with the following properties:

i) It is able to represent the target CS and its axes.

ii) The grid points of the CS can be detected in an image of the pattern with high accuracy.

iii) It contains unique structures that serve as reference points. They make it possible to automatically identify the origin and orientation of the CS represented by the pattern. Subsequently, individual grid points can be identified based on their relative position to the origin.

iv) The properties i)-iii) are still fulfilled if a limited distortion is applied to the pattern.

v) The characteristics of the distortion are visible in the projected pattern.

Traditional checkerboards consist of alternating black and white square patches. Figure 3.3 (a) shows an example image of such a checkerboard pattern used by Geiger et al. [GMCS12]. It shows a visible distortion. However, alternative patterns can also be used to estimate distortion properties. A color-coded example used by Aleman-Flores et al. [FLDC14] is given in Figure 3.3 (b).
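Returning briefly to the graph representation of Section 3.2: given a labeled patch segmentation, the RAG edges can be collected by scanning 4-connected label transitions. A minimal sketch, where the label image is a toy example:

```python
import numpy as np

def region_adjacency_graph(labels):
    """Collect RAG edges from a label image: two regions are adjacent
    if some pair of 4-connected pixels carries their two labels."""
    edges = set()
    pairs = [(labels[:, :-1], labels[:, 1:]),   # horizontal neighbors
             (labels[:-1, :], labels[1:, :])]   # vertical neighbors
    for p, q in pairs:
        mask = p != q
        for u, v in zip(p[mask], q[mask]):
            edges.add((int(min(u, v)), int(max(u, v))))
    return edges

# A 2x2-patch board: the RAG is a 4-cycle. There is no diagonal edge
# between patches 0 and 3, which only meet at the crosspoint.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
edges = region_adjacency_graph(labels)
```

The 4-connectivity of this RAG is exactly the regularity that the checkerboard pattern guarantees and that the indexing later exploits.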
Traditional checkerboard patterns already fulfill properties i), ii), iv) and v) [GMCS12]. Without property iii), the individual crosspoints between the patches cannot be distinguished because of their similarity. In order to solve our mapping problem (3.2), corresponding points have to be identified in the respective checkerboard images. However, common approaches for solving the correspondence problem, such as discriminative features or searching along epipolar lines, require distinguishable image points. Hence, they cannot be applied in this case. Instead, property iii) allows the distinction of the individual crosspoints based on their position in the target CS.

A possible way to add unique structures to the pattern would be to modify the regular square geometry in parts of the pattern. An example is given in Figure 3.4. Some square patches have been replaced with triangles, yielding a diamond-shaped structure. This modification is motivated by the use of a RAG representation of the pattern structure. Whereas the regular square structure yields a regular 4-connected RAG, a different connectivity can be detected at the nodes representing the triangles. Additionally, several reference points can be included by varying the size of the diamond shape created by the triangulated structures in the pattern. This can be useful to encode information about the orientation of the pattern or for multi-view applications (see Chapter 7).

Figure 3.3: Examples of calibration patterns: (a) Distorted traditional checkerboard pattern. Image taken from Geiger et al. [GMCS12]. (b) Alternative color-coded calibration pattern. Image taken from Aleman-Flores et al. [FLDC14].

Figure 3.4: Examples of geometrical reference structures captured by the RAG.

Alternatively, we have seen that for RGB input images, the use of color is a suitable choice to add spatial information to the pattern [MKS+17, FLDC14].
Specific parts of the pattern can be emphasized by color-coding selected patches. At this point, we distinguish between information about the origin and about the orientation of the pattern. The origin can be identified by highlighting two diagonally adjacent patches. The crosspoint between them defines a unique origin. Additionally, information about the orientation can be added by distinguishing between the two color-coded patches. As one of them is defined as the upper left patch relative to the origin crosspoint, the orientation of the pattern can be determined. At this point, we choose red and green as colors for the upper left and lower right patch, respectively. A similar combination is used by Mucciolo et al. [MKS+17]. It considers the distances of the involved colors on the unit RGB cube.

Figure 3.5: Target CS represented by a checkerboard pattern with a color-coded origin between the red (upper left) and green (lower right) patch.

Figure 3.5 shows an example pattern and the corresponding target CS. The crosspoints and the horizontal and vertical lines in the pattern define the grid points and coordinate axes in the CS, respectively. This example pattern represents a regular grid CS. However, other target CSs with two coordinate axes, such as cylindrical or spherical CSs, can be used instead depending on the application (more in Chapter 7).

Compared to the geometrical approach, the color-coded reference structures maintain the regular square structure of the pattern. The simple 4-connectivity is a favourable property which we can profit from when we calculate the checkerboard indexing (Chapter 6). Moreover, with simple square patches, distortions become more distinct compared to the original pattern. At the same time, the finer details of the geometric approach require a higher image resolution to be captured properly. They also may be more prone to errors due to degenerate triangles caused by strong distortions.
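The color-coded origin patches described above could be located, for instance, by nearest-reference-color classification on the RGB cube. The reference set and helper below are illustrative assumptions for this sketch, not the thesis' implementation:

```python
import numpy as np

# Reference colors on the unit RGB cube; pure red/green origin patches
# are an assumption of this sketch.
REFERENCE = {
    "black": np.array([0.0, 0.0, 0.0]),
    "white": np.array([1.0, 1.0, 1.0]),
    "red":   np.array([1.0, 0.0, 0.0]),
    "green": np.array([0.0, 1.0, 0.0]),
}

def classify_patch(mean_rgb):
    """Label a segmented patch by its nearest reference color in RGB,
    where mean_rgb is the patch's mean color in [0, 1]^3."""
    m = np.asarray(mean_rgb, dtype=float)
    return min(REFERENCE, key=lambda name: np.linalg.norm(m - REFERENCE[name]))
```

The patches classified as red and green then fix the origin crosspoint between them and, via their upper-left/lower-right roles, the orientation of the pattern.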
With the color-coded patches visible in the pattern image, property iii) is also fulfilled and we have created a calibration pattern with which we can perform automatic checkerboard indexing. The automatic detection of origin and orientation is part of the indexing algorithm and will be described in Chapter 6.

CHAPTER 4
Crosspoint Detection

A core part of checkerboard indexing is the detection of the crosspoints in the image as they mark the grid points of the represented CS. In real images of checkerboards, blur and distortion can cause indistinct transitions between neighboring patches and thus also indistinct crosspoint locations. At this point, by considering coordinates with subpixel precision, the exact crosspoints can be located more accurately. In this chapter, different approaches for crosspoint detection are compared. In particular, their suitability for our model-free indexing algorithm is discussed.

4.1 LBP Saddles

In a checkerboard, the crosspoints are surrounded by four alternating black and white patches. This structural property can be used to define a suitable detection method. Local Binary Patterns (LBP) [OPH96] describe an image point by the structure of its local neighborhood: Its pixel value is compared to the neighboring values at a given radius. Let $p$ be such a pixel in an image $I$ and $p'$ a particular neighbor pixel. Then, the relation between $p$ and $p'$ is described by a 1-bit code. Its value can be determined by a sign function $s$ as
\[
s(p, p') = \begin{cases} 1, & \text{if } I(p') \ge I(p), \\ 0, & \text{otherwise.} \end{cases} \tag{4.1}
\]
The neighbor pixels are sorted in clockwise or counter-clockwise order and their values of $s$ are then combined to define an N-bit LBP code
\[
LBP(p) = \sum_{i=0}^{N-1} s(p, p'_i)\, 2^i, \tag{4.2}
\]
with $N$ denoting the number of neighbors $p'_i$. This code describes the local structure at $p$ in the image $I$.

Figure 4.1: Saddle point and face in primal and dual graph.
The arrows represent pixel relations and are color-coded according to their $s_{strict}$ values (black for 0, green for 1). Image taken from Cerman [Cer15].

Cerman [Cer15] uses LBPs to perform image segmentation that preserves structural correctness in an image (see Chapter 5). For this purpose, he calculates the LBP code of each image point by considering the four adjacent pixels in a 4-connected neighborhood. This 4-bit code can be used to identify image points belonging to particular structural classes. In particular, pixels with an LBP code of $LBP = 1010$ or $LBP = 0101$ are classified as saddle points. At this point, Cerman encodes the relation between two pixels using a strict inequality:
\[
s_{strict}(p, p') = \begin{cases} 1, & \text{if } I(p') > I(p), \\ 0, & \text{otherwise.} \end{cases} \tag{4.3}
\]
In a graph-based representation (see Chapter 3), saddles can be identified both in the primal as well as in the dual graph (see Figure 4.1). In the primal graph, each pixel represents a region. In a 4-connected graph, a face is formed by 2-by-2 pixels. Faces are the elements of the dual graph. To determine the LBP code of a face, $s_{strict}$ is evaluated for two adjacent pixels, respectively. A saddle face is created by alternating $s_{strict}$ values between these regions.

It can be observed that crosspoints in a checkerboard image are often located at such saddle faces. This motivates the idea of using the LBP class of a face as an indicator for our crosspoint detection task. This could be achieved with the following approach: In the dual graph representation of the input image, we determine the LBP class of each face. If a face is classified as saddle, we assume that it represents a crosspoint. This approach is particularly favourable if the checkerboard indexing is performed using the SCIS algorithm as proposed by Banaeyan et al. [BHKB16]. In this case, the saddle faces are already identified in a preprocessing step preparing image segmentation. Thus, no additional calculations would be required.
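The 4-neighbor LBP classification of single pixels described above can be sketched as follows; the clockwise bit ordering is an arbitrary choice for this sketch:

```python
import numpy as np

def lbp4(img, y, x):
    """4-bit LBP code of pixel (y, x) from its 4-connected neighbors,
    visited clockwise (up, right, down, left), using the non-strict
    sign function s of Eq. 4.1."""
    p = img[y, x]
    nbrs = [img[y - 1, x], img[y, x + 1], img[y + 1, x], img[y, x - 1]]
    return sum(1 << i for i, q in enumerate(nbrs) if q >= p)

def is_saddle(code):
    """Saddle points carry the alternating codes 0101 or 1010."""
    return code in (0b0101, 0b1010)

# Bright patches above/below and dark patches left/right of the center
# pixel form a saddle, as at an ideal crosspoint.
img = np.array([[  0, 200,   0],
                [ 50, 100,  50],
                [  0, 200,   0]])
```

In a real image, the same test would be applied with $s_{strict}$ on the 2-by-2 faces of the dual graph rather than on single pixels.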
However, the LBP class of a face in a 4-connected graph is not a reliable feature for crosspoint detection in real images. On the one hand, saddle faces also occur within the homogeneous parts of a checkerboard patch due to noise. If the checkerboard pattern only covers a part of the image, LBP saddles can also occur at the border of the pattern and in the environment. Thus, further processing is needed to distinguish crosspoint saddles from noise and environment saddles. On the other hand, depending on the amount of blur and distortion in the input image, a crosspoint does not always form a 2-by-2 saddle face. In these cases, further processing is needed to detect these crosspoints.

4.2 Circular Boundaries

Along with their indexing technique (see Chapter 2), Bok et al. [BHK16] propose an alternative feature for the detection of crosspoints in the image of a possibly distorted checkerboard: they use a vector representing the circular boundary (CB) of an image point to describe its local structure. Similar to the LBP approach, they consider the pixels in a circular neighborhood at a given radius. However, they do not compare their image values to the pixel in the center, but subtract the mean of the largest and smallest value in the neighborhood. This way, a CB vector is created in which the sign of an entry indicates its relation to this mean value. Based on this sign, Bok et al. deduce criteria to identify crosspoint candidates, which are depicted in Figure 4.2:

i) In the CB vector, the sign changes exactly four times.

ii) The difference between the indices corresponding to opposite sign changes is approximately equal to half the length of the vector.

iii) The indices corresponding to the opposite sign changes are approximately equal to those in a CB vector with a smaller radius, but the same center.

Among uniformly distributed seed points, initial crosspoint candidates are selected based on characteristic i).
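Criterion i), the requirement of exactly four sign changes in the CB vector, can be sketched in a few lines. The sampling routine, the sample count of 16 and the helper names (cb_vector, sign_changes, is_crosspoint_candidate) are illustrative assumptions, not the implementation of Bok et al.:

```python
import math

def cb_vector(img, cx, cy, radius, n_samples=16):
    """Sample a circular boundary around (cx, cy) and subtract the mean
    of the largest and smallest sampled value, as proposed by Bok et al."""
    vals = []
    for i in range(n_samples):
        a = 2.0 * math.pi * i / n_samples
        x = int(round(cx + radius * math.cos(a)))
        y = int(round(cy + radius * math.sin(a)))
        vals.append(img[y][x])
    mid = (max(vals) + min(vals)) / 2.0
    return [v - mid for v in vals]

def sign_changes(cb):
    """Count sign changes along the cyclic CB vector."""
    signs = [1 if v >= 0 else -1 for v in cb]
    return sum(1 for i in range(len(signs))
               if signs[i] != signs[(i + 1) % len(signs)])

def is_crosspoint_candidate(img, cx, cy, radius):
    # criterion i): the sign changes exactly four times
    return sign_changes(cb_vector(img, cx, cy, radius)) == 4
```

On a synthetic 10-by-10 checkerboard crossing, the center of the crossing satisfies criterion i), while a point inside a homogeneous patch does not.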
After a window-based iterative refinement step using image gradients in x and y direction, more outliers and duplicates are discarded using ii). Outliers at checkerboard edges are specifically discarded by comparing the slopes of the surrounding edges. Finally, remaining outliers are eliminated during the indexing step (see Chapter 2).

Figure 4.2: CB vector (outer green circle) with characteristics at a crosspoint (red point): four sign-changing indices (points with red circles), differences between opposite sign-changing indices corresponding to the vector half-length (black arrows) and similar sign-changing indices to the smaller CB vector (blue inner circle). Image taken from Bok et al. [BHK16].

The CB approach improves the crosspoint detection compared to the LBP saddle approach, as it is able to capture the alternating neighborhood structure at a larger scale and is not limited to 2-by-2 saddle points. To avoid the displacement assumptions made during the indexing step, the preceding steps can be performed for crosspoint detection only, which is the aim of this chapter. However, the discarding criteria are not well-suited to allow arbitrary distortions. While outliers still remain, the slope criterion, which aims at eliminating outliers close to edges, also discards crosspoints in strongly tilted checkerboards.

4.3 Corner Likelihood Map and Gradient Statistics

Geiger et al. [GMCS12] also distinguish between crosspoint detection and checkerboard recovery. They propose an alternative corner detector which combines gradient statistics with a corner likelihood map. For two corner prototypes (ideal and rotated by 45 degrees, see Figure 4.3), the responses of four filter kernels as well as the mean filter response are considered to calculate the corner likelihood map C. Corner candidates are determined by subsequent non-maxima suppression on C.
Additionally, gradient statistics are calculated as depicted in Figure 4.3: an orientation histogram with 32 bins is created from the grayscale input image I. The two dominant modes are used to create an expected gradient magnitude template T. The final corner score is calculated as

    R = (T ⋆ ‖∇I‖₂) · C,   (4.4)

where ⋆ denotes the normalized cross-correlation (NCC). Thresholding yields the final corner candidates. For corner candidates with a score above a threshold τ, subpixel locations c are determined by expecting the gradients g_p at points p in a local neighborhood N to be orthogonal to p − c. The solution of this optimization problem can be calculated in closed form, providing the final corner location c:

    c = ( ∑_{p∈N} g_p g_p^T )^{−1} ∑_{p∈N} ( g_p g_p^T ) p   (4.5)

Figure 4.3: Corner score based on (a) ideal and (b) rotated corner prototypes and (c) gradient statistics. Image taken from Geiger et al. [GMCS12].

Again, the subsequent recovery step involving a local linearity condition is able to eliminate the last remaining outliers. To be more flexible with regard to the geometry of the checkerboard, we do not consider this recovery step at this point. Still, combining a likelihood map with gradient statistics already yields a robust detector compared to the previous methods using LBP saddles and CBs. Above that, the localization of subpixel coordinates is very accurate, which is favourable for our task.

CHAPTER 5 Checkerboard Segmentation

We choose a segmentation-based approach for our indexing method, in order to capture the structural information of the checkerboard pattern at region level. In this chapter, fundamental work in the field of image segmentation is discussed. In particular, the suitability for checkerboard segmentation is considered for each method.

5.1 SCIS Algorithm

The SCIS algorithm proposed by Cerman [Cer15] uses a graph-based segmentation approach.
It creates a graph pyramid providing the image at different segmentation stages along with the segmentation history. At the bottom level of the pyramid, the nodes correspond to pixels and their values to the respective RGB or gray values. At higher levels, the nodes correspond to regions and their values to the mean of the merged pixel values.

The SCIS algorithm focuses on the structural correctness of the segmented image. Structural relations in the image are encoded using LBPs (see Chapter 4) and combinatorial maps. Each edge e between two nodes n_i, n_j in the graph consists of two half-edges called darts d_i and d_j which are incident to the respective nodes. A combinatorial map C is defined as the triplet constituted by the set of darts D, an involution α and a permutation σ:

    C = (D, α, σ)   (5.1)

α is a mapping on D which assigns each dart d_i its counterpart d_j for edges e = (d_i, d_j). σ is also a mapping on D which assigns each dart d_i the subsequent dart (in clockwise or counter-clockwise order) incident to the same node n_i. The sequences created by these mappings are called orbits. A combinatorial map encodes both the primal as well as the dual graph. For the dual graph, a respective orbit is created by the composition of the mappings in the primal graph:

    φ = α ◦ σ (or σ ◦ α)   (5.2)

These orbits are useful to encode the orientation of nodes around a vertex or within a face. Let v_i and v_j be the values of n_i, n_j. Then, each dart is assigned a weight w according to the relation between v_i and v_j:

    w(d_i) = 1 if v_i ≥ v_j, and 0 otherwise.   (5.3)

The weights of all darts incident to a node constitute its LBP code. It can be used to categorize nodes by dividing them into LBP classes such as maxima, minima, slopes or saddles.
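The dart weights of Equation 5.3 and the resulting LBP classes can be illustrated with a minimal sketch. The neighbor order (right, up, left, down), the strict inequality of Equation 4.3 and the class names are assumptions for illustration, not Cerman's implementation:

```python
def lbp4(neighbor_values, center):
    """4-bit LBP code from four neighbor values (cyclic order assumed),
    using the strict inequality s_strict of Equation 4.3."""
    return sum((1 << i) for i, v in enumerate(neighbor_values) if v > center)

def lbp_class(code):
    """Coarse structural classes: codes 1010 and 0101 mark saddles."""
    if code in (0b1010, 0b0101):
        return "saddle"
    if code == 0b0000:
        return "maximum"   # no neighbor is strictly larger
    if code == 0b1111:
        return "minimum"   # all neighbors are strictly larger
    return "slope"
```

For a crosspoint-like neighborhood with alternating black and white values, the code is 1010 or 0101 and the point is classified as a saddle.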
A structurally correct image representation is guaranteed by specific conditions:

• To create an image graph in which each face can be classified as a slope, additional nodes are inserted between nodes that form a saddle.

• The definition of the weight function w given in Equation 5.3 yields ambiguous information due to the ≥ relation. To achieve a strict inequality, neighboring nodes with equal values, called plateaus, are merged.

• When the pyramid is created, two nodes can only be merged if the LBP code of their neighbors remains the same after the merging operation.

• Also, two nodes are only merged if the value of the resulting node is sufficiently different from its neighbor nodes.

This LBP approach has several advantages. It is able to preserve the texture in an image during segmentation. Above that, LBP encoding is well-suited to handle illumination changes [HRM06]. The construction of a graph pyramid has the advantage that the resolution of the graph representation can be adjusted continuously.

Banaeyan et al. [BHKB16] propose a checkerboard indexing concept which uses the SCIS algorithm for checkerboard segmentation. However, while structurally correct segmentation works well with many input images, problems arise with the specific task of checkerboard segmentation: On the one hand, saddle faces are often present in checkerboard images close to crosspoint locations. Thus, an additional node is added to turn each of them into a slope face (see Figure 5.1). This way, merging of patches across a crosspoint is encouraged even for ideal checkerboard images (with perfect square patches containing only pixels with value 1 or 0, respectively). On the other hand, the structural correctness criteria often prevent the merging of small regions enclosed within patches or around crosspoints and edges. These criteria are useful
to preserve texture in highly textured images. Within the homogeneous patches of a checkerboard, however, it over-emphasizes noise-related differences between individual pixels. Therefore, the SCIS algorithm is not suited to create a 4-connected RAG representing the checkerboard structure.

Figure 5.1: Additional node inserted in the graph representation by the SCIS algorithm at a saddle face.

5.2 Segmentation based on Gradient Magnitude

The aim of this chapter is to create a checkerboard segmentation in which each segmented region corresponds to a patch. The limits of these regions are formed by the checkerboard lines. Intuitively, detecting these lines can be a first step in order to separate the individual patches. In a checkerboard, we observe that the image gradient has high values along these lines, as they mark the transition from white to black patches and vice versa. Thus, the gradient magnitude can be used to identify the separating checkerboard lines¹: For a grayscale image I, the gradients ∇_x I and ∇_y I in x and y direction can be determined by applying the respective Sobel filters:

    ∇_y I = [ 1  2  1; 0  0  0; −1 −2 −1 ] ∗ I   (5.4)

    ∇_x I = [ 1  0 −1; 2  0 −2; 1  0 −1 ] ∗ I,   (5.5)

where ∗ denotes the convolution operator. Then, the gradient magnitude is given by

    ‖∇I‖ = √((∇_x I)² + (∇_y I)²)   (5.6)

By thresholding ‖∇I‖, a binary image I_BW can be created, in which pixels corresponding to checkerboard edges are set to 1 and others to 0. Morphological operations can be performed to adjust the thickness and topology of these edges in I_BW. This way, the checkerboard structure is outlined in a binary image.

¹MATLAB example on Marker-Controlled Watershed Segmentation: https://de.mathworks.com/help/images/examples/marker-controlled-watershed-segmentation.html?prodcode=IP&language=en [accessed January 2018]
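The Sobel filtering and thresholding of Equations 5.4 to 5.6 can be sketched as follows; the pure-Python convolution and the threshold value are illustrative choices, whereas the kernels match the equations above:

```python
import math

# Sobel kernels of Equations 5.5 and 5.4
SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def convolve3(img, k):
    """3x3 convolution; border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(k[j][i] * img[y + 1 - j][x + 1 - i]
                            for j in range(3) for i in range(3))
    return out

def edge_mask(img, tau):
    """Binary image I_BW: 1 where the gradient magnitude exceeds tau."""
    gx, gy = convolve3(img, SOBEL_X), convolve3(img, SOBEL_Y)
    h, w = len(img), len(img[0])
    return [[1 if math.hypot(gx[y][x], gy[y][x]) > tau else 0
             for x in range(w)] for y in range(h)]
```

On a vertical black-to-white step, only the pixels along the transition are marked as edge pixels.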
Consequently, I_BW can be helpful during the segmentation process, in order to separate individual checkerboard patches.

5.3 Connected Component Labelling of Binary Images

In a binary image, segmentation can be performed by finding and labelling connected components² as described in Algorithm 5.1:

Algorithm 5.1: Find Connected Components
Input: A binary image I
Output: Labelled connected components CC in I
1 l := 0;
2 while ∃ unlabelled pixel p do
3     Find next unlabelled pixel p;
4     l += 1;
5     cc_l := labelConnectedComponent(p, l);
6     Add cc_l to CC;
7 end
8 return CC;

In the method labelConnectedComponent, all pixels belonging to the connected component containing the current pixel p are labelled using a flood fill algorithm [Pur14]. The connected component is obtained based on a given connectivity, which can be 4 or 8 in a 2D image. If the patches in a checkerboard image have already been separated in a preprocessing step, this method can be useful to identify all the pixels belonging to the same patch and to access them using the respective label.

²MATLAB algorithm for finding connected components in a binary image: https://de.mathworks.com/help/images/ref/bwconncomp.html?s_tid=doc_ta [accessed January 2018]

5.4 Refined Segmentation based on Intensity Values

For connected component labelling as described in the previous section, a binary image containing separated region objects is required. In this mask, the limits of each region are encoded. If no such explicit information is available, segmentation can be performed using the implicit information provided by the intensity values in an image. Kroon proposes a region growing algorithm³ that uses pixel intensities to define a similarity measure. Based on this measure, new pixels are added to a segmented region.
For a segmented region R in a grayscale image I, the similarity distance d_grow between an unsegmented pixel p_new and R is defined as:

    d_grow(p_new, R) = | I(p_new) − Ī(R) |   (5.7)

    Ī(R) = (1 / |R|) ∑_{p∈R} I(p)   (5.8)

Initially, a given seed pixel p is added to R. Subsequently, d_grow is calculated for all neighbors of p. The most similar neighbor p′, corresponding to the smallest value of d_grow, is selected to be added to R next. In the next iteration, all neighbors of p′ are examined. At this point, only unsegmented pixels within the image boundary are considered. Region growing stops once the distance threshold τ_grow has been exceeded.

This similarity approach is useful to segment the homogeneous region of a checkerboard patch. Without explicit information about the region limits, it is segmented until the distance between intensity values becomes too large. The threshold parameter τ_grow can be used to control the expansion across desired limits. Preprocessing is needed to provide suitable seed points.

Alternatively, k-means clustering [AV07] can also be used to segment images based on intensity values. With this method, each image point is represented by a vector in feature space. These vectors are separated into clusters based on a distance metric (e.g. the Euclidean metric). It requires the number of clusters beforehand, which would also involve preprocessing, as would providing seed points. The feature space can be chosen accordingly. One possibility is to use 3D vectors containing the RGB values of each pixel. Alternatively, the spatial position can also be included by adding the 2D pixel coordinates, yielding a 5D feature vector.

³Dirk-Jan Kroon. MATLAB code published online: https://de.mathworks.com/matlabcentral/fileexchange/19084-region-growing [accessed January 2018]
For checkerboard segmentation, however, clustering is not a favourable method, because with the feature-space approach it does not consider the topology of the image points: while 5D vectors account for the spatial position of a pixel, there is still no knowledge about local connectivity. Instead, the coordinate information causes problems when segmenting large patches, which tend to be split when 5D vectors are used. Using only the 3D color information, on the other hand, spatial information is completely ignored and disconnected clusters are created.

Region growing, in contrast, is suitable to segment the homogeneous patches of a checkerboard pattern: instead of pure coordinate positions, it accounts for the connectivity between individual pixels by considering the local neighborhood of each newly added pixel.

CHAPTER 6 Graph-Based Checkerboard Indexing Algorithm

In this chapter, the individual steps of the checkerboard indexing algorithm described in Algorithm 6.1 are explained in detail. For a given RGB image of a checkerboard pattern, an indexing of the individual patch crosspoints is calculated. This indexing contains the subpixel locations of the crosspoints in image space as well as the corresponding coordinates in the target CS represented by the pattern structure. In the input image, the pattern can also be incomplete. Figure 6.1 shows the different stages of our method for an example input image. We use the following notation:

• P - the set of crosspoint coordinates. The subscript indicates the coordinate space, whereas the superscript is used to distinguish different stages within the algorithm.

• p - the coordinates of a single crosspoint. The first subscript indicates the coordinate space, whereas the second subscript denotes the index of the crosspoint. The superscript is used to distinguish different stages within the algorithm.

• S - the set of segmented regions.
The subscript indicates the index of a particular region within the set, whereas the superscript is used to distinguish different stages within the algorithm.

• R - the set of region adjacency profiles (see Section 6.4). The subscript indicates the index of a particular profile within the set, whereas the superscript is used to distinguish different stages within the algorithm.

• O - the orientation of the checkerboard pattern within the image.

Figure 6.1: Illustration of our checkerboard identification algorithm using color-coded region labels: (a) original-size grayscale image with detected crosspoints and ROI, (b) coarse segmentation on cropped image, (c) refined segmentation, (d) assigned target coordinates.

Algorithm 6.1: Graph-Based Checkerboard Indexing
Input: An RGB image I of a distorted checkerboard pattern
Output: The subpixel image coordinates P_I and corresponding target coordinates P_T of the detected crosspoints in the checkerboard image I
1  P^D_I := detectCrosspoints(I) ;                        // see Section 6.1
2  M := getROI(P^D_I) ;                                   // see Section 6.2
3  Crop I and transform P^D_I to M ;
4  S^C := coarseSegmentation(I) ;                         // see Section 6.3
5  forall S^C_k ∈ S^C do
6      Compute center c_k;
7      S^R_k := refineSegmentation(I, c_k) ;              // see Section 6.3
8  end
9  S := S^C ∪ S^R;
10 forall p^D_{I,j} ∈ P^D_I do
11     R^D_j := getRegionAdjacencyProfile(p^D_{I,j}, S) ; // see Section 6.4
12 end
13 P^I_I, R^I := identifyCrosspoints(P^D_I, R^D) ;        // see Section 6.4
14 p^I_{I,j0}, O := getOriginAndOrientation(I, S, R^I) ;  // see Section 6.5
15 correctOrientation(R^I, O) ;                           // see Section 6.6
16 expansionDirections := {up, down, right, left};
17 Initialize P^0_T ;
18 Add p^I_{I,j0} to P^0_T ;
19 forall e ∈ expansionDirections do
20     P^e_T := expandGraph(p^I_{I,j0}, R^I, P^{e−1}_T) ; // see Section 6.7
21 end
22 return P_I, P_T ;

6.1 Crosspoint Detection

detectCrosspoints
A core part of our indexing algorithm is the detection of the checkerboard
crosspoints (see Figure 6.2). After discussing possible approaches in Chapter 4, we choose the localization approach proposed by Geiger et al. [GMCS12] described in Section 4.3. It robustly provides the locations P^D_I of the individual crosspoints with subpixel precision. We omit the recovery step of the original method, which includes a local linearity condition and is only able to detect full rectangular patterns (i.e. with a constant number of grid points per row and column). This way, partial views of checkerboards and more general distortions are also allowed by our algorithm. By detecting the crosspoint locations at the beginning of the algorithm, we gain information about the pattern location within the image which is used by subsequent methods.

Figure 6.2: Example crosspoint detection.

The complexity of the crosspoint detection is O(N), where N is the number of pixels in the input image, as the corner likelihood and gradient statistics are calculated for each pixel.

6.2 Pattern Detection

getROI
At this point, we need to determine a mask M for a region of interest (ROI) in which only the pattern is visible. This step is necessary for the subsequent segmentation. We can make use of the previously detected crosspoint locations P^D_I: M is calculated by shrinking the convex hull of the localized crosspoints. This way, the ROI only contains the interior part of the pattern. Parts of patches close to the pattern boundary may get eliminated without affecting the functionality of our algorithm. For further processing, a cropped image is used together with M: by eliminating the image parts which are not relevant to the task, computational cost can be reduced.

6.3 Distorted Checkerboard Segmentation

Image segmentation is used to identify the individual patches in the distorted pattern as described in Chapter 5 (see Figure 6.3).
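The ROI construction of getROI, i.e. the convex hull of the detected crosspoints shrunk toward the interior, can be sketched as follows. The uniform scaling toward the hull centroid and the shrinking factor are illustrative assumptions; the thesis does not prescribe a particular shrinking scheme:

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull, returned in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def shrink(hull, factor):
    """Scale the hull toward its centroid to keep only the pattern interior."""
    cx = sum(p[0] for p in hull) / len(hull)
    cy = sum(p[1] for p in hull) / len(hull)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in hull]

def roi_mask(points, w, h, factor=0.9):
    """Binary mask M: 1 inside the shrunk convex hull of the crosspoints."""
    hull = shrink(convex_hull(points), factor)
    def inside(x, y):  # ray casting point-in-polygon test
        c, n = False, len(hull)
        for i in range(n):
            (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
            if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                c = not c
        return c
    return [[1 if inside(x, y) else 0 for x in range(w)] for y in range(h)]
```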
In this part of the algorithm, each pixel in the input image is assigned a region label, where each region corresponds to a patch in the pattern. At this point, an approximate segmentation is sufficient and remaining pixels are assigned to the background. Additionally, center points are calculated for each segmented region.

Figure 6.3: Example segmentation: Each dot represents a segmented region.

coarseSegmentation
Our aim is to separate the individual patches into individual regions. For this purpose, a coarse segmentation is sufficient. We can use the gradient magnitude of the input image to expose the pattern structure as described in Section 5.2. The edge structure can be exposed by thresholding. After binarization, pixels outside the ROI are set to 1 to limit the segmentation to the pattern.

The current binary image does not show a smooth structure yet: holes appear where the gradient magnitude is low at the center of a blurred crosspoint. Also, thresholding can cause lines to be disconnected. Such problems can be solved by applying morphological operations. We use bridging and closing to guarantee that gaps between curve elements are closed, in order to prevent the merging of neighboring regions.

In the current binary image, the foreground corresponds to the edges. In order to mark the patches as foreground instead, we use the complement of the image, i.e. pixel values of 1 and 0 are switched. This way, we create a binary image with a separated object for each patch. Erosion is performed to reduce fine detail at the borders of the components. This way, the overall rough structure of the pattern is captured, neglecting details close to the border between patches which can be prone to noise. Now, we can identify the corresponding pixels by finding connected components as described in Section 5.3, providing a coarse segmentation S^C.
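The connected component labelling of Algorithm 5.1, used here to obtain the coarse segmentation, can be sketched with a queue-based flood fill; the function name and the queue-based traversal are illustrative choices:

```python
from collections import deque

def label_components(binary, connectivity=4):
    """Label the foreground (value 1) of a binary image via flood fill.
    Returns a label image; 0 marks the background."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    if connectivity == 4:
        nbrs = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    else:  # 8-connectivity
        nbrs = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)]
    next_label = 0
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] == 1 and labels[y0][x0] == 0:
                next_label += 1          # start a new component
                labels[y0][x0] = next_label
                q = deque([(y0, x0)])
                while q:                 # flood fill the component
                    y, x = q.popleft()
                    for dy, dx in nbrs:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
    return labels
```

With 4-connectivity, patches separated by a background line receive distinct labels, as required for the coarse segmentation S^C.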
refineSegmentation
Now that we have separated the individual patches in S^C, these regions are refined. Our goal is to assign the pixels close to the checkerboard edges that have been neglected in the previous step to the corresponding patches. We perform region growing to subsequently approximate each patch more accurately, as described in Section 5.4. This technique requires a seed point for each region. With the coarse segments S^C already provided, we can make use of the included pixel information. Among all the pixels belonging to a coarse segment, we assume that the center pixel is best suited as a seed point for subsequent region growing: it is located in the homogeneous part of the patch, with the largest distance to all four edges. This is a favourable property, because this way, the region can grow within the homogeneous part in all directions. Thus, the final region will also be placed centrally between the surrounding edges. With a seed point closer to a patch edge, pixels within the transition part near the edge may be added at an earlier stage, with a larger impact on the mean value. This way, other pixels from the transition area or other patches could be added falsely. Alternatively, the growing loop could be terminated at an early stage due to a large intensity distance. This way, the final grown region would not be equally close to its neighboring regions.

To prevent a region from growing outside the ROI, we adjust the original algorithm¹ to use the mask M during the expansion: neighbors outside the ROI are not considered. We perform region growing for each coarse segment subsequently. This way, a connected grown region S^R_k can become disconnected if some pixels p ∈ S^R_k get assigned to a different region S^R_k′ at a later iteration, k′ > k.
To guarantee that final regions remain connected, we iterate through all segments provided by the growing procedure and examine their connectivity. For disconnected components, only the component with the largest area is considered as the final grown region. The final labelling l of each pixel p in the input image I is constituted by the union of the labellings from the coarse and the refined segmentation S^C, S^R, respectively:

    l(p) = l_k if ∃k ∈ {1...N} such that p ∈ S^C_k ∪ S^R_k, and 0 otherwise,   (6.1)

where N is the number of segmented regions and l_k > 0 the label of the kth region.

The complexity of the coarse segmentation is O(N), N denoting the number of pixels in the cropped image, due to the per-pixel preprocessing and the segmentation, where each pixel is visited at most once. The complexity of the refinement part depends on the image content: in a worst-case scenario, each region grows to cover the entire image, resulting in a complexity of O(|S| · N) with |S| ≤ N. In a good scenario, however, |S| ≪ N and each region only visits the pixels belonging to its patch. As a result, each pixel is visited once by a single growing iteration, yielding a complexity of O(N).

Compared to the SCIS algorithm [Cer15], which creates a pyramid containing all the segmentation stages, our algorithm only contains two levels: pixel and region level. For the checkerboard indexing task, this is sufficient, because when segmenting a homogeneous patch, the intermediate stages do not contain important additional information. While for textured images different amounts of detail in different segmentation stages may be valuable, details in a homogeneous patch are often noise-related and the order in which pixels are merged is less meaningful.

¹Dirk-Jan Kroon. MATLAB code published online: https://de.mathworks.com/matlabcentral/fileexchange/19084-region-growing [accessed January 2018]
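The masked region growing used in the refinement step can be sketched as follows, assuming a single seed and the similarity distance of Equation 5.7. The frontier bookkeeping and the recomputation of the region mean in every iteration are illustrative simplifications of Kroon's algorithm:

```python
def region_grow(img, seed, tau, mask):
    """Grow a region from `seed` by repeatedly adding the most similar
    unsegmented 4-neighbor (Equation 5.7) until the intensity distance
    exceeds tau; `mask` restricts the growth to the ROI."""
    h, w = len(img), len(img[0])
    region = {seed}
    total = float(img[seed[0]][seed[1]])
    frontier = set()

    def add_neighbors(y, x):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1
                    and (ny, nx) not in region):
                frontier.add((ny, nx))

    add_neighbors(*seed)
    while frontier:
        mean = total / len(region)  # current region mean, Equation 5.8
        best = min(frontier, key=lambda p: abs(img[p[0]][p[1]] - mean))
        if abs(img[best[0]][best[1]] - mean) > tau:
            break                   # distance threshold tau exceeded
        frontier.discard(best)
        region.add(best)
        total += img[best[0]][best[1]]
        add_neighbors(*best)
    return region
```

On a synthetic image with a bright patch next to a dark one, growth stops at the intensity transition and the region covers exactly the homogeneous patch.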
6.4 Crosspoint Identification

getRegionAdjacencyProfile
In this part of the algorithm, the region labels and center points from the previous part are used to identify individual crosspoints, assigning them the labels of the four adjacent regions. In order to find the adjacent regions of each crosspoint, a distance transform is performed to obtain the four regions which are closest to it. The distance d_RAP of a crosspoint p ∈ P^D_I to a segmented region S_k is then calculated as:

    d_RAP(p, S_k) = min_{p_k ∈ S_k} d_2(p, p_k),   (6.2)

where d_2 is the Euclidean distance. This is motivated by the fact that these four regions form a unique region profile for each crosspoint which can be used for identification: considering the graph-based approach described in Section 3.2, each crosspoint represents a face in the RAG which is formed by the nodes corresponding to the four adjacent patches at this crosspoint (see Figure 6.4). There is no other face that is constituted by the same nodes, as there is only one crosspoint in a checkerboard with the same four adjacent patches. Thus, a unique region adjacency profile (RAP) can be defined for each crosspoint. Let G and G′ be the RAG and its dual graph with nodes N and N′, respectively. Then the RAP R is defined as:

    R(n′) = (N_k, φ)   (6.3)
    N_k = {n_1, n_2, n_3, n_4} ⊆ N   (6.4)
    n′ ∈ N′

In R, the elements of the subset N_k are sorted thanks to φ : N_k → N_k, which assigns each node n its neighbor within the face n′ in clockwise order:

    φ(n_1) = n_2   (6.5)
    φ(n_2) = n_3   (6.6)
    φ(n_3) = n_4   (6.7)
    φ(n_4) = n_1   (6.8)

At this point, we adapt the orbit concept and notation used by Cerman [Cer15] for his graph-based image representation. A unique RAP can be defined using multiple definitions of φ. Cerman suggests clockwise and counter-clockwise sorting. However, other permutations on the set {1...4} can be used as well. Clockwise and counter-clockwise sorting are well-suited for graph-based representations.
We choose the former, because it is simple to calculate for the adjacent checkerboard regions: we can sort them by sorting their region center points relative to the crosspoint. For this purpose, we consider the polar coordinates (θ, r) of the relative positions in image space and sort them based on their θ values: Let c_i = (c_{i,x}, c_{i,y}) be the pixel coordinates of the centroid of the region represented by the node n_i and p = (p_x, p_y) be the pixel coordinates of the crosspoint node n′. Then the sorting of the nodes n_i within the RAP of n′ is determined by:

    θ_1 < θ_2 < θ_3 < θ_4,   θ_i = arctan( (c_{i,y} − p_y) / (c_{i,x} − p_x) )   (6.9)

Figure 6.4: Example RAP: Labelled nodes and edges of the RAG (red dots and dotted lines) around a crosspoint (green cross). The order of the labels is defined by φ (blue dashed arrow).

This way, the regions are sorted clockwise, starting with the lower right region relative to the crosspoint. However, any other position could be used as a starting point as well. Equivalently to Equation 6.3, we can write R in terms of the labels l_k of each region node n:

    R(n′) = (L_k, φ)   (6.10)
    L_k = {l_{k1}, l_{k2}, l_{k3}, l_{k4}}   (6.11)
    k_i ∈ {1...N}, i ∈ {1...4},

where l_{ki} denotes the label of the node n_i, i ∈ {1...4}.

identifyCrosspoints
We add an optional procedure which ensures that for each RAP, a single crosspoint is found: depending on the crosspoint detection method, it is possible that multiple crosspoints or corners are found which correspond to the same true checkerboard crosspoint. Consequently, they all share the same RAP, which can be used to identify them. Now, there are multiple locations for the same crosspoint available. A basic way of determining the location of the final crosspoint is to average the individual locations. We choose this approach for our algorithm, because we assume that our crosspoint detection is robust enough and a simple method is sufficient.
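The angular sorting of Equation 6.9 can be sketched as follows; atan2 is used here instead of the plain arctan quotient to obtain the full angular range, so the starting position of the resulting cyclic order is an implementation detail, as noted above:

```python
import math

def sort_rap(crosspoint, centroids):
    """Sort the adjacent region labels around a crosspoint by the polar
    angle of their centroids relative to the crosspoint (Equation 6.9).
    In image coordinates (y pointing down), increasing angle corresponds
    to a clockwise order on screen. `centroids` maps label -> (x, y)."""
    px, py = crosspoint

    def angle(item):
        _, (cx, cy) = item
        return math.atan2(cy - py, cx - px)

    return [label for label, _ in sorted(centroids.items(), key=angle)]
```

For four centroids placed symmetrically around a crosspoint, the resulting order walks around the crosspoint in a single cyclic direction.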
A possible more advanced approach would be to use a weighted average, for instance.

Figure 6.5: Example origin detection: Detect the origin crosspoint between the two regions with the highest mean saturation.

The complexity of calculating the RAPs is governed by determining the four closest regions for each crosspoint. It requires the distance calculation to each region for each crosspoint and a subsequent sorting. This yields a complexity of O(|P| · (|S| + log |P|)).

6.5 Origin and Orientation Detection

Another core part of the indexing algorithm is the detection of the origin of the target space. It is needed to assign each crosspoint its target coordinate relative to this origin.

getOriginAndOrientation
At first, the origin needs to be detected (see Figure 6.5). For this purpose, we want to identify the colored patches in the pattern image (see Section 3.3). As they have higher saturation values compared to black and white patches, the RGB image is converted into the HSV color space. The two color-coded patches are identified by calculating the average saturation for each segmented region and selecting the regions corresponding to the two largest values. This way, the origin is detected as the crosspoint between these two regions. It can be found by considering the crosspoints’ RAPs. However, depending on camera and illumination properties, the mean saturation of the green patch is not always sufficiently high. Consequently, the two detected regions are not always adjacent and thus do not always enclose a unique crosspoint. In this case, the best region candidate among the diagonal neighbors of the red patch, which we assume to be identified robustly, is selected. Now, the origin crosspoint is provided, but the orientation of the pattern in the image still needs to be detected (see Figure 6.6).
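The saturation-based origin selection can be sketched as follows; the dictionary-based bookkeeping of RAPs and mean saturations is an illustrative assumption, and the fallback search among the diagonal neighbors of the red patch is omitted for brevity:

```python
def find_origin(raps, mean_saturation):
    """Pick the two regions with the highest mean saturation and return
    the crosspoint whose RAP contains both, or None if the two regions
    do not enclose a common crosspoint."""
    top2 = sorted(mean_saturation, key=mean_saturation.get, reverse=True)[:2]
    for crosspoint, rap in raps.items():
        if top2[0] in rap and top2[1] in rap:
            return crosspoint
    return None
```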
It is encoded in the pattern by defining the red patch as the upper left region within the RAP of the origin crosspoint. Thus, the orientation can be detected by identifying the red patch among the two origin regions. The colors red and green can be distinguished by their hue values.

Figure 6.6: Example orientation detection: Detect the pattern orientation from the RAP position of the region with the lower hue value.

Thus, the two origin regions are sorted according to their average hue values in order to identify the upper left and lower right patch: For a red-green-coded pattern as in Figure 3.5, the upper left patch is defined as the region with the lower average hue value. The complexity of the color space conversion is O(N). For the calculation of the mean saturation and hue values per region, we consider that each pixel is assigned to at most one region. Thus, this part can be bounded by O(N) as well. Sorting the saturation values yields a complexity of O(|S| · log |S|). Analyzing the origin crosspoint calculation, we consider two different scenarios: If a crosspoint can be found between the two most saturated regions, the complexity of finding the crosspoint with the right RAP is O(|P|). However, if no such crosspoint is found, all the adjacent crosspoints of the most saturated region have to be considered. In a worst-case scenario, it is merged with all diagonally adjacent (black) patches, while all white patches are separate regions. This way, all crosspoints would be adjacent to this large region. As there would not be four adjacent regions around each crosspoint, incorrect RAPs would contain a fourth region adjacent to another crosspoint. Depending on the geometry of the pattern in the image, this non-adjacent region or one of the two not diagonally adjacent regions could falsely be selected as a possible origin-enclosing candidate.
Considering incorrect RAPs and false origin candidates for all crosspoints, each of them could provide a possible origin-enclosing region. Sorting them to determine the highest saturation would yield a complexity of O(|P| · log |P|).

6.6 Orientation Correction

So far, we have considered each crosspoint and its adjacent regions independently. In the next section, we want to determine their topology within the entire dual graph of the RAG. To get information about the connectivity of the individual crosspoints, their RAPs can be used. At this point, however, they are sorted according to the region nodes' positions relative to the respective crosspoint position within the image. In order to determine the points' connectivity within the graph, we want to adjust their RAPs so that their sorting reflects the nodes' positions within the graph.

Figure 6.7: Orientation correction of a RAP: Originally, the first node (blue) in the RAP corresponds to the lower right region relative to the crosspoint. After correction, the first node (blue) corresponds to the patch which is the upper left patch in the target pattern. Around the origin crosspoint, this patch is marked red.

correctOrientation

This adjustment can be realized by incorporating the orientation information gained in the previous section. At this point, we consider that the RAPs are provided in clockwise order, starting with the lower right adjacent region based on image coordinates. Now, we can use the information about the upper left and lower right regions around the origin. We define the upper left region as the first region in the RAP within the target CS. Consequently, the degree of rotation of the pattern in the image can be measured by the shift of the upper left region's position within the RAP. This shift can be calculated for the origin crosspoint, where the information about the upper left and lower right regions is available.
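The shift measurement at the origin and the circular rotation of a profile can be sketched in a few lines (hypothetical Python helpers, assuming a RAP is stored as a tuple of four region labels):

```python
def profile_shift(origin_rap, red_label):
    """Offset of the red (target upper-left) patch within the origin
    crosspoint's image-sorted RAP; this measures the rotation of the
    pattern in the image."""
    return origin_rap.index(red_label)

def correct_orientation(rap, shift):
    """Circularly shift a RAP (tuple of four region labels) so that the
    patch which is upper-left in the target pattern comes first."""
    return rap[shift:] + rap[:shift]
```

The same shift is then applied to every RAP, which is what makes a single origin detection sufficient for the whole pattern.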
Then, orientation correction can be performed for the other crosspoints by shifting their region profiles accordingly. Thus, the profiles are sorted according to the pattern orientation, starting with the upper left region in the target orientation. Figure 6.7 illustrates the node shift within the RAP during orientation correction.

6.7 Target Coordinates Assignment

In this part of the algorithm, target coordinates are assigned to the crosspoints identified in the previous step.

expandGraph

In a final step, target coordinates are iteratively assigned to the crosspoints (see Figure 6.8). Starting from the origin with coordinates (0,0), we follow the crosspoints along the coordinate axes and increment or decrement the coordinates accordingly.

Figure 6.8: Example graph expansion: Expand the graph from the detected origin by comparing RAPs and assign target coordinates.

With positive and negative directions for each axis, we leave each crosspoint with assigned target coordinates in four directions. For each direction, we can find the next crosspoint by finding the corresponding region at the corresponding position in its RAP. For instance, when moving up (i.e., in positive y direction), the current upper left region (i.e., at position 1 in the profile) has to be the lower left region (i.e., at position 4) of the next crosspoint. If we find a crosspoint with a region profile that satisfies this condition, we increment or decrement the target coordinate according to the direction, assign the result to this point and continue the expansion in all four directions. If we do not find a suitable crosspoint, or if the crosspoint has already been assigned a target coordinate (coming from a different direction), we stop the expansion from the current crosspoint in the current direction. This way, the dual graph of the RAG is created, where every node corresponds to a detected crosspoint.
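The expansion can be sketched as a breadth-first traversal over the orientation-corrected RAPs. This is a Python sketch with assumed data structures (the thesis implementation is in MATLAB); indices 0..3 correspond to the profile positions 1..4 in the text, i.e., upper left, upper right, lower right, lower left:

```python
from collections import deque

def expand_graph(origin, raps):
    """Assign integer target coordinates to crosspoints by expanding
    from the origin.

    origin: identifier of the origin crosspoint.
    raps:   dict crosspoint -> orientation-corrected RAP, a 4-tuple of
            region labels in clockwise order starting with the region
            that is upper-left in the target pattern.
    Returns a dict crosspoint -> (x, y) target coordinate.
    """
    # direction -> ((dx, dy), [(index_here, index_next), ...]): region
    # pairs the two RAPs must share across the connecting grid edge,
    # e.g. moving up, the current upper-left region (index 0) must be
    # the next crosspoint's lower-left region (index 3).
    moves = {
        "up":    ((0, +1), [(0, 3), (1, 2)]),
        "down":  ((0, -1), [(3, 0), (2, 1)]),
        "left":  ((-1, 0), [(0, 1), (3, 2)]),
        "right": ((+1, 0), [(1, 0), (2, 3)]),
    }
    coords = {origin: (0, 0)}
    queue = deque([origin])
    while queue:
        cur = queue.popleft()
        x, y = coords[cur]
        for (dx, dy), pairs in moves.values():
            # Find an unassigned crosspoint whose RAP matches across
            # the shared edge; stop in this direction otherwise.
            for cand, rap in raps.items():
                if cand in coords:
                    continue
                if all(raps[cur][a] == rap[b] for a, b in pairs):
                    coords[cand] = (x + dx, y + dy)
                    queue.append(cand)
                    break
    return coords
```

The linear scan over all crosspoints per expansion step mirrors the O(|P|²) bound derived below.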
Its label corresponds to its unique target coordinate within target space. For the complexity analysis of the graph expansion, we consider that for each crosspoint which is successfully added to the graph, starting with the origin, we expand the graph in four directions. While this approach may suggest an exponential recursion, we note that the expansion stops if no crosspoint was added (either because there is no match or because this point has already been added). Thus, the expansion is executed at most |P| times, which corresponds to the case that all crosspoints are added to the graph. In each expansion step, we iterate over all crosspoints to find the suitable next point in the current direction. Thus, the complexity of the graph expansion is O(|P|²).

CHAPTER 7
Applications of Graph-Based Checkerboard Indexing

In this chapter, we discuss how our graph-based checkerboard indexing algorithm can be useful for various applications. In particular, we explain how it addresses open problems in the fields of camera calibration and multi-view processing. Our checkerboard identification algorithm provides a mapping between image space and the target space represented by the pattern. The following properties suggest that it can serve as a basis for different image processing applications: i) For a given image of a checkerboard pattern, the inherent distortion is captured implicitly without assuming a globally defined explicit distortion model. This way, the algorithm is also able to handle deviations from ideal distortion models which occur under real-life conditions. In particular, it is not limited to a specific kind of distortion (such as radial distortion). ii) Due to its unique structure, the target pattern can be used as a global reference system for multiple views. The respective images can be processed independently, allowing an efficient parallel workflow.
iii) Our graph-based approach finds the next crosspoint for each direction and outgoing crosspoint individually. At this point, it does not make any assumptions about the number of crosspoints. It is not required that the target space has a constant number of grid points along each isoline. Consequently, our algorithm allows checkerboards which are only partially visible in the image, without selecting a rectangular subset of grid points. This property can also be useful in a multi-camera setup where a global pattern is used. In this case, the FOV of a single camera may contain only parts of the pattern.

In the following, we explain how specific applications can be described with the formal problem statement given in Equation 3.2. In particular, we propose how the respective target spaces can be defined. Beyond that, we discuss how these applications can profit from the previously mentioned properties. Common to all proposed applications, the goal is to learn the geometric distortions produced by the camera(s) and to apply this information to new images.

7.1 Camera Calibration

During common camera calibration, the projection of points from world to image space by the camera is estimated [Sze10]. This transformation is equal to the mapping fP defined in Equation 3.7. It includes both the transformation from world to image space and the distortion applied by the lens. Depending on further applications, only the distortion information may be relevant for further processing in image space. The inverse distortion information is implicitly contained in f: It is defined pointwise at each crosspoint (see Equation 3.2) after it has been reconstructed from our pointwise indexing mapping fT and the checkerboard function fI using Equation 3.12.
Thus, a distortion function fD : PIT → PI can be derived by

fD = f−1  (7.1)

In order to reconstruct the entire projection mapping fP, including the transformation from world to image space, the embedding fW of the checkerboard into world space has to be known. The respective points can be measured in a calibration procedure. With this information available, fP can be reconstructed as

fP = (fW ◦ f)−1  (7.2)

In both cases, the target space is best defined as a regular grid, in order to show only the distortions caused by the camera in the image. It can be represented by a planar rectangular checkerboard. Due to property i), these steps can be performed for various lens types. In particular, the algorithm can be applied to fisheye lenses, where the projection cannot be modelled exactly but only approximated due to their large FOV.

7.2 Distortion and Undistortion

More generally, the algorithm can be used to undistort or distort images for different purposes, including artistic effects. The problem of undistorting images is closely related to estimating the distortion during camera calibration, which is described in the previous section. Undistortion can be achieved by applying the inverse of the distortion mapping fD defined in Equation 7.1. Hence, f can be used directly. The target space can be defined in the same way as for camera calibration. Conversely, for some applications, the goal may be to distort images in a specified way. At this point, specific distortions can be created by chaining together individual distortion mappings. This is possible because the individual mappings between the different spaces can be decoupled and composed accordingly. A possible example of a chained distortion is to use a regular checkerboard as target space and apply a deformation fDW to it in world space: A deformed checkerboard will be projected to image space. Then, fT will account for both fDW and the distortion caused by fP.
The final mapping f will be the chained distortion mapping

f = (fP ◦ fDW ◦ ιW)−1  (7.3)

Decoupling can be introduced by using different mappings fI, f′I for the initial checkerboard creation and for composing f, respectively: In the previous example, where fI creates a regular checkerboard, an arbitrary mapping can be used for f′I. This way, the final decoupled mapping f is constructed as

f = f′I ◦ (fP ◦ fDW ◦ ιW ◦ fI)−1  (7.4)

Composed and decoupled distortion mappings can be used for various applications, including artistic effects. Various lens distortions fP and deformations fDW are possible due to property i).

7.3 Multi-View Processing

In multi-view applications, additional problems arise apart from distortions. A core issue is to find correspondences in images from different views. For applications where the individual images are combined to form an image mosaic, these correspondences are further used to align the individual images. The aim of the alignment procedure is to map each input image to its corresponding subimage within the image mosaic. Common image alignment methods, such as Brown and Lowe's automatic technique using SIFT features [BL07], align images centrally by taking several overlapping images into account. The camera and video content manufacturer Indiecam, however, aims at creating panoramic images more efficiently by parallelizing image alignment [BHKB16]. They want to realize this for a multi-camera setup with fixed positions between the individual cameras. For such an application, our indexing method can be used to solve the correspondence problem and align the individual images at the same time. As the relative position and orientation between the individual cameras do not change, the alignment mapping can be found during a calibration procedure. In this case, the target space is defined by the image mosaic.
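All of these mappings are only known pointwise at the crosspoints; applying them to arbitrary pixels of an input image requires interpolating between the sampled correspondences. A minimal inverse-distance-weighting sketch (a hypothetical helper, not the thesis implementation, which could equally use any standard scattered-data interpolation):

```python
def interpolate_mapping(samples, query, k=4):
    """Evaluate the pointwise mapping f at an arbitrary image point by
    inverse-distance weighting of the k nearest crosspoint samples.

    samples: list of ((ix, iy), (tx, ty)) pairs: image-space crosspoint
             locations and their assigned target coordinates.
    query:   (x, y) image point at which to evaluate the mapping.
    """
    nearest = sorted(
        samples,
        key=lambda s: (s[0][0] - query[0]) ** 2 + (s[0][1] - query[1]) ** 2
    )[:k]
    weights, tx, ty = [], 0.0, 0.0
    for (ix, iy), (u, v) in nearest:
        d2 = (ix - query[0]) ** 2 + (iy - query[1]) ** 2
        if d2 == 0.0:
            return (float(u), float(v))  # exact sample hit
        w = 1.0 / d2  # closer crosspoints dominate
        weights.append(w)
        tx += w * u
        ty += w * v
    s = sum(weights)
    return (tx / s, ty / s)
```
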
In a 360-degree setup as used by Indiecam, a cylindrical checkerboard can be used to cover the combined FOV of all cameras. It can be created by applying a suitable cylindrical deformation fDW to a regular checkerboard. Similarly, a spherical checkerboard can be constructed. By placing the multi-camera setup inside the cylinder, each camera will see a different part of the checkerboard, with overlapping FOVs between neighboring cameras. As each camera takes an image of the pattern, each will provide an image of a distorted checkerboard. Correspondences between these and the common target checkerboard can be determined independently with our indexing method. Thus, alignment can be performed in parallel instead of pairwise, using the mapping f from Equation 3.12 with a regular checkerboard mapping fI. At this point, we benefit from property ii): The checkerboard can only be used as a global reference system because it contains a unique origin which can be identified by multiple cameras independently. In a 360-degree setup, however, the origin will not be visible in all views. Instead, different sectors of the checkerboard will be visible to different cameras. While the current calibration pattern contains only a single unique origin, it is possible to extend this design to 360-degree patterns. As we already use color coding to mark two distinct patches, more colors can be introduced to mark different sectors, as proposed by Mucciolo [MKS+17]. The number of sectors depends on the FOV of the individual cameras. At least one unique patch pair should be visible to each camera. Among those pairs, one is defined as the origin of the CS, while the others are assigned respective coordinates. For a multi-view application, we also benefit from property iii), as each FOV contains only parts of the entire checkerboard.

CHAPTER 8
Evaluation

The aim of this chapter is to evaluate our indexing method.
This includes the definition of suitable metrics to quantify the results of the different parts of the algorithm, the creation of a representative data set, as well as the analysis of the results. Beyond that, general properties of the method which are independent of the considered data set are discussed. In this discussion, possible improvements of the individual steps and future work regarding possible applications are considered.

Figure 8.1: Different distortions of checkerboard patterns: fisheye lens with (a) planar and (b) cylindrical pattern and regular lens with (c) planar and (d) arbitrarily deformed pattern.

8.1 Metrics and Ground Truth

In this section, the metrics used for the quantitative evaluation of our indexing method are described. During this evaluation, the results of the individual steps of the algorithm as well as the final indexing are examined. Thus, different metrics are used to measure the quality of crosspoint detection, checkerboard segmentation and target coordinate assignment. In each case, the metric relies on a suitable ground truth definition. Sokolova et al. [SL09] present different metrics for classification tasks. In a binary classification problem, the data can be divided into positive and negative samples. The ground truth yields the true class label for each data point. Let Gp and Gn be the sets of positive and negative samples, respectively, according to the ground truth. Similarly, let Cp and Cn be the sets of points which are classified as positive and negative, respectively.
With |.| denoting the cardinality of a set, the numbers of true positives (tp), true negatives (tn), false positives (fp) and false negatives (fn) are defined as follows:

tp = |Gp ∩ Cp|  (8.1)
tn = |Gn ∩ Cn|  (8.2)
fp = |Gn ∩ Cp|  (8.3)
fn = |Gp ∩ Cn|  (8.4)

Based on these numbers, the metrics precision and recall can be calculated to evaluate a classifier:

precision = tp / (tp + fp)  (8.5)
recall = tp / (tp + fn)  (8.6)

8.1.1 Metrics for Crosspoint Detection

These metrics are also useful for object detection in an image, which can likewise be considered a classification problem [Kle14]. In this context, precision measures the correctness of the detections, whereas recall evaluates the detector's ability to detect actual objects. Note that neither of them considers the value of tn. It is less suited for such tasks, because it is difficult to determine the number of all possible non-objects. Consequently, only precision and recall are used to evaluate the crosspoint detection part of our algorithm. For clarity, we denote them precisioncp and recallcp. For each input image, ground truth information is provided as a mask image. At this point, we define that a crosspoint is required to have four crossing edges with a length of at least three pixels. This way, crosspoints between partially occluded or out-of-view patches at the border of the image are excluded. Also, the edges may be blurred and several pixels thick, but the gradient strength must be sufficient to identify the edge and the crosspoint location within a local neighborhood with a radius of four pixels. At this point, sufficiency is defined based on a subjective estimate during manual annotation. Due to the gradient requirement, crosspoints in shadowed areas, such as the border of the FOV in fisheye images, may be excluded. In the mask image, each crosspoint location is represented by a local neighborhood with a radius of four pixels.
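Under this ground truth definition, the matching of detections against the neighborhood mask and the resulting precisioncp and recallcp can be sketched as follows (a Python sketch with a hypothetical helper; a circular neighborhood of the given radius is assumed, and each ground truth point is matched at most once):

```python
def crosspoint_metrics(gt_points, detections, radius=4):
    """Match detected crosspoints to ground-truth locations and compute
    precision_cp / recall_cp. A detection counts as a true positive if
    it lies within the neighborhood of a not-yet-matched ground-truth
    crosspoint; unmatched detections are false positives, unmatched
    ground-truth points are false negatives.
    """
    r2 = radius * radius
    matched = set()
    tp = 0
    for (dx, dy) in detections:
        hit = next((i for i, (gx, gy) in enumerate(gt_points)
                    if i not in matched
                    and (gx - dx) ** 2 + (gy - dy) ** 2 <= r2), None)
        if hit is None:
            continue
        matched.add(hit)
        tp += 1
    fp = len(detections) - tp
    fn = len(gt_points) - tp
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if gt_points else 0.0
    return precision, recall
```
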
A detected crosspoint location is considered correct if it lies within such a neighborhood. At this point, it is assumed that the localization is sufficiently accurate if a crosspoint is detected correctly and at most one crosspoint is found within each neighborhood. Therefore, binary classification is sufficient at this point.

8.1.2 Metrics for Checkerboard Segmentation

In contrast, multiple labels have to be considered for the checkerboard segmentation part. Sokolova et al. adjust precision and recall for general multi-class classification tasks. Alternatively, Unnikrishnan et al. [UPH07] propose similarity measures to evaluate image segmentation algorithms. They account for the fact that manual ground truth segmentations are often subjective and do not provide a unique solution. However, the segmentation part of our algorithm differs from traditional segmentation tasks. Its goal is to determine the four regions surrounding a crosspoint. For this purpose, it is sufficient for each segmented region to approximate the corresponding checkerboard patch. Information about the labels is needed sufficiently close to, but not necessarily along, the borders, which are not distinct in real images. Those pixels remain unclassified. Thus, the ground truth labelling is not only not unique, but can also be undefined for some pixels. Also, the exact size and shape of a segmented region are not important. If two segmentations of the same checkerboard patch exist, their difference is not relevant as long as they do not contain parts of other patches. These properties lead to an evaluation objective that differs from traditional segmentation tasks. Consequently, new metrics are derived to measure the results of the two segmentation steps.
Their goal is to capture

• the ability to detect a checkerboard patch,
• whether several patches are merged within the same segmented region,
• whether several segmented regions cover the same patch.

For this purpose, a suitable definition of the ground truth segmentation has to be established: Each patch is represented by a rough region containing only interior pixels. Pixels for which ambiguous or missing labels are allowed are omitted. At this point, only patches surrounding a ground truth crosspoint are considered. Based on these criteria, we adapt the definitions for tp, fp, and fn given in Equations 8.1, 8.3 and 8.4 as follows.

Figure 8.2: Measured subsets for segmentation evaluation: Ground truth regions including incorrectly segmented regions with indices in Imissed (circle), Isplit (triangle) and Imerged (square) and segmented regions including incorrect segments with indices in Joutlier (circle), Jsplitting (triangle) and Jmerging (square).

Let Gi, i ∈ I, I = {0...M}, be the segmented regions according to the ground truth and Sj, j ∈ J, J = {0...N}, the segmented regions of the examined segmentation S. Further, let \ denote the set difference.
Then, we define

tpseg := |I \ (Imissed ∪ Imerged ∪ Isplit)|  (8.7)
fpseg := |Joutlier ∪ Jmerging ∪ Jsplitting|  (8.8)
fnseg := |Imissed ∪ Imerged ∪ Isplit|,  (8.9)

where

Imissed := {i ∈ I : ∄ j ∈ J : Gi ∩ Sj ≠ ∅}  (8.10)
Imerged := {i ∈ I : ∃ i′ ∈ I, i ≠ i′, j ∈ J : Gi ∩ Sj ≠ ∅ ∧ Gi′ ∩ Sj ≠ ∅}  (8.11)
Isplit := {i ∈ I : ∃ j, j′ ∈ J, j ≠ j′ : Gi ∩ Sj ≠ ∅ ∧ Gi ∩ Sj′ ≠ ∅}  (8.12)
Joutlier := {j ∈ J : ∄ i ∈ I : Gi ∩ Sj ≠ ∅}  (8.13)
Jmerging := {j ∈ J : ∃ i, i′ ∈ I, i ≠ i′ : Gi ∩ Sj ≠ ∅ ∧ Gi′ ∩ Sj ≠ ∅}  (8.14)
Jsplitting := {j ∈ J : ∃ j′ ∈ J, j ≠ j′, i ∈ I : Gi ∩ Sj ≠ ∅ ∧ Gi ∩ Sj′ ≠ ∅}  (8.15)

This way, fnseg denotes the number of ground truth regions which do not have an exact match in the segmentation S, including regions which are not detected at all (with indices in Imissed), those which are split into more than one segmented region (with indices in Isplit) and those which are merged by the same segmented region with at least one other ground truth region (with indices in Imerged). In the same manner, fpseg counts the regions in S which do not correctly segment a corresponding ground truth region. Analogous to Imerged and Isplit, they include segmented regions covering more than one ground truth region (Jmerging) or splitting the same one (Jsplitting), but also outlier regions which do not correspond to any checkerboard patch (with indices in Joutlier). Finally, tpseg denotes all correctly segmented regions with a one-to-one mapping to the ground truth. Using these numbers, the following metrics for the evaluation of the segmentation S can be derived:

precisionseg = tpseg / (tpseg + fpseg)  (8.16)
recallseg = tpseg / (tpseg + fnseg)  (8.17)

The different subsets are illustrated in Figure 8.2.

8.1.3 Metrics for Indexing with Target Coordinates

For the evaluation of the checkerboard indexing, we manually assign the ground truth target coordinates to each crosspoint defined in the ground truth mask image used in Section 8.1.1.
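Before turning to the indexing metrics, the counts of Equations 8.7-8.15 can be computed directly from region overlaps. This is a Python sketch with a hypothetical helper, assuming regions are given as sets of pixel coordinates:

```python
def segmentation_metrics(gt_regions, seg_regions):
    """Compute tp_seg, fp_seg, fn_seg (Equations 8.7-8.15) from ground
    truth regions G_i and segmented regions S_j, each given as a dict
    mapping an index to a set of pixel coordinates.
    """
    # All (i, j) pairs with a non-empty intersection G_i ∩ S_j.
    overlaps = {(i, j) for i in gt_regions for j in seg_regions
                if gt_regions[i] & seg_regions[j]}
    i_missed = {i for i in gt_regions
                if not any((i, j) in overlaps for j in seg_regions)}
    i_merged = {i for (i, j) in overlaps
                if any((i2, j) in overlaps for i2 in gt_regions if i2 != i)}
    i_split = {i for (i, j) in overlaps
               if any((i, j2) in overlaps for j2 in seg_regions if j2 != j)}
    j_outlier = {j for j in seg_regions
                 if not any((i, j) in overlaps for i in gt_regions)}
    j_merging = {j for (i, j) in overlaps
                 if any((i2, j) in overlaps for i2 in gt_regions if i2 != i)}
    j_splitting = {j for (i, j) in overlaps
                   if any((i, j2) in overlaps for j2 in seg_regions if j2 != j)}
    fn_bad = i_missed | i_merged | i_split
    tp_seg = len(set(gt_regions) - fn_bad)
    fp_seg = len(j_outlier | j_merging | j_splitting)
    fn_seg = len(fn_bad)
    return tp_seg, fp_seg, fn_seg
```
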
We consider the following sets:

• G - the set of ground truth crosspoints defined by the mask image (see Section 8.1.1)
• D - the set of detected crosspoints calculated in Section 6.4
• DA ⊆ D - the set of detected crosspoints with assigned target coordinates
• DC ⊆ DA ∩ G - the set of crosspoints with correctly assigned target coordinates according to the ground truth

This way, we distinguish between the indexing algorithm's ability to detect a crosspoint (using DA) and its ability to assign the correct target coordinate (using DC). Consequently, we adjust the definitions for tp, fp and fn accordingly:

tpass := |G ∩ DA|  (8.18)
fpass := |DA \ G|  (8.19)
fnass := |G \ DA|  (8.20)

Additionally, we define the number of correct positives (cpass) as

cpass := |DC| ≤ tpass  (8.21)

Based on these numbers, we define the metrics for the assignment of target coordinates as

precisionass = tpass / (tpass + fpass)  (8.22)
recallass = tpass / (tpass + fnass)  (8.23)
precisionc,ass = cpass / (tpass + fpass)  (8.24)
recallc,ass = cpass / (tpass + fnass)  (8.25)

While the first two can be used to evaluate the existence of a node in the graph, the last two are stricter and evaluate the labels (i.e., the assigned coordinates). They can be adjusted to focus on the expansion of the graph and the assignment of coordinates without considering the orientation of the pattern. This way, different aspects of the indexing can be evaluated. Thus, we further define

precisionc̃,ass = precisionass if topology and origin are correct for DA, and precisionc,ass otherwise  (8.26)
recallc̃,ass = recallass if topology and origin are correct for DA, and recallc,ass otherwise  (8.27)

Finally, the following two metrics are introduced to allow the comparison with common checkerboard indexing methods for traditional checkerboards that detect neither orientation nor origin.
For this purpose, we can adjust Equations 8.26 and 8.27 further to measure only the method's ability to capture the checkerboard topology, neglecting both orientation and origin detection:

precisionĉ,ass = precisionass if the topology is correct for DA, and precisionc,ass otherwise  (8.28)
recallĉ,ass = recallass if the topology is correct for DA, and recallc,ass otherwise  (8.29)

8.2 Discussion

In this section, we use the previously defined metrics to evaluate our indexing method. For our experiments, we created a data set of 49 images taken with two different cameras (an indiePOV HDSDI camera with a fisheye lens with a 185° FOV and a Zeiss optics camera). Apart from the distortions caused by the camera lenses, the images show different kinds of distortion depending on the deformation of the pattern (planar, cylindrical and arbitrary, see Figure 8.1). We implemented our indexing algorithm in MATLAB on a 64-bit Windows machine. The functionality of our algorithm is controlled by different parameters. To examine our method's sensitivity to their values and to find possible dependencies between them, we perform a parameter analysis. We consider the most important parameters listed in Table 8.1. For comparison, we also add the option to perform only a coarse segmentation without refinement (indicated by a τ value of 0). These are the most relevant parameters, as they control how the pattern structure is exposed and the degree of patch approximation. Thus, they have the highest impact on the resulting segmentation and, consequently, on the RAPs for crosspoint identification.
Table 8.1: Parameter Analysis

  sclose  (values 6, 7, 8): the size of the structural element which is used to close the bridged binary edge image during the coarse segmentation
  serode  (values 2, 3, 4): the size of the structural element which is used to erode the closed binary edge image during the coarse segmentation
  τ       (values 0, 0.07, 0.1, 0.13): the threshold distance between a new pixel's intensity and the region mean during refined segmentation

We calculate our metrics for all parameter combinations. The individual metric results are discussed in detail in the subsequent subsections. Their mean values over all images are illustrated in Figures 8.3–8.8. The parameter analysis can also be used to optimize the individual values and to find a suitable parameter combination which can be set as the default configuration. Yet, different combinations may be optimal for different metrics. Some parameter values are good choices for all metrics, whereas in other cases, different values are preferable according to different metrics. Table 8.2 shows the parameter combinations corresponding to the best results for each metric. The metrics precision and recall which we use for the evaluation of the crosspoint detection are not listed, as they are not influenced by the chosen parameters. We can see that for all metrics, the best results are achieved without refined segmentation. Naturally, the region growing metrics are an exception, because a value τ > 0 is needed to evaluate the refinement step itself. For the morphological parameters, the results are less distinct. They vary between different steps of the algorithm, but also between precision and recall. Thus, it is more difficult to select "best" values which are suitable as a default configuration. However, we decide that the results of the metrics measuring the effectiveness of the entire algorithm should be emphasized compared to metrics measuring intermediate results.
Therefore, we focus on the six assignment metrics. Among these, we consider the c,ass and c̃,ass metrics the most important, as they consider the correctness of the assigned target coordinates, at least apart from orientation errors. We can see that they share the same value configuration for serode and τ. As the c,ass metrics are more important, we select the closing value corresponding to their best results. Eventually, we use sclose = 7, serode = 3, τ = 0 for our subsequent evaluation. The corresponding results are displayed in Figure 8.9. The following subsections discuss the results of the final configuration as well as the parameter analysis.

Figure 8.3: Parameter analysis for coarse segmentation
Figure 8.4: Parameter analysis for region growing
Figure 8.5: Parameter analysis for final segmentation
Figure 8.6: Parameter analysis for target coordinate assignment
Figure 8.7: Parameter analysis for correct target coordinate assignment
Figure 8.8: Parameter analysis for correct target coordinate assignment without orientation

8.2.1 Crosspoint Detection

The adapted crosspoint detection approach works well on the data set, with a mean precision and recall of 0.9907 and 0.9903 and respective median values of 1 and 1; the individual values are depicted in Figure 8.9.
Table 8.2: Best configuration for each metric

  Metric            sclose  serode  τ
  precisionCseg        8      3     -
  recallCseg           8      4     -
  precisionRseg        7      3     0.007
  recallRseg           6      4     0.007
  precisionseg         8      3     0
  recallseg            8      4     0
  precisionass         7      2     0
  recallass            6      4     0
  precisionc,ass       7      3     0
  recallc,ass          7      3     0
  precisionc̃,ass       6      3     0
  recallc̃,ass          6      3     0

Table 8.3: Mean and median metric values for sclose = 7, serode = 3 and τ = 0

  Metric            Mean     Median
  precisioncp       0.9907   1
  recallcp          0.9903   1
  precisionseg      0.9845   0.9895
  recallseg         0.9553   0.9727
  precisionass      0.9959   1
  recallass         0.9029   0.9546
  precisionc,ass    0.4786   0.0128
  recallc,ass       0.4109   0.0115
  precisionc̃,ass    0.6198   0.9865
  recallc̃,ass       0.5489   0.7342

Figure 8.9: Metric values for final parameter configuration

False negatives occur close to the border of the image or FOV (in the case of a fisheye lens), where large parts of adjacent regions are occluded and edges vanish. As expected, our method is more likely to allow outliers than the original method [GMCS12], with the benefit of handling a greater variety of distortions and also incomplete checkerboards (see Figure 8.10): Geiger et al. achieve a mean precision and recall of 0.9968 and 0.8738 and median values of 1 and 0.9091.

8.2.2 Segmentation

Both segmentation steps depend on a correct crosspoint localization, as outlier crosspoints lead to false positive regions outside the ROI and merged regions close to the border of the pattern. Apart from a correct pattern ROI, the coarse segmentation is also influenced by illumination. However, in a controlled calibration environment, appropriate conditions can be provided. While region growing is sometimes able to remove false positives from the coarse segmentation, it also tends to merge correctly segmented regions. Figures 8.4 and 8.5 clearly show that precision and recall decrease with increasing τ. Altogether, the disadvantage is larger than the benefit, and better results can be achieved without region growing.
This shows that our indexing algorithm indeed only requires an approximate segmentation.

Figures 8.3 and 8.5 illustrate that precision and recall can be increased by enlarging the structuring element for closing. This is expected, as false positives due to holes in the edge image are removed and gaps are closed, which avoids merging with neighboring regions. Moreover, increased erosion yields larger precision values, while recall benefits from decreasing it. This tendency reflects the fact that stronger erosion is more likely to remove false positive regions, but also small true positive regions.

The mean and median precision and recall values for sclose = 7 and serode = 3 show that segmenting false positive regions is a smaller issue than missing or merging regions. Furthermore, the higher median values show that the mean values are strongly influenced by individual image results. This can be verified by inspecting the individual metric values depicted in Figure 8.9.

Figure 8.10: Comparing our adapted crosspoint detection method (∗) to the original (◦): improved detection for (a) incomplete checkerboards and (b) arbitrary distortions, at the expense of (d) outliers due to background features in the calibration environment. Both fail to detect (c) crosspoints close to borders or occluders.

8.2.3 Crosspoint Identification

Generally, RAPs are well suited to describe a crosspoint. However, they rely strongly on a correct segmentation. We assumed that an approximate segmentation of a patch is sufficient. A RAP of a crosspoint p is formed by the four regions Sk, k = 1, ..., 4, with minimal values of d_RAP.
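The RAP construction just defined, taking the four regions with minimal d_RAP in a fixed cyclic order around the crosspoint, can be sketched as follows. Using the Euclidean centroid distance as d_RAP and the atan2 angle as the cyclic ordering φ is a simplification of this sketch; the thesis defines both more carefully.

```python
import math

def build_rap(crosspoint, regions):
    """Sketch of RAP construction: pick the four regions whose centroids
    are closest to the crosspoint and return their labels in cyclic
    order around the crosspoint.

    regions: dict mapping region label -> centroid (x, y)."""
    cx, cy = crosspoint
    # four regions with minimal (here: Euclidean) distance to the crosspoint
    closest = sorted(regions.items(),
                     key=lambda kv: math.hypot(kv[1][0] - cx,
                                               kv[1][1] - cy))[:4]
    # order them cyclically by the angle of the centroid around the crosspoint
    closest.sort(key=lambda kv: math.atan2(kv[1][1] - cy, kv[1][0] - cx))
    return [label for label, _ in closest]
```

For a crosspoint surrounded by four region centroids, a fifth, more distant region does not enter the RAP, which is precisely why the accuracy of the segmentation only matters up to the distance bound discussed next.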
Thus, lowering these values by a more accurate approximation of the patch (i.e., a larger segmentation including more pixels belonging to the patch, with narrower gaps between the individual segments) does not change the RAP, unless it allows d_RAP to fall below the value of at least one of the four closest regions for at least one region that was ranked fifth or higher. The maximum distance which still yields a correct RAP is determined by the geometry of the distorted pattern: it has to be less than the distance to further regions adjacent to neighboring crosspoints. If a coarse segmentation can guarantee this distance for each crosspoint, refinement is not only unnecessary for a correct crosspoint identification, but also provides no additional information for the RAP. On the contrary, incorrectly merged regions around a crosspoint will distort the RAP and lead to incorrect results for at least the crosspoint in question.

Figure 8.11: Correct origin detection and graph expansion, but failed orientation correction due to the high hue value of the red patch.

In case multiple detected crosspoints have the same RAP, the location of the final crosspoint is currently obtained by averaging their pixel coordinates.
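Merging multiple detections that share a RAP by coordinate averaging, as described above, can be sketched as follows; representing a RAP as a tuple of region labels is an assumption of this sketch.

```python
from collections import defaultdict

def merge_by_rap(crosspoints):
    """Group detected crosspoints by their RAP (here: a sequence of region
    labels) and merge duplicates by averaging their pixel coordinates.

    crosspoints: iterable of ((x, y), rap) pairs.
    Returns: dict mapping each RAP tuple to a single averaged location."""
    groups = defaultdict(list)
    for point, rap in crosspoints:
        groups[tuple(rap)].append(point)
    merged = {}
    for rap, pts in groups.items():
        n = len(pts)
        merged[rap] = (sum(p[0] for p in pts) / n,
                       sum(p[1] for p in pts) / n)
    return merged
```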
This approach is useful for other crosspoint detectors which localize corners rather than crosspoints, such as the Harris corner detector [HS88]. However, our detector based on Geiger et al. [GMCS12] already localizes each crosspoint with high accuracy. With this method, multiple crosspoints with the same RAP rather suggest an incorrect segmentation and/or a false positive crosspoint. Hence, their average location is unlikely to correspond to a true crosspoint location. Under these circumstances, taking any single one of the multiple candidates will probably already improve the chances of correct indexing.

8.2.4 Color-Coded Origin and Orientation Detection

The origin of the target coordinate system is determined by assuming that the two segmented regions with the highest mean saturation correspond to the colored patches. Our evaluation shows that this is only true for 51.02% of the images. For the rest, the mean saturation of the segmented green patch is not sufficiently high. In these cases, we assume that the red patch is found correctly and choose the best candidate for the green patch (i.e., the region with the highest mean saturation) among its diagonal neighbor regions. This way, the origin detection is correct in 73.47% of the cases, but still fails if other diagonal neighbor regions have a higher mean saturation than the green patch (see Figure 8.13). To make detection more robust, the color of the second (currently green) patch should be adjusted accordingly. In addition, color calibration can be applied to the printing of the pattern as well as to its imaging by the camera; illumination has to be taken into account as well.

The orientation of the target coordinate system is determined by assuming that among
the two regions marking the origin, the one corresponding to the red patch has the lower mean hue value. However, this assumption does not take the cyclic hue spectrum into account: depending on camera settings and illumination, the color of the red patch can vary between red and pink, corresponding to very low or very high hue values. Therefore, a more precise orientation detection can be performed by adding thresholds to the < comparison. An example of a failed orientation detection is shown in Figure 8.11.

Figure 8.12: Correct target coordinate assignment with cylindrical deformation and fisheye distortion.

Figure 8.13: Orientation is sometimes detected incorrectly at the origin due to an insufficiently large mean saturation of the green patch and corrected incorrectly due to locally varying rotation angles: (a) original-size RGB image, (b) incomplete assignment of target coordinates (NaN, short for Not a Number, indicates a missing target coordinate).

Figure 8.14: Results of different assignment metrics

8.2.5 Orientation Correction and Assignment of Target Coordinates

The quality of the coordinate assignment strongly depends on the considered metric, as illustrated in Figures 8.9 and 8.14. Similar to crosspoint detection and segmentation, precision is also higher than recall for the coordinate assignment. The assignment alone, i.e.,
the addition of a crosspoint to the graph, achieves very good results. However, compared to all other metrics, precision and recall for correct assignment achieve very low values. While the median values are higher than the mean values for the remaining metrics, the median is much lower than the mean for precision_c,ass and recall_c,ass. This is due to the fact that for many images, only zero or one crosspoint among all true positives is assigned correctly, which can be explained as follows: if the orientation detection fails, all crosspoints except for the origin are assigned incorrect target coordinates; if the origin detection fails as well, no crosspoint is labelled correctly. For these cases, the last two assignment metrics, which do not consider the orientation, are more expressive: after all, for the evaluation of our graph-based approach, it is important to distinguish cases where the indexing fails completely from those where the orientation is incorrect but the topology of the checkerboard is captured correctly (see Figure 8.11). Figure 8.14 shows that the latter is the case for some of the images with very low values of precision_c,ass and recall_c,ass (about 14% of all images). The reasons for the failed orientation detection have been discussed in the previous subsection.

Still, the values of precision_c̃,ass and recall_c̃,ass are lower than the respective crosspoint detection and segmentation metrics. On the one hand, errors are caused by false positive crosspoints and/or segmented regions which yield incorrect RAPs. They can be reduced by stricter calibration conditions where no environment features are visible in the background. On the other hand, the current approach for the graph expansion is not suited for some distortions: the current orientation correction is performed globally. This scheme relies on a constant global rotation of the pattern.
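The idea behind the orientation-agnostic metrics, counting an assignment as correct if it agrees with the ground truth under the best global 90° rotation of the target grid, can be sketched as follows; this conveys the concept only and is not the exact metric definition used in our evaluation.

```python
def correct_up_to_orientation(assigned, ground_truth):
    """Count assignments that agree with the ground truth under the best
    of the four global 90-degree rotations of the target grid.

    assigned, ground_truth: dicts mapping a crosspoint id to its
    target coordinate (x, y)."""
    rotations = [
        lambda x, y: (x, y),      # 0 degrees
        lambda x, y: (y, -x),     # 90 degrees
        lambda x, y: (-x, -y),    # 180 degrees
        lambda x, y: (-y, x),     # 270 degrees
    ]
    best = 0
    for rot in rotations:
        hits = sum(1 for p, (x, y) in assigned.items()
                   if p in ground_truth and ground_truth[p] == rot(x, y))
        best = max(best, hits)
    return best
```

An image whose entire coordinate grid is rotated by 90° relative to the ground truth, as happens when only the orientation detection fails, still scores fully under this comparison.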
Consequently, our approach works well for images with similar angles of rotation in all parts of the pattern. Under this condition, different kinds of distortion and deformation can be handled (see Figure 8.12). In cases where the distortion leads to different angles in different parts of the image, the assignment of target coordinates fails, because region profiles are not matched correctly due to a different position of the upper left region within the profile vector (see Figure 8.13). This limitation can be overcome by only considering the order of the regions within the vector, without assuming a fixed starting point.

For comparison with the indexing provided by Geiger et al. [GMCS12], we consider precision_ĉ,ass and recall_ĉ,ass, as their method does not detect the color-coded origin and orientation. As for the crosspoint detection, our method performs better with partial and arbitrarily deformed checkerboards, leading to higher metric scores (0.7216 vs. 0.6528 for mean precision_ĉ,ass and 0.6456 vs. 0.5889 for mean recall_ĉ,ass). Yet, fisheye distortion is handled well by Geiger et al. Their orientation detection also works robustly, even though it is limited to complete, unoccluded rectangular patterns.

8.3 Applications and Future Work

In this section, we discuss the suitability of our indexing algorithm for the applications suggested in Chapter 7.

At this point, the evaluation results presented in the previous section need to be considered. They reveal that the current indexing approach generally supports different kinds of distortions and deformations, so property i) presented in Chapter 7 is satisfied. For particular distortions such as strong fisheye distortion, however, support is currently limited due to the constant orientation assumption. We also showed that the designed calibration pattern allows the automatic detection of origin and orientation, as stated in property ii).
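The fix suggested above for locally varying rotation, comparing region profile vectors only up to a cyclic shift instead of assuming a fixed upper-left starting point, can be sketched as:

```python
def profiles_match_cyclically(profile_a, profile_b):
    """Compare two region profile vectors under all cyclic shifts,
    dropping the fixed starting point assumed by the global
    orientation correction."""
    n = len(profile_a)
    if n != len(profile_b):
        return False
    # try every rotation of profile_a against profile_b
    return any(profile_a[i:] + profile_a[:i] == profile_b for i in range(n))
```

With this comparison, a profile that is rotated by two positions because of a locally different pattern rotation still matches, at the cost of potentially more ambiguous matches for symmetric profiles.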
At this point, detection only works under limited conditions and is not robust against certain camera and illumination variations. However, we aim at applying our method in the controlled environment of a calibration procedure, where these restrictions can be taken into account. Regarding the current limitations of both properties i) and ii), the alterations suggested in the previous section can be implemented easily to improve the technique. As for property iii), partial checkerboard patterns are well handled by our method.

Assuming a correct graph-based indexing, the target coordinates are provided at the checkerboard crosspoints. For the proposed applications such as image alignment, however, the mapping needs to be defined for each pixel in the image. So far, it is only provided at a low resolution defined by the pattern resolution (see Figure 8.15). A basic mapping for the remaining image points can be derived by interpolation. However, this approach is prone to errors if the pattern resolution is too coarse to capture the image content. Alternatively, as proposed by Banaeyan et al. [BHKB16], the accuracy can be increased by repeating the identification procedure for multiple views: as the camera is shifted relative to the pattern, the mapping to target space is defined for different pixel positions in each iteration.

Figure 8.15: Target coordinates provide mapping only at crosspoint resolution.

Apart from the resolution of the mapping, future work also includes the handling of depth and parallax. Our aim is to apply the geometric information gained from the indexing during the calibration procedure to new images. The objects captured in these images will possibly have a different position relative to the cameras than the pattern object used for calibration. This can cause these objects to be mapped incorrectly with our approach.
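The interpolation mentioned above can be sketched for the forward direction: mapping a fractional target coordinate to an image position by bilinear interpolation between the four surrounding crosspoints. A per-pixel inverse mapping would additionally require locating the cell containing each pixel; this sketch assumes a complete grid of indexed crosspoints.

```python
import math

def target_to_image(tx, ty, grid):
    """Map a (possibly fractional) target coordinate (tx, ty) to an image
    position by bilinear interpolation between the four surrounding
    crosspoints.

    grid: dict mapping integer target coordinates (i, j) to the image
    position (x, y) of the corresponding crosspoint."""
    i0, j0 = math.floor(tx), math.floor(ty)
    u, v = tx - i0, ty - j0  # fractional position within the cell
    p00 = grid[(i0, j0)]
    p10 = grid[(i0 + 1, j0)]
    p01 = grid[(i0, j0 + 1)]
    p11 = grid[(i0 + 1, j0 + 1)]
    x = (1-u)*(1-v)*p00[0] + u*(1-v)*p10[0] + (1-u)*v*p01[0] + u*v*p11[0]
    y = (1-u)*(1-v)*p00[1] + u*(1-v)*p10[1] + (1-u)*v*p01[1] + u*v*p11[1]
    return x, y
```

As noted in the text, this linear densification is only accurate where the pattern resolution is fine enough to capture the local distortion within a cell.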
Moreover, the problem of parallax, which occurs during image alignment, has not been addressed yet.

9 Conclusion

In this thesis, a new technique for checkerboard indexing has been presented. It uses a novel model-free approach combining graph-based representations and basic connected component segmentation to identify checkerboard crosspoints using RAPs. These RAPs can be used to create a dual RAG in which each crosspoint represents a node in the graph. This way, the structure of the pattern is captured. The algorithm can be applied to images of a possibly distorted or occluded checkerboard pattern which have been taken during a calibration procedure. For this purpose, a suitable pattern design has been developed.

The proposed algorithm works well under suitable camera and illumination conditions and allows different kinds of distortions. However, robustness has to be improved for origin and orientation detection as well as for handling particular distortions with strongly varying rotation within the pattern, such as strong fisheye distortion. Some of the existing problems can be solved by adjusting the calibration setup. For the others, possible solutions have been proposed. Compared to a state-of-the-art checkerboard detection and indexing method, our approach is less robust against outliers and fisheye distortion, but it is better suited to handle partial or occluded patterns and limited arbitrary deformation.

Furthermore, possible applications of the presented indexing algorithm have been discussed in this thesis. As a model-free method designed for use in a multi-view setup, it is an interesting alternative to common approaches for various applications. At this point, a basic concept is provided. Due to the limitations mentioned above, the problems of these applications which motivate this approach have not yet been entirely solved.
Apart from the open problems of the stand-alone indexing, further research is necessary to make this technique applicable in the context of camera calibration and multi-view processing. For these applications in particular, open problems include increasing the resolution of the calculated mapping to the image resolution and handling different depth values or parallax.

List of Figures

3.1 An example checkerboard pattern in target, image and world space: The red dot marks the origin of the pattern. The green solid arrows represent the coordinate axes in each space. The mappings between these spaces are depicted by dotted blue arrows.
3.2 Graph-based representations of a checkerboard: Primal RAG (red dot nodes and dashed edges) and dual graph (green cross nodes and solid edges)
3.3 Examples of calibration patterns: (a) Distorted traditional checkerboard pattern. Image taken from Geiger et al. [GMCS12]. (b) Alternative color-coded calibration pattern. Image taken from Alemán et al. [FLDC14].
3.4 Examples of geometrical reference structures captured by the RAG
3.5 Target CS represented by a checkerboard pattern with a color-coded origin between the red (upper left) and green (lower right) patch.
4.1 Saddle point and face in primal and dual graph. The arrows represent pixel relations and are color coded according to their sstrict values (black for 0, green for 1). Image taken from Cerman [Cer15].
4.2 CB vector (outer green circle) with characteristics at a crosspoint (red point): four sign-changing indices (points with red circles), differences between opposite sign-changing indices corresponding to vector half-length (black arrows) and similar sign-changing indices to smaller CB vector (blue inner circle). Image taken from Bok et al. [BHK16].
4.3 Corner score based on (a) ideal and (b) rotated corner prototypes and (c) gradient statistics. Image taken from Geiger et al. [GMCS12].
5.1 Additional node inserted in the graph representation by the SCIS algorithm at a saddle face
6.1 Illustration of our checkerboard identification algorithm using color-coded region labels: (a) original-size grayscale image with detected crosspoints and ROI, (b) coarse segmentation on cropped image, (c) refined segmentation, (d) assigned target coordinates.
6.2 Example crosspoint detection.
6.3 Example segmentation: Each dot represents a segmented region.
6.4 Example RAP: Labelled nodes and edges of the RAG (red dots and dotted lines) around a crosspoint (green cross). The order of the labels is defined by φ (blue dashed arrow).
6.5 Example origin detection: Detect origin crosspoint between two regions with highest mean saturation.
6.6 Example orientation detection: Detect pattern orientation from RAP position of region with lower hue value.
6.7 Orientation correction of a RAP: Originally, the first node (blue) in the RAP corresponds to the lower right region relative to the crosspoint. After correction, the first node (blue) corresponds to the patch which is the upper left patch in the target pattern. Around the origin crosspoint, this patch is marked red.
6.8 Example graph expansion: Expand graph from detected origin by comparing RAPs and assign target coordinates.
8.1 Different distortions of checkerboard patterns: fisheye lens with (a) planar and (b) cylindrical pattern and regular lens with (c) planar and (d) arbitrarily deformed pattern
8.2 Measured subsets for segmentation evaluation: Ground truth regions including incorrectly segmented regions with indices in Imissed (circle), Isplit (triangle) and Imerged (square) and segmented regions including incorrect segments with indices in Joutlier (circle), Jsplitting (triangle) and Jmerging (square).
8.3 Parameter analysis for coarse segmentation
8.4 Parameter analysis for region growing
8.5 Parameter analysis for final segmentation
8.6 Parameter analysis for target coordinate assignment
8.7 Parameter analysis for correct target coordinate assignment
8.8 Parameter analysis for correct target coordinate assignment without orientation
8.9 Metric values for final parameter configuration
8.10 Comparing our adapted crosspoint detection method (∗) to the original (◦): Improved detection for (a) incomplete checkerboards and (b) arbitrary distortions, at the expense of (d) outliers due to background features in the calibration environment. Both fail to detect (c) crosspoints close to borders or occluders.
8.11 Correct origin detection and graph expansion, but failed orientation correction due to high hue value of the red patch.
8.12 Correct target coordinate assignment with cylindrical deformation and fisheye distortion.
8.13 Orientation is sometimes detected incorrectly at the origin due to a not sufficiently large mean saturation of the green patch and corrected incorrectly due to locally varying rotation angles: (a) original-size RGB image, (b) incomplete assignment of target coordinates (NaN, short for Not a Number, indicates a missing target coordinate).
8.14 Results of different assignment metrics
8.15 Target coordinates provide mapping only at crosspoint resolution.

List of Tables

8.1 Parameter Analysis
8.2 Best configuration for each metric
8.3 Mean and median metric values for sclose = 7, serode = 3 and τ = 0

List of Algorithms

5.1 Find Connected Components
6.1 Graph-Based Checkerboard Indexing

Bibliography

[AF05] Steffen Abraham and Wolfgang Förstner. Fish-eye-stereo calibration and epipolar rectification. ISPRS Journal of Photogrammetry and Remote Sensing, 59(5):278–288, 2005.

[AV07] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia, 2007.

[BBV01] C. Bräuer-Burchardt and K. Voss. A new algorithm to correct fish-eye- and strong wide-angle-lens-distortion from single images. In Proceedings of the 2001 International Conference on Image Processing, volume 1, pages 225–228, 2001.

[BHK16] Yunsu Bok, Hyowon Ha, and In So Kweon. Automated checkerboard detection and indexing using circular boundaries. Pattern Recognition Letters, 71(C):66–72, February 2016.

[BHKB16] Majid Banaeyan, Hanna Huber, Walter Kropatsch, and Raphael Barth.
A novel concept for smart camera image stitching. In Proceedings of the 21st Computer Vision Winter Workshop, 2016.

[BL03] M. Brown and D. G. Lowe. Recognising panoramas. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, ICCV '03, pages 1218–1225, Washington, DC, USA, 2003. IEEE Computer Society.

[BL07] Matthew Brown and David G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, August 2007.

[BTVG06] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In Proceedings of the 9th European Conference on Computer Vision - Volume Part I, ECCV '06, pages 404–417, Berlin, Heidelberg, 2006. Springer-Verlag.

[Cer15] Martin Cerman. Structurally correct image segmentation using local binary patterns and the combinatorial pyramid. Master's thesis, PRIP, TU Wien, 2015.

[FB81] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.

[FH04] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, September 2004.

[Fit01] A. W. Fitzgibbon. Simultaneous linear estimation of multiple view geometry and lens distortion. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, 2001.

[FLDC14] Miguel Alemán Flores, Luis Álvarez León, Luis Gómez Déniz, and Daniel Elías Santana Cedrés. Automatic lens distortion correction using one-parameter division models. IPOL Image Processing On Line (Special Issue on Lens Distortion Models), 4:327–343, 2014.

[GMCS12] Andreas Geiger, Frank Moosmann, Omer Car, and Bernhard Schuster. Automatic camera and range sensor calibration using a single shot. In International Conference on Robotics and Automation (ICRA), pages 3936–3943, St. Paul, USA, May 2012. IEEE.
[HGJD08] Ciarán Hughes, Martin Glavin, Edward Jones, and Patrick Denny. Review of geometric distortion compensation in fish-eye cameras. In IET Irish Signals and Systems Conference 2008 (ISSC 2008), pages 162–167, 2008.

[HRM06] G. Heusch, Y. Rodriguez, and S. Marcel. Local binary patterns as an image preprocessing for face authentication. In 7th International Conference on Automatic Face and Gesture Recognition 2006, pages 6–14, USA, 2006. IEEE.

[HS88] Chris Harris and Mike Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, volume 15, pages 147–151, 1988.

[JK16] Ines Janusch and Walter G. Kropatsch. Shape classification according to LBP persistence of critical points. In Discrete Geometry for Computer Imagery - 19th IAPR International Conference, DGCI 2016, Nantes, France, April 18-20, 2016, Proceedings, volume 9647 of LNCS, pages 166–177. Springer, 2016.

[Kal09] Michael Kaltenbäck. Analysis 1, 2009. TU Wien, Lecture Notes, p. 6.

[KB06] Juho Kannala and Sami S. Brandt. A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:1335–1340, 2006.

[KF08] Michal Kedzierski and Anna Fryskowska. Precise method of fisheye lens calibration. In Proceedings of the ISPRS-Congress, pages 765–768. International Society for Photogrammetry and Remote Sensing, 2008.

[Kle14] Reinhard Klette. Concise Computer Vision - An Introduction into Theory and Algorithms. Undergraduate Topics in Computer Science. Springer, 2014.

[KS04] Yan Ke and Rahul Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR '04, pages 506–513, Washington, DC, USA, 2004. IEEE Computer Society.

[LGAGVGAV13] Fernando López-García, Gabriela Andreu-García, José-Miguel Valiente-Gonzalez, and Vicente Atienza-Vanacloig.
Detection of visual defects in citrus fruits: Multivariate image analysis vs graph image segmentation. In Richard Wilson, Edwin Hancock, Adrian Bors, and William Smith, editors, Computer Analysis of Images and Patterns, volume 8047 of Lecture Notes in Computer Science, pages 237–244, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.

[LQS+15] Qiaoliang Li, Suwen Qi, Yuanyuan Shen, Dong Ni, Huisheng Zhang, and Tianfu Wang. Multispectral image alignment with nonlinear scale-invariant keypoint and enhanced local feature matrix. IEEE Geoscience and Remote Sensing Letters, 12(7):1551–1555, 2015.

[MKS+17] Gianluigi Mucciolo, Barbara Koneczny, Alessia Saggese, Nicole M. Artner, Walter G. Kropatsch, and Mario Vento. Analysis and calibration of a mirror setup producing mirror-reflected, multi-view videos. In Proceedings of the 22nd Computer Vision Winter Workshop. Pattern Recognition and Image Processing Group (PRIP) and PRIP club, 2017.

[MO14] Lars Vidar Magnusson and Roland Olsson. Improving graph-based image segmentation using automatic programming. In Applications of Evolutionary Computation, volume 8602 of Lecture Notes in Computer Science, pages 464–475, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg.

[OPH96] Timo Ojala, Matti Pietikäinen, and David Harwood. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59, January 1996.

[OS14] Ipek Oguz and Milan Sonka. LOGISMOS-B: Layered optimal graph image segmentation of multiple objects and surfaces for the brain. IEEE Transactions on Medical Imaging, 33(6):1220–1235, June 2014.

[PSHZ+15] F. Perazzi, A. Sorkine-Hornung, H. Zimmer, P. Kaufmann, O. Wang, S. Watson, and M. Gross. Panoramic video from unstructured camera arrays.
In Computer Graphics Forum, volume 34, pages 57–68. The Eurographics Association & John Wiley & Sons, Ltd., 2015.

[Pur14] Werner Purgathofer. Introduction to visual computing, 2014. TU Wien, Lecture Notes, pp. 31-32.

[Sch05] Ellen Schwalbe. Geometric modelling and calibration of fisheye lens camera systems. In Proceedings of the 2nd Panoramic Photogrammetry Workshop, Int. Archives of Photogrammetry, Remote Sensing and Spatial Information, volume 36, pages 5–8. ISPRS, 2005.

[SL09] Marina Sokolova and Guy Lapalme. A systematic analysis of performance measures for classification tasks. Information Processing and Management, 45:427–437, July 2009.

[SS14] P. Srestasathiern and N. Soontranon. A novel camera calibration method for fish-eye lenses using line features. ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 40(3):327–332, August 2014.

[Sze10] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer-Verlag New York, Inc., New York, NY, USA, 1st edition, 2010.

[ULH15] Steffen Urban, Jens Leitloff, and Stefan Hinz. Improved wide-angle, fisheye and omnidirectional camera calibration. ISPRS Journal of Photogrammetry and Remote Sensing, 108:72–79, 2015.

[UPH07] Ranjith Unnikrishnan, Caroline Pantofaru, and Martial Hebert. Toward objective evaluation of image segmentation algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):929–944, June 2007.

[WLaZ+15] Zhongli Wang, Hedong Liang, Xian Wu, Yipeng Zhao, Baigen Cai, Chuanqi Tao, Zhiyi Zhang, Yinling Wang, Shanwen Li, Fengtian Huang, Shuangfu Fu, and Feng Zhang. A practical distortion correcting method from fisheye image to perspective projection image. In Information and Automation, 2015 IEEE International Conference on, pages 1178–1183, 2015.

[YHZ06] Xianghua Ying, Zhanyi Hu, and Hongbin Zha. Fisheye lenses calibration using straight-line spherical perspective projection constraint. In P. J.
Narayanan, Shree K. Nayar, and Heung-Yeung Shum, editors, Asian Conference on Computer Vision, volume 3852 of Lecture Notes in Computer Science, pages 61–70. Springer, 2006.

[ZL14] Fan Zhang and Feng Liu. Parallax-tolerant image stitching. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '14, pages 3262–3269, Washington, DC, USA, 2014. IEEE Computer Society.