Title: Semi-automatic annotation on image segmentation hierarchies
Language: English
Authors: Zankl, Georg Michael 
Qualification level: Diploma
Keywords: image segmentation; semantic segmentation; object recognition; interactivity
Advisor: Haxhimusa, Yll
Issue Date: 2012
Number of Pages: 39
In the field of object recognition in natural images, a variety of established tasks exist that are the focus of attention when comparing different methods, for example image segmentation, semantic image segmentation, and object detection. Image segmentation is the task of grouping pixels in an image that belong to the same region or object. Semantic image segmentation is the task of assigning a semantic label to each pixel of the image. The semantic labels can be objects, for example car, person, or building, or classes of areas in an image, such as sky, floor, or vertical surface. Object detection is the task of predicting the occurrence and position of an object in an image, for example by determining a bounding box of the object. Traditional object recognition challenges have limitations such as ambiguity in more general contexts. For example, for a single natural image there are often multiple image segmentations a human would consider correct, depending on the object that person is particularly interested in.
We raise the question: "Is there a different task that overcomes these limitations?" As an example, we propose the task of interactively assigning a semantic label to each segment of a segmentation hierarchy. The result can be represented as a stack of semantic segmentations, with an inclusion relationship between segments of adjacent segmentations. The focus of this work is to provide a solution to this task and to discuss the advantages and problems that arise. The main disadvantage is that it is harder to obtain suitable ground truth consisting of annotated segmentation hierarchies. Also, the quality of the underlying segmentation methods is, in general, sub-optimal for natural images. The main advantage is that the structure implied by the occurrence of labels in the ground truth can be used to aid the user in labeling the segments of the hierarchy.
We propose a framework that consists of a feedback loop, in which the framework provides a label prediction and a human user may select one or more misclassified segments and assign the correct label. This process can be repeated until the user is satisfied. The prediction is made using a Conditional Random Field (CRF) that is modified so that the model can be conditioned on the segmentation hierarchy as well as on the user input. The framework is evaluated on two distinct datasets by comparing its quality to a straightforward baseline. The baseline consists of a single prediction step of the proposed framework followed by fully manual correction of the segments without new predictions. The results show a significant difference in quality after several user interactions. For example, after 20 user interactions the baseline adjusts 20 misclassified segments, while the CRF-based framework adjusts about 130 misclassified segments for the two datasets. This experiment illustrates the potential of structured prediction for the proposed task.
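The feedback loop described in the abstract can be illustrated with a small sketch. It is not the thesis's model: it assumes a tree-shaped hierarchy, hand-picked unary scores, a single parent-child agreement bonus, and exact max-sum inference, whereas the abstract only says that a modified CRF is conditioned on the hierarchy and the user input. All names (`map_labels`, `interact`, `PARENT`, `UNARY`, `PAIRWISE`, `TRUTH`) are hypothetical, and the user is simulated by reading the correct label from a toy ground truth.

```python
NEG_INF = float("-inf")


def map_labels(parent, unary, pairwise, clamped):
    """Most probable labeling of a tree-structured model (max-sum).

    parent[i]      index of the parent segment, -1 for the root (node 0);
                   nodes are ordered so that parent[i] < i.
    unary[i][l]    score of label l for segment i (assumed given).
    pairwise[a][b] score for parent label a and child label b.
    clamped        {segment: label} fixed by the user.
    """
    n, k = len(parent), len(pairwise)
    # A clamped segment may only take the label the user assigned.
    score = [[u if clamped.get(i, l) == l else NEG_INF
              for l, u in enumerate(unary[i])] for i in range(n)]
    best_child = [[0] * k for _ in range(n)]
    # Upward pass: children have larger indices, so visit them first.
    for i in range(n - 1, 0, -1):
        for lp in range(k):
            cand = [score[i][lc] + pairwise[lp][lc] for lc in range(k)]
            lc_best = max(range(k), key=cand.__getitem__)
            best_child[i][lp] = lc_best
            score[parent[i]][lp] += cand[lc_best]
    # Downward pass: pick the root label, then decode the children.
    labels = [0] * n
    labels[0] = max(range(k), key=score[0].__getitem__)
    for i in range(1, n):
        labels[i] = best_child[i][labels[parent[i]]]
    return labels


def interact(parent, unary, pairwise, truth, max_steps):
    """Predict, let a simulated user fix one segment, predict again."""
    clamped = {}
    pred = map_labels(parent, unary, pairwise, clamped)
    steps = 0
    while steps < max_steps:
        wrong = [i for i, (p, t) in enumerate(zip(pred, truth)) if p != t]
        if not wrong:
            break
        clamped[wrong[0]] = truth[wrong[0]]  # simulated user correction
        pred = map_labels(parent, unary, pairwise, clamped)
        steps += 1
    return pred, steps


# Toy hierarchy: root 0, children 1 and 2, leaves 3, 4 (under 1) and 5, 6 (under 2).
# Labels: 0 = sky, 1 = building.
PARENT = [-1, 0, 0, 1, 1, 2, 2]
UNARY = [[2.0, 0.0], [2.0, 0.0], [0.6, 0.4], [2.0, 0.0],
         [2.0, 0.0], [0.6, 0.4], [0.6, 0.4]]
PAIRWISE = [[1.0, 0.0], [0.0, 1.0]]  # bonus when parent and child agree
TRUTH = [0, 0, 1, 0, 0, 1, 1]

if __name__ == "__main__":
    print(map_labels(PARENT, UNARY, PAIRWISE, {}))
    print(interact(PARENT, UNARY, PAIRWISE, TRUTH, max_steps=20))
```

In this toy setting the initial prediction mislabels segments 2, 5 and 6. Correcting segment 2 alone is enough for the re-prediction to fix its two children as well, whereas a baseline without re-prediction would need one interaction per misclassified segment. The numbers are illustrative only and say nothing about the datasets used in the thesis.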
URI: https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-48478
Library ID: AC07814469
Organisation: E186 - Institut für Computergraphik und Algorithmen 
Publication Type: Thesis
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.