Intelligent video annotation and retrieval techniques

Sorschag, Robert

DC Element

Value

Language

dc.contributor.advisor

Eidenberger, Horst

dc.contributor.author

Sorschag, Robert

dc.date.accessioned

2020-06-30T16:01:50Z

dc.date.issued

2012

dc.date.submitted

2012-09

dc.identifier.citation

Sorschag, R. (2012). Intelligent video annotation and retrieval techniques [Dissertation, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-50967

dc.identifier.uri

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-50967

dc.identifier.uri

http://hdl.handle.net/20.500.12708/12947

dc.description

Abstract in German

dc.description.abstract

Videos are an essential part of modern information systems and the web. Since the introduction of the first video portals in the middle of the last decade, the number of available videos has grown steadily, and with it the need for more efficient video search. Current search systems work mainly on manually generated metadata, which has the drawback of often describing video content only roughly and inaccurately. For this reason, video annotation systems that rely on content-based analysis are intended to remedy this and bring video search to a level similar to what users are accustomed to today from online search for text documents and web pages.
This dissertation deals with the use of automatic object recognition for the annotation of people, objects, and places. After tagging, video scenes showing these objects can be found with Google-like search queries. A sophisticated presentation of the retrieved video scenes makes the relevance of individual search results immediately visible. The presented annotation techniques are based on a new object recognition framework that can be integrated into a wide variety of video environments and enables object recognition with a flexible use of visual features, matching algorithms, and machine learning techniques.
New methods can be integrated into this framework with little development effort. This allows new developments to be adopted quickly and can therefore make an important contribution, especially to future research.
Furthermore, the framework offers an automatic configuration selection that makes it possible to use different algorithms for the annotation of different objects.
Support for distributed computer systems and compact storage of the generated data also ensure high efficiency.
In the course of the project, a video annotation prototype was developed with the presented techniques in order to carry out a comprehensive case study. This prototype is publicly available, as are the video data used and some of the resulting publications.
Further scientific contributions of the dissertation address motion-based and segmentation-based features, which are suited to specific applications such as automatic action scene detection. Between 2010 and 2012, we also participated in TRECVID, the largest international competition for content-based video retrieval, and achieved promising results.

dc.description.abstract

Videos are an integral part of current information technologies and the web. The demand for efficient retrieval rises with the increasing number of videos, and thus better annotation tools are needed, as today's retrieval systems mainly rely on manually generated metadata.
The situation is even more critical when it comes to user-generated videos, where rough and inaccurate annotations are common practice.
Attempts to employ content-based analysis for video annotation and retrieval already exist, but they are still at an early stage compared to the retrieval of web documents.
In this work, we address the use of object recognition techniques to annotate what is shown where in videos. These annotations are suitable for retrieving specific video scenes for object-related text queries, though the manual generation of such metadata would be impractical and expensive. We further use a sophisticated presentation of the retrieval results that indicates the relevance of the retrieved scenes at a glance. The presented semi-automatic annotation approach is easy to use, and it builds on a novel framework with the following features. First, it can be easily integrated into existing video environments. Second, it is not based on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching techniques, and machine learning techniques. New recognition approaches can be integrated into this infrastructure with low development costs, and the recognition approaches in use can be reconfigured even on a running system. Thus, this framework might also benefit from future advances in computer vision. Third, we present an automatic selection approach to support the use of different recognition strategies for the annotation of different objects.
Moreover, visual analysis can be performed efficiently in distributed, multi-processor environments, and the resulting video annotations and low-level features can be stored in a compact form.
We demonstrate the proposed annotation approach in an extensive case study with promising results. A video object annotation prototype as well as the generated scene classification ground truth are freely available to foster reproducible research. Additional contributions of this work consider the generation of motion-based and segmentation-based features and their use for specific annotation tasks, such as the detection of action scenes in professional and user-generated video.
Furthermore, we participated in two tasks of the TRECVID challenge, instance search and semantic indexing, in the three consecutive years 2010, 2011, and 2012.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Videoanalyse

dc.subject

Objekterkennung

dc.subject

Visuelle Bildeigenschaften

dc.subject

Automatische Beschlagwortung

dc.subject

Videosuchmaschinen

dc.subject

content-based video analysis

dc.subject

object recognition

dc.subject

visual features

dc.subject

automatic annotation

dc.subject

video search engines

dc.title

Intelligent video annotation and retrieval techniques

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.contributor.affiliation

TU Wien, Austria

dc.rights.holder

Robert Sorschag

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Scherp, Ansgar

tuw.publication.orgunit

E188 - Institut für Softwaretechnik und Interaktive Systeme

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC07814297

dc.description.numberOfPages

146

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-50967

dc.thesistype

Dissertation

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

external

item.languageiso639-1

item.openairetype

doctoral thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_db06

item.openaccessfulltext

Open Access

crisitem.author.dept

E188 - Institut für Softwaretechnik und Interaktive Systeme

crisitem.author.parentorg

E180 - Fakultät für Informatik

Appears in collections:

Thesis

Full text (Version of Record (published version))

Adobe PDF

(4.49 MB)

In Copyright

Page views

580

accessed on 23.11.2023

Download(s)

208

accessed on 23.11.2023
