A light-weight processing model for interactive Web information acquisition

Zigo, Viktor

DC Field

Value

Language

dc.contributor.advisor

Gottlob, Georg

dc.contributor.author

Zigo, Viktor

dc.date.accessioned

2023-06-18T11:41:13Z

dc.date.issued

2006

dc.date.submitted

2006-12

dc.identifier.citation

Zigo, V. (2006). A light-weight processing model for interactive Web information acquisition [Dissertation, Technische Universität Wien]. reposiTUm. http://hdl.handle.net/20.500.12708/180682

dc.identifier.uri

http://hdl.handle.net/20.500.12708/180682

dc.description

Includes an abstract in German

dc.description.abstract

Obtaining the right amount and combination of the information sought at the desired time undoubtedly provides a clear advantage. The Web, the largest collection of data, has become an unavoidable source for the targeted acquisition of data.
Naturally, this source is neither free nor available out of the box.
It requires tools for searching, capturing, transforming, and integrating data from different service providers, which must moreover be semantically organized and, above all, personalized. If the information is to be presented continuously up to date, all processes must also be repeatable. Unfortunately, the relevant data sources are numerous and heterogeneous, and the way Web information sources are structured is unsuitable for automation.
The problem of extracting data from static, semi-structured sources (e.g. HTML pages) has already been treated extensively in research. This thesis instead deals with the additional, behavioral aspects of data extraction (Web navigation and interaction) as well as with the aspects of information acquisition, building on process models.
The main and most innovative part of the dissertation specifies a light-weight yet highly dynamic and flexible process framework for interactive Web information acquisition. We concentrated our work on two of the main components of the system: a task coordination workflow and a model for data output. The proposed models have been successfully implemented in several applications. We present the most innovative one: LumberJaczk, a prototype of a complete light-weight system for "on-top-of-Web" applications, i.e. client-side executable and highly adaptable applications that use the Web as a "back end".
The proposed approach brings several challenging areas to light, e.g. decentralized information acquisition and collaborative semantic databases.

dc.description.abstract

Having the right amount and combination of the right information, available at the right time, is an unquestionable advantage.
The Web, being the largest information database, has become an unavoidable source for acquiring the knowledge of these three "rights".
However, it is not available for free. It requires non-trivial data search, capture, transformation, and combination of data from many unrelated services, as well as their semantic organization and personalization.
Such processes need to be automated and repeatable to keep the knowledge up to date. Unfortunately, the range of relevant data sources is immense and heterogeneous, and the way Web information is structured is not suitable for automation. The information is encoded in visual structures (e.g. HTML pages) intended for humans.
The problem of information extraction from static semi-structured sources has been the focus of recent research. In our work, we instead address the complementary, behavioral aspects of information extraction (Web navigation and interaction) and the aspects of information acquisition processing models.
The main contribution is the specification of a light-weight, dynamic, and agile processing framework for interactive Web information acquisition.
We focused in particular on two main components of the system: a task coordination workflow and a model for data output. The proposed models have been successfully implemented and applied in applications.
We present the most innovative one: LumberJaczk, a light-weight system for on-top-of-Web applications, i.e. client-side, portable, and highly personalizable applications that reuse the Web.
The proposed approach reveals several new challenging fields, such as decentralized information acquisition and collaborative semantic databases.

dc.language

English

dc.language.iso

dc.subject

Informationsakquisition

dc.subject

Web

dc.subject

Webdatenextraktion

dc.subject

Informationsextraktion

dc.subject

Workflow

dc.subject

Datenfluss

dc.subject

Metasuche

dc.subject

LumberJaczk

dc.subject

information acquisition

dc.subject

Web

dc.subject

information extraction

dc.subject

workflow

dc.subject

dataflow

dc.subject

meta-search

dc.subject

LumberJaczk

dc.title

A light-weight processing model for interactive Web information acquisition

dc.type

Thesis

dc.type

Hochschulschrift

dc.contributor.affiliation

TU Wien, Österreich

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Kappel, Gerti

tuw.publication.orgunit

E184 - Institut für Informationssysteme

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC05033385

dc.description.numberOfPages

245

dc.thesistype

Dissertation

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.assistant.orcid

0000-0002-4758-9436

item.languageiso639-1

item.openairetype

doctoral thesis

item.grantfulltext

none

item.fulltext

no Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_db06

crisitem.author.dept

E184 - Institut für Informationssysteme

crisitem.author.parentorg

E180 - Fakultät für Informatik

Appears in Collections:

Thesis
