Title: Model-driven Benchmark data generation for digital preservation of webpages
Other Titles: Model-Driven Benchmark Data Generation for Digital Preservation of Webpages
Language: English
Authors: Sauerwein, Clemens 
Qualification level: Diploma
Advisor: Rauber, Andreas 
Issue Date: 2013
Number of Pages: 68
Abstract: 
Digital Preservation (DP) is the process of keeping digital information accessible and usable in an authentic manner over the long term. Preservation activities are meant to guarantee long-term, error-free accessibility of data regardless of technological change. Different approaches based on the continuous transformation of data are used to perform these activities, and several tools exist to execute them. Digital objects have significant properties which must be preserved during these transformations. To evaluate preservation activities, information about these properties (e.g. structure, size) is necessary. Annotations of digital objects with this information serve as ground truth. A benchmark data set can be formed from real-world data, but the properties then have to be verified manually. Every automatic analysis relies on the correct interpretation by an analysis program (e.g. a characterization tool). Because these programs must themselves be evaluated, there is a profound lack of annotated benchmark data in Digital Preservation, which hinders the evaluation and improvement of digital preservation approaches and tools. This thesis introduces a model-driven benchmark data generation framework for automatically generating benchmark data with corresponding ground truth. The system uses the Model Driven Architecture (MDA) as its underlying concept, which facilitates the use of well-known model-driven engineering tools and frameworks. Instead of analyzing existing benchmark data collections from computer science, it generates benchmark data sets according to the property distributions of different kinds of documents (e.g. webpages). The framework specifies ground truths for the Platform Independent and Platform Specific Models of the generated benchmark data. These ground truths, together with the benchmark data, are used for evaluation.
The model-driven benchmark data generation framework is evaluated by generating benchmark data for testing preservation action tools for webpages, which are widely used and pose a complex challenge in digital preservation settings. We define a Platform Independent and a Platform Specific Model for representing webpages and demonstrate how benchmark data can be created with them.
URI: https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-64614
http://hdl.handle.net/20.500.12708/2542
Library ID: AC11320779
Organisation: E188 - Institut für Softwaretechnik und Interaktive Systeme 
Publication Type: Thesis
University thesis (Hochschulschrift)
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.