|Title:||Model-driven Benchmark data generation for digital preservation of webpages|
|Other Titles:||Model-Driven Benchmark Data Generation for Digital Preservation of Webpages|
|Language:||English|
|Authors:||Sauerwein, Clemens|
|Qualification level:||Diploma|
|Advisor:||Rauber, Andreas|
|Issue Date:||2013|
|Number of Pages:||68|
|Abstract:||
Digital Preservation (DP) is the process of keeping digital information accessible and usable in an authentic manner over the long term. Preservation activities aim to guarantee long-term, error-free accessibility of data regardless of technological change. Different approaches based on the continuous transformation of data are used to perform these activities, and several tools exist to execute them. Digital objects have significant properties that must be preserved during these transformations. To evaluate preservation activities, information about these properties (e.g. structure, size) is necessary. Annotations of digital objects with this information serve as ground truth. A benchmark data set can be formed from real-world data, but the properties then have to be verified manually, because every automatic analysis relies on the correct interpretation of an analysis program (e.g. a characterization tool). Since these programs must themselves be evaluated, there is a profound lack of annotated benchmark data in Digital Preservation, which hinders the evaluation and improvement of digital preservation approaches and tools. This thesis introduces a model-driven benchmark data generation framework for the automatic generation of benchmark data with corresponding ground truth. The system uses the Model Driven Architecture (MDA) as its underlying concept, which facilitates the use of well-known model-driven engineering tools and frameworks. Instead of analyzing existing benchmark data collections from computer science, it generates benchmark data sets according to the property distributions of different kinds of documents (e.g. webpages). The framework specifies ground truths for the Platform Independent and Platform Specific Models of the generated benchmark data. These ground truths, together with the benchmark data, are used for evaluation.
The model-driven benchmark data generation framework is evaluated by generating benchmark data for testing preservation action tools for webpages. Webpages are widely used and pose a complex challenge in digital preservation settings. We define a Platform Independent and a Platform Specific Model for representing webpages and demonstrate how benchmark data can be created with them.
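The generation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not use MDA tooling. All names (`PageModel`, `sample_model`, `to_html`, `characterize`, `generate_benchmark`) and the uniform property distributions are assumptions made for illustration. An abstract page model is sampled, transformed into HTML, and the sampled properties are kept as ground truth against which a stand-in characterization tool can be checked.

```python
import random
from dataclasses import asdict, dataclass
from html.parser import HTMLParser


@dataclass
class PageModel:
    """Hypothetical platform-independent model: abstract page properties only."""
    title: str
    paragraphs: int
    links: int
    images: int


def sample_model(rng, index):
    # Illustrative uniform distributions (an assumption); a real framework
    # would use property distributions observed for actual webpages.
    return PageModel(
        title=f"Generated page {index}",
        paragraphs=rng.randint(1, 8),
        links=rng.randint(0, 5),
        images=rng.randint(0, 3),
    )


def to_html(model):
    """Transform the abstract model into a platform-specific artifact (HTML)."""
    body = [f"<p>Paragraph {i}</p>" for i in range(model.paragraphs)]
    body += [f'<a href="https://example.org/{i}">Link {i}</a>'
             for i in range(model.links)]
    body += [f'<img src="image{i}.png" alt="">' for i in range(model.images)]
    return (
        "<!DOCTYPE html>\n<html><head><title>" + model.title
        + "</title></head><body>\n" + "\n".join(body) + "\n</body></html>\n"
    )


class TagCounter(HTMLParser):
    """Stand-in for a characterization tool under evaluation."""

    def __init__(self):
        super().__init__()
        self.counts = {"p": 0, "a": 0, "img": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1


def characterize(html):
    """Measure the properties of a generated page by parsing its HTML."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {
        "paragraphs": counter.counts["p"],
        "links": counter.counts["a"],
        "images": counter.counts["img"],
    }


def generate_benchmark(n, seed=0):
    """Return n (html, ground_truth) pairs; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    items = []
    for i in range(n):
        model = sample_model(rng, i)
        items.append((to_html(model), asdict(model)))
    return items


if __name__ == "__main__":
    for html, truth in generate_benchmark(3, seed=1):
        measured = characterize(html)
        matches = all(truth[key] == value for key, value in measured.items())
        print(truth["title"], "tool agrees with ground truth:", matches)
```

Because the ground truth is recorded at generation time rather than extracted from the finished file, it does not depend on the correctness of the tool being evaluated, which is the motivation the abstract gives for generating benchmark data instead of annotating real-world pages.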
|Library ID:||AC11320779|
|Organisation:||E188 - Institut für Softwaretechnik und Interaktive Systeme|
|Publication Type:||Thesis|
|Appears in Collections:||Thesis|
Files in this item:
|Sauerwein Clemens - 2013 - Model-driven Benchmark data generation for digital...pdf||1.69 MB||Adobe PDF|
checked on Feb 18, 2021
Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.