A model driven approach to generate ground truth data for digital preservation

Mezensky, Harald

doi:10.34726/hss.2013.23019

DC Field

Value

Language

dc.contributor.advisor

Rauber, Andreas

dc.contributor.author

Mezensky, Harald

dc.date.accessioned

2020-06-28T02:10:23Z

dc.date.issued

2013

dc.identifier.citation

Mezensky, H. (2013). A model driven approach to generate ground truth data for digital preservation [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2013.23019

dc.identifier.uri

https://doi.org/10.34726/hss.2013.23019

dc.identifier.uri

http://hdl.handle.net/20.500.12708/2467

dc.description

Alternative title as translated by the author

dc.description

Abstract in German

dc.description.abstract

Today a large share of the data that makes up our cultural and scientific heritage exists only in digital form, and the amount of digital information grows rapidly every day. Digital artifacts often depend on specific hardware and software, yet these environments change, evolve, or become obsolete, so techniques that address this problem are urgently needed. Digital preservation aims to provide such solutions through activities that keep digital objects accessible and interoperable over time and across changing environments. One of these strategies is the migration of data, which transforms a potentially obsolete file format into a more current and more sustainable format. To ensure that the content of the new file is at least similar to the old version, digital preservation provides characterisation tools. These tools analyze the properties of a file and therefore allow two versions to be compared, in our case the object before and after the migration process. Introducing quality assurance activities for characterisation tools requires access to meaningful benchmark data that can be used as input for actions such as black box tests. This benchmark data must rely on a known ground truth so that the results of executed test runs can be verified. Currently, however, no such data exists, and therefore existing tools can be improved, or new ones developed, only in an unsystematic way. This thesis presents a solution to that problem by providing a model driven concept that bridges the gap between digital preservation and quality assurance. The approach is based on the idea of designing a digital object by defining a model representation of it.
More precisely, a platform independent source model is created that is based on a particular metamodel and contains all elements and attributes necessary to describe the digital artifact. This model is used as input for a model transformation step in which multiple models are generated from one source model by diversifying its properties. The motivation behind that step is to bring variety into the ground truth and to create a collection of benchmark data that resembles a real world collection in the distribution of its properties. This can be accomplished by using statistical data as input for the diversification step. Each diversified model is then transformed from a platform independent model into a platform specific model. In this way, benchmark data for multiple platforms can be generated within one run. Finally, code generation transforms the models into the actual output or into executable code that can be used to produce the actual output. For each generated file, a corresponding ground truth file is created that captures the actual properties of the object. The thesis contains an evaluation of the current status in these areas and provides a detailed description of the concept underlying the approach. To demonstrate the feasibility of the concept, a reference implementation has been developed. This prototype is also documented in this thesis, including the rationale for the technologies chosen and the problems that emerged. The implementation is used for various experiments, which are described together with an analysis of their results and of the execution performance. Finally, an outlook on further work and details about applications and limitations of the current approach are given.
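The pipeline described in the abstract (source model, diversification from statistical data, platform specific models, generated files with ground truth) can be illustrated with a minimal Python sketch. It is not the thesis's reference implementation: the `DocumentModel` class, its three properties, the two platforms, and the example distributions are all hypothetical and chosen only to keep the illustration small.

```python
import json
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DocumentModel:
    """Hypothetical platform independent model of a text document."""
    paragraphs: int
    font: str
    has_table: bool


def diversify(source, distributions, count, seed=0):
    """Derive `count` variants by sampling each listed property
    from a weighted distribution (the statistical input)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(count):
        changes = {
            prop: rng.choices(list(dist), weights=list(dist.values()))[0]
            for prop, dist in distributions.items()
        }
        variants.append(replace(source, **changes))
    return variants


def to_platform_specific(model, platform):
    """Attach platform details (here only a file extension) to a model."""
    extensions = {"html": ".html", "text": ".txt"}
    return {"platform": platform, "extension": extensions[platform], "model": model}


def generate(psm):
    """Render the file content and the matching ground truth record."""
    model = psm["model"]
    body = [f"Paragraph {i + 1}" for i in range(model.paragraphs)]
    if model.has_table:
        body.append("[table]")
    if psm["platform"] == "html":
        inner = "".join(f"<p>{line}</p>" for line in body)
        content = f'<div style="font-family:{model.font}">{inner}</div>'
    else:
        content = "\n".join(body)
    ground_truth = {
        "platform": psm["platform"],
        "paragraphs": model.paragraphs,
        "font": model.font,
        "has_table": model.has_table,
    }
    return content, json.dumps(ground_truth, sort_keys=True)


# Assumed example input: one source model and made-up property distributions.
source = DocumentModel(paragraphs=2, font="Arial", has_table=False)
stats = {
    "font": {"Arial": 0.6, "Times": 0.4},
    "has_table": {True: 0.2, False: 0.8},
}
benchmark = [
    generate(to_platform_specific(model, platform))
    for model in diversify(source, stats, 5)
    for platform in ("html", "text")
]
```

Each entry of `benchmark` pairs a generated file body with the ground truth a characterisation tool would be expected to report for it. Fixing the random seed keeps a run reproducible, which matters when the same benchmark collection is reused for black box tests.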

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.title

A model driven approach to generate ground truth data for digital preservation

dc.title.alternative

Modell-getriebener Ansatz zur Erstellung von Ground-Truth Daten für Digitale Langzeitarchivierung

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2013.23019

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Harald Mezensky

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E188 - Institut für Softwaretechnik und Interaktive Systeme

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC11200036

dc.description.numberOfPages

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-67456

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.advisor.orcid

0000-0002-9272-6225

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

E183 - Institut für Rechnergestützte Automation

crisitem.author.parentorg

E180 - Fakultät für Informatik

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(2.02 MB)

In Copyright
