Mezensky, H. (2013). A model driven approach to generate ground truth data for digital preservation [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2013.23019
E188 - Institut für Softwaretechnik und Interaktive Systeme
Date (published):
2013
Number of Pages:
83
Abstract:
Nowadays, a large proportion of the data that constitutes our cultural and scientific heritage is available only in digital form, and the amount of digital information grows rapidly every day. Digital artifacts often depend on specific hardware and software, yet these environments are subject to change and may evolve or become obsolete, so there is a pressing need for techniques that address this problem. Digital preservation aims to provide such techniques through activities that enable long-term accessibility and interoperability of digital objects across time and changing environments. One such strategy is data migration, which involves transforming a potentially obsolete file format into a more current and more sustainable one. To ensure that the content of the new file is at least similar to that of the old version, digital preservation provides characterisation tools. These tools analyse the properties of a file and therefore allow two versions to be compared, in our case the object before and after migration. Introducing quality assurance for characterisation tools requires access to meaningful benchmark data that can serve as input for activities such as black-box tests. This benchmark data must be based on a known ground truth so that the results of test runs can be verified. Currently, however, no such data exists, and therefore existing tools can only be improved, and new ones developed, in an unsystematic way. This thesis addresses that problem with a model-driven concept that bridges the gap between digital preservation and quality assurance. The approach is based on the idea of designing a digital object by defining a model representation of it.
More precisely, a platform-independent source model is created that conforms to a particular metamodel and contains all elements and attributes needed to describe the digital artifact. This model is the input to a model transformation step that generates multiple models from the single source model by diversifying its properties. The motivation behind this step is to introduce variety into the ground truth and to create a benchmark collection whose distribution of properties resembles that of a real-world collection. This can be accomplished by using statistical data as input to the diversification step. Each diversified model then undergoes a further model transformation from a platform-independent into a platform-specific model; in this way, benchmark data for multiple platforms can be generated in a single run. Finally, code generation transforms the models either directly into the actual output or into executable code that produces the actual output. In addition, a corresponding ground truth file that captures the actual properties of the object is created for each generated file. The thesis evaluates the current state of the art in these areas and gives a detailed description of the concept underlying the approach. To demonstrate the feasibility of the concept, a reference implementation has been developed. This prototype is also documented in the thesis, including the rationale behind the choice of technologies and the problems encountered. The implementation is used in several experiments, which are described together with an analysis of their results and execution performance. Finally, an outlook on future work is given, together with details about the applications and limitations of the current approach.
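The pipeline described in the abstract (source model, diversification from statistics, platform-independent to platform-specific transformation, generation of output plus ground truth, and a black-box check of a characterisation tool) can be illustrated with a small Python sketch. This is not the thesis's reference implementation: the metamodel (`ImageModel`), the platform-specific model (`PgmModel`), the functions `diversify`, `to_pgm`, `generate` and `characterise`, and the bit-depth frequencies are all hypothetical. Plain functions stand in for dedicated model transformation and code generation tooling, and the binary PGM (P5) format is chosen only because it is simple enough to write by hand.

```python
import json
import random
from dataclasses import dataclass, replace

# Hypothetical metamodel: attributes every platform-independent source model defines.
@dataclass(frozen=True)
class ImageModel:
    width: int
    height: int
    bit_depth: int  # bits per sample, independent of any file format

# Hypothetical platform-specific model for the binary PGM (P5) format.
@dataclass(frozen=True)
class PgmModel:
    width: int
    height: int
    maxval: int  # PGM records the maximum sample value, not the bit depth

def diversify(source, bit_depth_freq, n, seed=0):
    """Derive n models from one source model, sampling bit_depth by the given frequencies."""
    rng = random.Random(seed)
    depths = list(bit_depth_freq)
    weights = [bit_depth_freq[d] for d in depths]
    return [replace(source, bit_depth=rng.choices(depths, weights)[0]) for _ in range(n)]

def to_pgm(model):
    """PIM -> PSM: map the abstract bit depth onto PGM's maxval."""
    return PgmModel(model.width, model.height, 2 ** model.bit_depth - 1)

def generate(psm):
    """PSM -> output: emit the file bytes plus a ground-truth record of its properties."""
    bytes_per_sample = 1 if psm.maxval < 256 else 2
    header = f"P5\n{psm.width} {psm.height}\n{psm.maxval}\n".encode("ascii")
    pixels = bytes(psm.width * psm.height * bytes_per_sample)  # all-zero (black) raster
    truth = {"format": "PGM", "width": psm.width, "height": psm.height, "maxval": psm.maxval}
    return header + pixels, truth

def characterise(data):
    """Toy characterisation tool under test: parse the PGM header fields."""
    magic, dims, maxval = data.split(b"\n", 3)[:3]
    width, height = map(int, dims.split())
    return {"format": "PGM" if magic == b"P5" else "unknown",
            "width": width, "height": height, "maxval": int(maxval)}

if __name__ == "__main__":
    source = ImageModel(width=4, height=3, bit_depth=8)
    stats = {8: 0.9, 16: 0.1}  # hypothetical property distribution of a real collection
    for pim in diversify(source, stats, n=5):
        data, truth = generate(to_pgm(pim))
        # Black-box test: the tool's output must match the known ground truth.
        assert characterise(data) == truth
        print(json.dumps(truth))
```

Keeping the platform-independent and platform-specific models separate means a second target format would only need another `to_…` mapping and generator, while the source model and the diversification step stay unchanged, which mirrors the abstract's claim that benchmark data for several platforms can be produced in one run.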
Language:
en
Additional information:
Alternative title according to the author's translation. Abstract in German.