Ďurčo, M. (2013). SMC4LRT - semantic mapping component for language resources and technology [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2013.22820
E188 - Institut für Softwaretechnik und Interaktive Systeme
Date (published): 2013
Number of Pages: 137
Abstract:
The ultimate objective of this work was to enhance search functionality over a large, heterogeneous collection of resource descriptions. This objective was pursued in two separate, complementary approaches: a) designing a service that delivers crosswalks (i.e. equivalences between fields in disparate metadata formats) based on well-defined concepts, and applying these concept-based crosswalks in search scenarios to enhance recall; and b) acknowledging the integrative power of the Linked Open Data paradigm, expressing the domain data as a Semantic Web resource to enable the application of semantic technologies to the dataset.

In parallel with the two approaches, the work delivered two main results: a) the specification of a module for concept-based search, together with the underlying crosswalk service, accompanied by a proof-of-concept implementation; and b) a blueprint for expressing the original dataset in RDF format, effectively laying the foundation for providing this dataset as Linked Open Data. Partly as a by-product, the application SMC browser was developed: an interactive visualization tool for exploring the dataset at the schema level. This tool made it possible to generate a number of advanced analyses of the data, which the community used directly to explore and curate the complex dataset. As such, the tool and the generated reports can be considered a valuable contribution to the community.

This work is embedded in the context of a large research infrastructure initiative, the Common Language Resource and Technology Infrastructure (CLARIN), which aims to provide easy, stable, harmonized access to language resources and technology (LRT) in Europe. A core technical pillar of this initiative is the Component Metadata Infrastructure, a distributed system for creating and providing metadata for LRT in a coherent, harmonized way.
The outcome of this work, the Semantic Mapping Component, was conceived as a module within the infrastructure dedicated to overcoming the semantic interoperability problem that stems from the heterogeneity of the resource descriptions, by harnessing the mechanisms of the semantic layer built into the core of the infrastructure.
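To make the idea of concept-based crosswalks more concrete, the following is a minimal Python sketch. It is not the SMC implementation. It assumes that each metadata field is linked to exactly one concept identifier. The format names, field paths, concept identifiers and the query string syntax produced by `expand_query` are all hypothetical. In the sketch, fields bound to the same concept are treated as equivalent, and a query on one field is expanded into a disjunction over all equivalent fields. This is one way such crosswalks can raise recall.

```python
from collections import defaultdict

# Hypothetical bindings: (format, field path) -> concept identifier.
# None of these names come from the thesis; they only illustrate fields in
# different metadata formats pointing to a shared, well-defined concept.
FIELD_CONCEPTS = {
    ("format_a", "language"): "concept:language",
    ("format_b", "Languages/Language/Name"): "concept:language",
    ("format_c", "langUsage/language"): "concept:language",
    ("format_a", "title"): "concept:title",
    ("format_b", "Title"): "concept:title",
}


def build_crosswalks(bindings):
    """Group fields by concept: all fields sharing a concept are equivalent."""
    by_concept = defaultdict(set)
    for field, concept in bindings.items():
        by_concept[concept].add(field)
    return dict(by_concept)


def equivalent_fields(field, bindings, crosswalks):
    """Return every field equivalent to `field` (itself included) via its concept."""
    concept = bindings.get(field)
    if concept is None:
        return {field}  # no concept link, so no expansion is possible
    return crosswalks[concept]


def expand_query(field, term, bindings, crosswalks):
    """Rewrite a single field:term query as an OR over all equivalent fields."""
    fields = sorted(equivalent_fields(field, bindings, crosswalks))
    return " OR ".join(f'{fmt}/{path}:"{term}"' for fmt, path in fields)


if __name__ == "__main__":
    crosswalks = build_crosswalks(FIELD_CONCEPTS)
    print(expand_query(("format_a", "language"), "Slovak", FIELD_CONCEPTS, crosswalks))
    # prints: format_a/language:"Slovak" OR format_b/Languages/Language/Name:"Slovak" OR format_c/langUsage/language:"Slovak"
```

The sketch groups fields by concept rather than storing pairwise field mappings. As a result, linking a new format's field to an existing concept makes it equivalent to every other field bound to that concept. No separate crosswalk has to be written for each pair of formats.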