Gamerith, S. (2019). Context enrichment of crowdsourcing tasks for ontology validation [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.51125
E188 - Institut für Softwaretechnik und Interaktive Systeme
Date (published):
2019
Number of Pages:
86
Keywords:
Ontology Validation, Crowdsourcing, Context
Abstract:
Validating the relevance of ontologies is considered an important task in the Semantic Web lifecycle. This holds especially for learned ontologies, which naturally contain errors. Although many errors can be tackled algorithmically, solving more complex problems by machines can be very tricky. Crowdsourcing offers a cost-effective alternative in which tasks are solved by a large group of human workers. However, the performance of existing approaches that combine ontology validation with crowdsourcing is still not satisfactory. A promising way of tackling this problem is to enrich crowdsourcing tasks with additional contextual information that makes them easier to understand. Such context not only has a positive impact on the crowd's performance but also raises the quality of the results. Even though recent research has shown advances in this area, the use of context has not been explicitly targeted. In this thesis we present three novel methods that enrich crowdsourcing tasks with contextual information to validate the relevance of concepts for a particular domain of interest. First, the Ontology-based Approach processes hierarchical relations. Second, the Metadata-based Approach generates descriptions based on annotations that are encoded within the ontology. Third, the Dictionary-based Approach builds up contextual information from example sentences obtained by consulting the online dictionary WordNik. For the analysis, we integrated all three approaches into the existing uComp Protege Plugin, which facilitates the creation and execution of crowdsourcing tasks for ontology validation from within the Protege ontology editor. The evaluation was performed on three ontologies covering the domains of climate change, tennis, and finance. For each dataset, the performance metrics Precision, Recall, and F-Measure were calculated to compare the methods against the existing baseline approach, which used no contextual information. The results showed that the Metadata-based Approach outperformed all other methods. The other two approaches had difficulties in certain situations: the Dictionary-based Approach sometimes added inappropriate explanations, especially for concepts associated with multiple meanings, and the Ontology-based Approach had problems with loosely connected ontologies containing only a few subsumption relations. Nevertheless, all three approaches delivered results of high quality (F-Measure above 80%), indicating that adding context to crowdsourcing tasks is a cost-effective method of improving the crowd's performance.
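For reference, a minimal sketch of the evaluation metrics named above, assuming the standard definitions over true positives (TP), false positives (FP), and false negatives (FN) with respect to a gold-standard relevance judgement (the thesis may define these counts slightly differently):

% Assumed standard definitions of the reported metrics;
% TP, FP, FN are counts against a gold-standard relevance judgement.
\begin{align}
  \text{Precision} &= \frac{TP}{TP + FP} \\
  \text{Recall}    &= \frac{TP}{TP + FN} \\
  \text{F-Measure} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}

Under these definitions, an F-Measure above 80% requires that both Precision and Recall are reasonably high, since the harmonic mean is dominated by the smaller of the two values.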