Prock, A. (2021). Hybrid Human-Machine Ontology Verification: Identifying common errors in ontologies by integrating human computation with ontology reasoners [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.85884
Ontologies are semantic resources used in knowledge-based artificial intelligence systems. They can be seen as schemata for knowledge graphs, which integrate data and knowledge, e.g. in the Semantic Web. Defects in ontologies can therefore cause systems built on them, or on the knowledge graphs they describe, to fail or to produce incorrect output; such defects may have very expensive consequences, which makes ontology verification necessary. While several types of ontology defects can be identified by automatic (reasoning) algorithms, additional human-based verification is often required. This is mostly done in batch processes using Human Computation (HC) and Crowdsourcing techniques, which, however, are inefficient and do not scale well.

This thesis proposes a cost-effective and more scalable method for identifying common modeling errors in ontologies, using a two-step hybrid human-machine verification process. In the first step, an ontology reasoner, together with specifically designed heuristics, automatically detects defect candidates. In the second step, these defect candidates are verified by human workers using HC and Crowdsourcing techniques. The automatic first step preselects classes or class combinations that are likely to contain errors, so-called "bad smells", reducing the amount of human labor needed.

This thesis makes the following contributions: (i) the concept of hybrid human-machine workflows for identifying specific types of ontology modeling errors, called Defect Identification Workflows; (ii) an HC task design suitable for collecting human judgement in these workflows; (iii) heuristics for detecting defect candidates for four selected error types; (iv) a study design for evaluating the proposed approach; and (v) insights into factors that influence the effectiveness of the approach. To make these contributions, a literature review is conducted; the methods of algorithm design, HC task design, prototyping, and study design are applied; the designed empirical study is executed; and the resulting data are analyzed using descriptive statistics.

The evaluation of this approach, using the prototype, focuses on the HC part. The empirical study shows that 80.9 percent of the seeded modeling errors and false positives are correctly identified by human workers. Analysis of the results reveals that both the error type present in a task and the qualification of the human verifiers influence verification performance. Furthermore, aggregating multiple answers via majority voting significantly improves verification performance.
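To illustrate the aggregation step mentioned in the last sentence, the following minimal Python sketch shows how several workers' judgements on one defect candidate could be combined by majority voting. It assumes binary "defect" / "no defect" answers; the function name, labels, and tie handling are hypothetical and not taken from the thesis.

```python
from collections import Counter

def majority_vote(answers):
    """Aggregate worker judgements for one defect candidate.

    Returns the label given by the majority of workers,
    or None if the vote is tied (no majority).
    """
    counts = Counter(answers).most_common(2)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no majority decision possible
    return counts[0][0]

# Example: five workers judge one candidate flagged by the heuristics.
votes = ["defect", "defect", "no defect", "defect", "no defect"]
print(majority_vote(votes))  # -> "defect"
```

Aggregating an odd number of answers per candidate, as in the example, avoids ties; this is a common design choice in crowdsourced quality-control setups.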