Llugiqi, M. (2022). Improving learned decision trees with domain ontologies [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.99442
Artificial Intelligence (AI) systems often build on machine learning techniques that learn from a dataset in isolation, without relying on any common-sense knowledge or any knowledge of the domain other than what is explicitly reflected as 'features' in the data. Despite the vast amount of expert knowledge that exists for a given domain, and the fact that much of it is readily usable in existing ontologies, usually not even a small fragment of it is used. In this thesis, we investigate how domain knowledge, especially medical ontologies, might be used to improve decision tree learning. In particular, we assess techniques that can be enhanced with measures computed from ontologies, both for constructing decision trees directly from data and for constructing decision trees that approximate a neural network, extracted with the Trepan algorithm. We select and analyze some existing approaches from the literature, and also propose some variations. Moreover, we assess the impact of such ontological measures on both the understandability and the accuracy of the resulting trees. To evaluate understandability, besides calculating two syntactic complexity measures, we also conduct three user questionnaires, depending on the domain, with four different tasks, and evaluate the results in terms of response time, correctness, confidence in the answers, and the users’ perception of the understandability of the trees. For the comparison of the approaches, we create a test set of seven medical datasets paired with topic-specific ontologies extracted from reliable real-life medical ontology repositories.
Given the lack of benchmarks linking machine learning datasets and ontologies, the construction of such a test set is important on its own. The results of our experiments show that incorporating heuristics from ontologies into the decision tree building process moderately improves the accuracy of decision trees for most of the domains we use, particularly for decision trees extracted from a neural network using Trepan. Furthermore, based on the findings of the user-based surveys, we observe that users, on average, find the trees built with relevance-score, our modified version of hubscore, computed from the SNOMED-CT ontology, slightly easier to understand; users are more confident in their answers concerning these trees and give a higher proportion of correct answers.