Berger, G. (2019). Static analysis for ontology-mediated querying [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.67802
Ontology-based data access (OBDA) is the marriage of database technology with ontological theories that provide a unified conceptual view on multiple, possibly heterogeneous data sources. Ontological theories are presented as logical assertions that are capable of deriving new knowledge from extensional data. User queries are thus answered against a logical theory put on top of the data. In OBDA, it is common to view a user query together with an ontological theory as a composite object, which is coined ontology-mediated query (OMQ). Ontological theories can, in principle, be formulated using any available logical formalism. However, it is widely acknowledged that tuple-generating dependencies (TGDs) are a convenient formalism for formulating many ontologies that occur in real-life scenarios. Ontology-mediated queries based on TGDs are the main objects of study in this thesis. While the evaluation problem for such queries has received much attention in the literature, this thesis is dedicated to the study of another important family of computational tasks for OMQs, namely so-called static analysis tasks. These tasks are data-independent computational problems that can be employed for query optimization. As a first contribution, we study the query containment problem for many of the well-known ontology-mediated query languages based on TGDs. More specifically, we study containment for classes of OMQs formulated via (frontier-)guarded, non-recursive, and sticky sets of TGDs. We develop specifically tailored techniques for deciding the containment problems for each of them, allowing us to obtain sharp complexity bounds. We then focus our attention on the first-order rewritability problem for OMQs, that is, the problem asking whether or not a given OMQ can be equivalently expressed as a plain first-order query. For this second contribution, we study first-order rewritability for OMQs based on (frontier-)guarded TGDs, and we pinpoint its exact complexity by using the sophisticated model of cost automata on trees. A recurring theme for static analysis tasks is that many of them are already undecidable when one focuses on Datalog queries, and this holds true for containment and first-order rewritability. As a third contribution, we investigate the relationship between OMQs based on TGDs and the existence of rewritings into Datalog queries. We develop a strong sufficient criterion for OMQs to be expressible as Datalog queries, and we show that OMQs in the language of Vadalog a commercially employed reasoning engine for knowledge graphs are always Datalog rewritable. We furthermore identify a spaceefficient fragment of the language of Vadalog that restricts the use of recursion, but still allows to express many queries occurring in real-life scenarios.