Longo, D. M. (2023). On the potential of structural decomposition of database and AI problems [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.116747
Database Theory and Artificial Intelligence are among the most far-reaching research areas in Computer Science. Although classical problems in these fields, such as answering conjunctive queries (CQs) and solving constraint satisfaction problems (CSPs), arise naturally in a simple formulation, they are intractable in general. The reason lies in the intricate structure of CQ and CSP instances, which is neatly representable by hypergraphs, i.e., a generalization of graphs in which edges can connect more than two vertices. Indeed, in contrast to polynomial-time solvable classes of instances with an underlying acyclic hypergraph, cyclic problem instances pose severe challenges to practical systems. The introduction of structural decomposition methods, such as generalized hypertree decompositions (GHDs), allowed the definition of larger classes of tractable CQ and CSP instances. These methods classify the degree of cyclicity of hypergraphs through the concept of width: the lower the width, the more the hypergraph resembles an acyclic one, and the easier the corresponding instance is to solve. However, computing low-width GHDs is itself a difficult task. While the theory of decompositions is well understood, their theoretical advantages have not yet been transferred into practice. This quest constitutes the central theme of this thesis.

We open this work with a study of the computation of GHDs for tractable cases. GHDs are desirable because they provide lower widths than other decompositions. Even though previous work demonstrated that computing GHDs for hypergraphs with bounded intersection size is tractable, only a few interesting yet rudimentary algorithms have been proposed. For instance, while there exist empirical tests that check whether a hypergraph admits a GHD of width ≤ k, no algorithm that actually computes such a GHD exists. One such test uses balanced separators, a well-known concept from graph theory, with promising results. Unfortunately, a theoretical underpinning of this method has been missing to date. Therefore, we develop this test into a new algorithm, called BalSep, that, given a hypergraph H, outputs a GHD of H of width ≤ k if one exists. We prove that BalSep is sound and complete.

We progress by asking whether BalSep and other GHD algorithms can efficiently decompose hypergraphs stemming from real-world CQs and CSPs. Relevant previous work led to the creation of HyperBench, a collection of hypergraphs of CQ and CSP instances originating from several application domains. While the authors of HyperBench conclude that real-world instances are well suited for decomposition algorithms, we observe that the composition of HyperBench is not varied enough to support these claims in general. Indeed, CQs make up only one third of this dataset. Moreover, most of them are simple SQL queries, to the detriment of more complex queries and of other popular languages such as SPARQL. Since the transformation of these complex queries into hypergraphs presents a new challenge, we develop a novel methodology for extracting the “maximal conjunctive components” from a query that is not purely conjunctive. We thus extend HyperBench with a more diversified sample of SPARQL and complex SQL queries. Experiments carried out on this extended dataset show that decomposition algorithms, and especially BalSep, manage to decompose even these more complex CQs.
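As a minimal illustration of the hypergraph view used throughout this work (a sketch with hypothetical names, not code from the thesis), a purely conjunctive query maps to a hypergraph whose vertices are the query variables and whose hyperedges are the variable sets of its atoms; the actual extraction of maximal conjunctive components from SPARQL and complex SQL queries is considerably more involved.

```python
# Sketch only (hypothetical names, not from the thesis): mapping a conjunctive
# query to its hypergraph. Vertices are query variables; each atom R(x, y, ...)
# contributes one hyperedge {x, y, ...}.
from typing import Dict, FrozenSet, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]        # (relation name, argument variables)
Hypergraph = Dict[str, FrozenSet[str]]    # hyperedge name -> set of vertices

def cq_to_hypergraph(atoms: List[Atom]) -> Hypergraph:
    return {f"e{i}_{rel}": frozenset(args) for i, (rel, args) in enumerate(atoms)}

# Example: the cyclic "triangle" query R(x, y), S(y, z), T(z, x); no GHD of
# width 1 exists for it (it is not acyclic), but width 2 suffices.
print(cq_to_hypergraph([("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]))
```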
We then move on to studying the update of GHDs in response to instance modifications. In problems such as incremental constraint satisfaction, the solver repeatedly modifies the instance while computing a solution. In this setting, a GHD-based solver must compute a new GHD every time the instance changes. Even though computing a low-width GHD for the updated instance is worthwhile, it is computationally challenging. We respond to this challenge by formalizing the problem of updating a GHD in response to common hypergraph modifications. Unfortunately, we prove that this problem is as hard as computing a decomposition from scratch. Yet, we develop a new algorithm that significantly speeds up the computation of a GHD of the updated instance by exploiting knowledge of the old decomposition. A comparison against the naive algorithm that computes a GHD from scratch reveals that our method not only achieves extremely high speed-up factors but, in most cases, also does not increase the width.

Finally, we investigate the use of GHDs for grounding classical planning tasks. These tasks are typically formulated in a first-order formalism. Since state-of-the-art planners efficiently solve propositional tasks, they ground the first-order representation before searching for a solution. Nevertheless, the grounding phase becomes a bottleneck as tasks grow larger and larger. Planning grounders work on a logic program representing a relaxed version of the task in which no negation occurs. When tasks are large, they struggle to find good join plans for evaluating the rules of the logic program. This difficulty is due to both the number of joins and their structure. Consequently, we develop a novel algorithm that, guided by a GHD, splits the original rule into smaller rules that are easier to ground. Empirical evaluation shows that rule decomposition significantly reduces grounding time if the split is performed judiciously.
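To make the idea of decomposition-guided rule splitting concrete, here is a schematic sketch (hypothetical names, not the algorithm of the thesis): each bag of a decomposition of the rule body becomes a smaller auxiliary rule whose head keeps only the variables shared with the original head or with other bags, and the original head is then defined as a join over the auxiliary predicates.

```python
# Schematic sketch (hypothetical names, not the thesis algorithm): splitting a
# long, negation-free rule along the bags of a decomposition of its body.
# Each bag becomes an auxiliary rule; its head keeps only the variables that
# also occur in the original head or in other bags, so the final join stays small.
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate name, argument variables)
Rule = Tuple[Atom, List[Atom]]      # (head, body)

def split_rule(head: Atom, bags: List[List[Atom]]) -> List[Rule]:
    rules: List[Rule] = []
    aux_atoms: List[Atom] = []
    for i, bag in enumerate(bags):
        bag_vars: Set[str] = {v for _, args in bag for v in args}
        outside_vars = set(head[1]) | {
            v for other in bags if other is not bag for _, args in other for v in args
        }
        interface = tuple(sorted(bag_vars & outside_vars))
        aux = (f"aux_{i}", interface)            # fresh auxiliary predicate
        rules.append((aux, list(bag)))           # aux_i(interface) :- atoms of bag i
        aux_atoms.append(aux)
    rules.append((head, aux_atoms))              # original head :- aux_0, aux_1, ...
    return rules

# Example: goal(x) :- R(x, y), S(y, z), T(z, w) split into two bags yields
# aux_0(x, z) :- R(x, y), S(y, z);  aux_1(z) :- T(z, w);  goal(x) :- aux_0(x, z), aux_1(z).
print(split_rule(("goal", ("x",)),
                 [[("R", ("x", "y")), ("S", ("y", "z"))], [("T", ("z", "w"))]]))
```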