Lightweight integration of query decomposition techniques into SQL-based database systems

Selzer, Alexander

doi:10.34726/hss.2021.81940

Record link:

https://doi.org/10.34726/hss.2021.81940
http://hdl.handle.net/20.500.12708/17070

Title:

Lightweight integration of query decomposition techniques into SQL-based database systems

Citation:

Selzer, A. (2021). Lightweight integration of query decomposition techniques into SQL-based database systems [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.81940

reposiTUm DOI:

10.34726/hss.2021.81940

CatalogPlus:

AC16169577

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Selzer, Alexander

Advisor:

Pichler, Reinhard

Organisational Unit:

E192 - Institut für Logic and Computation

Date (published):

2021

Number of Pages:

132

Keywords:

Datenbanken; Join Queries; Queryoptimierung; Hypertree Decompositions; Hypergraphs

Databases; Join Queries; Query-Optimization; Hypertree Decompositions; Hypergraphs

Abstract:

Die Berechnung der Ergebnisse von Join-Queries ist eine der zentralen Herausforderungen in jedem Datenbanksystem und hat direkte Auswirkungen auf die Geschwindigkeit sowie den Ressourcenverbrauch. Das Problem, SQL-Queries zu beantworten, ist äquivalent zu dem Conjunctive Queries (CQs) zu beantworten, welches ein NP-vollständiges Problem ist. Daher ist der Aufwand eines Joins über mehrere Tabellen in einer Datenbank exponentiell in der Anzahl der Tabellen im worst-case.Die Suche nach effizient lösbaren Fällen des CQ Problems war Gegenstand intensiver Forschung über mehrere Jahrzehnte. Eins der größeren Ergebnisse der Forschung in diesem Bereich ist das Resultat, dass azyklische Conjunctive Querier (ACQs) effizient lösbar sind. Angesichts der Tatsache, dass viele reale Queries zyklisch sind, sind ACQs eine starke Einschränkung. Die Hypertree-Width ist eine Verallgemeinerung der Azyklizität, welche die Klasse der azyklischen Queries in eine Hierarchie von fast-azyklischen Queries, welche in polynomieller Zeit lösbar sind, ausweitet. Die Laufzeit der Query-Ausführung ist allerdings von einer guten Hypertree-Zerlegung abhängig. Algorithmen, um diese effizient zu berechnen, sind kürzlich erschienen, was den Lösungsansatz realistisch macht.Obwohl dieser Ansatz zur Query-Ausführung vielversprechend ist, wurde er bis jetzt in kein bestehendes relationales DBMS integriert. Nur ein Prototyp ist bekannt, von dem jedoch die Implementierungsdetails nicht verfügbar sind. In dieser Arbeit wird ein System zur Struktureller-Zerlegung-basierten Query-Optimierung entwickelt. Da eine Integration in den Kern des DBMS komplex wäre und die Wiederverwendung von Zerlegungsalgorithmen schwer machen würde, wurde eine Integration außerhalb des DBMS entwickelt. Solch ein System kann verwendet werden, ohne den DBMS Server zu ersetzen, und könnte mehrere DBMS Implementierungen unterstützen. Um die Laufzeit noch weiter zu verbessern, werden Statistiken aus der Datenbank extrahiert und verwendet, um bessere Zerlegungen zu finden. Zusätzlich wird die Query-Ausführung parallelisiert.Der Vergleich zeigt, dass unser System mit PostgreSQL mithalten kann. Obwohl es auf den meisten Instanzen langsamer oder vergleichbar schnell ist, identifizieren wir Fälle in denen es das DBMS übertrifft. Bei boolean CQs, welche beantworten, ob Ergebnisse existieren oder nicht, stellte sich unser System als sehr effektiv heraus und erzielte wesentlich bessere Ergebnisse als das DBMS alleine. Die Verwendung von Statistiken erwies sich als essenziell für gute Laufzeiten. Des weiteren zeigen wir, dass die dadurch ermöglichte parallele Ausführung von Queries die Performance verbessern kann.

The evaluation of join queries is a central challenge in every database system, which directly affects the query-answering speed and resource utilization. Answering conjunctive queries (CQs), a problem equivalent to that of answering SQL select-from-where statements, is an NP-complete problem. Consequently, joining multiple tables in a database requires an exponential effort with respect to the number of tables, in the worst case.A substantial amount of research has therefore been dedicated to the search for tractable fragments of the CQ evaluation problem. One of the major outcomes of research in this field is the result that acyclic conjunctive queries (ACQs) can be answered efficiently. Considering that many real-world queries are cyclic, ACQs are a strong restriction. The notion of hypertree-width provides a generalization of acyclicity, widening the class of acyclic queries into a hierarchy of nearly-acyclic queries solvable in polynomial time. The runtime of the query execution is, however, dependent on good hypertree decompositions. Algorithms to compute these quickly have emerged recently, making the approach feasible.Although this approach towards query execution is promising, it has not yet been integrated into any existing relational DBMS. Only one research prototype is known where the implementation details are unavailable. In this thesis, a system for structural decomposition-based query optimization is developed and evaluated. Since an integration into the core of the DBMS is complex and makes the reuse of decomposition tools difficult, we have implemented a lightweight integration outside of the DBMS. Such a lightweight system can be integrated by users without replacing the DBMS server and may support different DBMS implementations. For further performance improvements, statistical information is extracted from the database and used for the computation of a good decomposition, and a parallel execution strategy is implemented.The experimental evaluation showed that our system is competitive with PostgreSQL on the CQ answer enumeration problem. Although performing worse or comparably on most instances, we identified cases where it outperforms the DBMS significantly. On the problem of evaluating boolean CQs, queries determining whether a matching row exists or not, the system proved to be very effective and outperformed the DBMS on many instances. We also conclude that the integration of statistics into the computation of the decomposition is essential for the competitiveness of the system. Furthermore, the parallel execution strategy is shown to provide a performance improvement on multiple instances.

Additional information:

Abweichender Titel nach Übersetzung der Verfasserin/des Verfassers

License:

In Copyright

Appears in Collections:

Thesis