A Structural Approach to Query Optimisation for Efficient Join and Aggregate Processing

Selzer, Alexander

doi:10.34726/hss.2026.138101

DC Field

Value

Language

dc.contributor.advisor

Pichler, Reinhard

dc.contributor.author

Selzer, Alexander

dc.date.accessioned

2026-01-26T14:50:54Z

dc.date.issued

2026

dc.date.submitted

2026-01

dc.identifier.citation

<div class="csl-bib-body"> <div class="csl-entry">Selzer, A. (2026). <i>A Structural Approach to Query Optimisation for Efficient Join and Aggregate Processing</i> [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.138101</div> </div>

dc.identifier.uri

https://doi.org/10.34726/hss.2026.138101

dc.identifier.uri

http://hdl.handle.net/20.500.12708/225588

dc.description

Arbeit an der Bibliothek noch nicht eingelangt - Daten nicht geprüft

dc.description.abstract

Join processing remains a fundamental challenge in database systems, particularly when intermediate results grow large. This problem is frequently encountered in analytical queries, which aggregate data over many relations, but output only a comparably small final result. Although the vast majority of user-generated queries are acyclic, and the foundational result by Yannakakis provides a solution for avoiding unnecessary computation of intermediate results in theory, structure-guided query evaluation remains absent from standard database systems. In this thesis, we work towards closing this gap from theory to practice.To explore the benefits and limitations of structure-guided query processing, we introduce a query-rewriting-based implementation of Yannakakis' algorithm, on top of different DBMSs. Additionally, we identify the class of zero-materialisation answerable (0MA) queries, which can be answered efficiently without materialising join results. Experimental evaluation shows that the rewriting approach can achieve significant speedups on hard instances but fails to outperform in most situations. These results motivate us to frame the problem as an algorithm selection problem -- we therefore apply machine learning to solve this problem, setting up a new data set of queries in the process, and verify that we can reliably decide when to apply the rewriting.Following up on these encouraging results, we go beyond the quite restrictive 0MA class, introducing guarded and piecewise-guarded aggregate queries, allowing us to cover a wide range of aggregate queries while avoiding materialisation. We integrate these optimisations into the query optimiser of Spark SQL by implementing logical optimisations and a new physical operator AggJoin. Through experimental evaluation, we show that this implementation speeds up aggregate processing on many queries of several benchmarks while never degrading performance. Furthermore, to cover arbitrary unguarded queries, where materialisation cannot be fully avoided, we extend the AggJoin to the GroupAggJoin operator, substantially reducing materialisation on these queries as well. By establishing a new benchmark for unguarded queries, we show that this approach successfully handles moderately unguarded queries.Having established efficient techniques for the processing of large classes of acyclic aggregate queries, we focus on the particularly challenging class of cyclic queries. To apply the standard approach of using decompositions to make cyclic queries acyclic successfully, it is clear that we require secondary optimisation objectives in addition to minimising the width of decompositions. To address this challenge, we introduce soft hypertree decompositions and the associated measure of soft hypertree width (shw). By avoiding the special condition of hypertree decompositions, we gain algorithmic flexibility while retaining the tractability of computing width-k decompositions. By developing an end-to-end pipeline, we achieve a seamless integration of these optimisations into DBMSs. Experimental evaluation on cyclic queries confirms that we can achieve significant improvements in decomposition quality, which translates to performance gains in practice.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

relational databases

dc.subject

join processing

dc.subject

aggregate queries

dc.subject

acyclic queries

dc.subject

cyclic queries

dc.subject

structural decompositions

dc.subject

relational databases

dc.subject

join processing

dc.subject

aggregate queries

dc.subject

acyclic queries

dc.subject

cyclic queries

dc.subject

structural decompositions

dc.title

A Structural Approach to Query Optimisation for Efficient Join and Aggregate Processing

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2026.138101

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Alexander Selzer

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Lanzinger, Matthias Paul

tuw.publication.orgunit

E192 - Institut für Logic and Computation

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC17758013

dc.description.numberOfPages

146

dc.thesistype

Dissertation

dc.thesistype

Dissertation

tuw.author.orcid

0000-0002-6867-5448

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0002-1760-122X

tuw.assistant.orcid

0000-0002-7601-3727

item.openaccessfulltext

Open Access

item.mimetype

application/pdf

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_db06

item.languageiso639-1

item.openairetype

doctoral thesis

crisitem.author.dept

E192-02 - Forschungsbereich Databases and Artificial Intelligence

crisitem.author.orcid

0000-0002-6867-5448

crisitem.author.parentorg

E192 - Institut für Logic and Computation

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(2.09 MB)

In Copyright

Show simple item record

Page view(s)

checked on Jan 26, 2026

Download(s)

checked on Jan 26, 2026

Google Scholar^TM

Check

Page view(s)

Download(s)

Google ScholarTM

Google Scholar^TM