Selzer, A. (2026). A Structural Approach to Query Optimisation for Efficient Join and Aggregate Processing [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.138101
Join processing remains a fundamental challenge in database systems, particularly when intermediate results grow large. This problem is frequently encountered in analytical queries, which aggregate data over many relations, but output only a comparably small final result. Although the vast majority of user-generated queries are acyclic, and the foundational result by Yannakakis provides a solution for avoiding unnecessary computation of intermediate results in theory, structure-guided query evaluation remains absent from standard database systems. In this thesis, we work towards closing this gap from theory to practice.To explore the benefits and limitations of structure-guided query processing, we introduce a query-rewriting-based implementation of Yannakakis' algorithm, on top of different DBMSs. Additionally, we identify the class of zero-materialisation answerable (0MA) queries, which can be answered efficiently without materialising join results. Experimental evaluation shows that the rewriting approach can achieve significant speedups on hard instances but fails to outperform in most situations. These results motivate us to frame the problem as an algorithm selection problem -- we therefore apply machine learning to solve this problem, setting up a new data set of queries in the process, and verify that we can reliably decide when to apply the rewriting.Following up on these encouraging results, we go beyond the quite restrictive 0MA class, introducing guarded and piecewise-guarded aggregate queries, allowing us to cover a wide range of aggregate queries while avoiding materialisation. We integrate these optimisations into the query optimiser of Spark SQL by implementing logical optimisations and a new physical operator AggJoin. Through experimental evaluation, we show that this implementation speeds up aggregate processing on many queries of several benchmarks while never degrading performance. Furthermore, to cover arbitrary unguarded queries, where materialisation cannot be fully avoided, we extend the AggJoin to the GroupAggJoin operator, substantially reducing materialisation on these queries as well. By establishing a new benchmark for unguarded queries, we show that this approach successfully handles moderately unguarded queries.Having established efficient techniques for the processing of large classes of acyclic aggregate queries, we focus on the particularly challenging class of cyclic queries. To apply the standard approach of using decompositions to make cyclic queries acyclic successfully, it is clear that we require secondary optimisation objectives in addition to minimising the width of decompositions. To address this challenge, we introduce soft hypertree decompositions and the associated measure of soft hypertree width (shw). By avoiding the special condition of hypertree decompositions, we gain algorithmic flexibility while retaining the tractability of computing width-k decompositions. By developing an end-to-end pipeline, we achieve a seamless integration of these optimisations into DBMSs. Experimental evaluation on cyclic queries confirms that we can achieve significant improvements in decomposition quality, which translates to performance gains in practice.
en
Additional information:
Arbeit an der Bibliothek noch nicht eingelangt - Daten nicht geprüft