Aichinger, J. (2024). Structure-Guided Query Optimization in Column-Stores [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.113980
In recent years, the rise of data-driven fields such as data science, artificial intelligence, and business intelligence has significantly increased the demand for efficient data storage solutions. As a result, database management systems (DBMS) have become crucial, with column-based systems gaining popularity for their strong performance in large-scale, read-heavy analytical workloads. A fundamental operation in these systems is the join, which combines data from multiple relations. However, efficiently processing join queries, especially those involving many relations, remains challenging because they generate excessive, and in many cases unnecessary, intermediate results. These intermediate results are frequently much larger than the final output, leading to high memory usage and reduced performance, particularly for aggregate queries. While column-stores typically excel at executing aggregate queries, the explosion of intermediate results during query processing can severely undermine their efficiency.
Recent research has proposed an optimization technique for exactly this problem. By partially executing Yannakakis’ algorithm, it is possible under certain conditions to avoid producing these unnecessary intermediate results and thereby improve the performance of such queries. This approach differs from traditional query optimization techniques in that the optimizer does not rely on cardinality estimates but instead exploits certain structural properties of the query.
Despite its potential, this optimization technique has yet to be integrated into any column-based database system. The implementation is particularly challenging due to the impedance mismatch with the Volcano query evaluation model, which many DBMS use.
This thesis aims to fill that gap by integrating this optimization technique into ClickHouse, which, according to the DB-Engines ranking, can be considered the most popular column-store at the moment. The results are highly promising: queries that would typically time out can now be executed efficiently thanks to this optimization.
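The idea of partially executing Yannakakis’ algorithm can be illustrated with a minimal sketch. It is not taken from the thesis or from ClickHouse; the relations R, S, T, their columns, and all function names are hypothetical. It assumes a chain-shaped (acyclic) join R.b = S.b, S.c = T.c and a MIN aggregate over a column of R only, so that a single bottom-up pass of semi-joins suffices and the join itself is never materialized.

```python
from itertools import product


def semijoin(left, right, left_key, right_key):
    """Keep the rows of `left` that have at least one join partner in `right`."""
    keys = {row[right_key] for row in right}
    return [row for row in left if row[left_key] in keys]


def min_via_semijoins(r, s, t):
    """MIN(R.a) over R JOIN S JOIN T using only a bottom-up semi-join pass.

    Assumed join tree (hypothetical): R is the root, S its child, T the leaf.
    """
    s_reduced = semijoin(s, t, "c", "c")          # S rows with a partner in T
    r_reduced = semijoin(r, s_reduced, "b", "b")  # R rows with a partner in reduced S
    return min((row["a"] for row in r_reduced), default=None)


def min_via_full_join(r, s, t):
    """Baseline: materialize the full join first, then aggregate."""
    joined = [
        (x, y, z)
        for x, y, z in product(r, s, t)
        if x["b"] == y["b"] and y["c"] == z["c"]
    ]
    return min((x["a"] for x, _, _ in joined), default=None), len(joined)


# Hypothetical toy data: duplicates in S and T inflate the join result.
R = [{"a": 5, "b": 1}, {"a": 3, "b": 2}, {"a": 1, "b": 9}]
S = [{"b": 1, "c": 10}, {"b": 1, "c": 10}, {"b": 2, "c": 20}]
T = [{"c": 10}, {"c": 10}, {"c": 10}]

print(min_via_semijoins(R, S, T))   # aggregate from the reduced root relation
print(min_via_full_join(R, S, T))   # same aggregate, plus the join size
```

Both functions return the same minimum, but the baseline builds six joined tuples for a single surviving row of R, whereas the semi-join pass never produces anything larger than its inputs. MIN is insensitive to duplicates, which is why the reduced root relation alone is enough in this sketch; other aggregates may need additional bookkeeping.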
en
Additional information:
Thesis not yet received by the library - data not verified. Title differs from the original, as translated by the author.