Ledl, M. (2021). Integration of worst-case optimal join algorithms into column-stores [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.89722
Business intelligence and analytical systems have to handle an increasing amount of data and require to efficiently answer queries. Therefore, the join evaluation has become a major challenge to be tackled by any database system to efficiently evaluate queries. To that end, plenty of research has been conducted to introduce concepts to compute join queries in a resource and runtime efficient way. The research area of column-oriented database systems has introduced a new database architecture from scratch. Such column-stores vertically partition a database and store columns independently of each other. Another research area that aims for optimizing query performance deals with worst-case optimal joins (WCOJ). Theoretical concepts and algorithms that define tight bounds on join algorithms’ runtime have been introduced in this field. Both areas successfully came up with join optimization concepts independently of each other. Since a combination of concepts from both fields yield great potential for further performance optimization, this thesis aims for integrating WCOJs into column-stores and is the first work to perform and evaluate such combination. This thesis introduces a way of integrating WCOJ algorithms into column-stores. The integration process discusses the query compiler of a specific column-store and the manual translation of a class of WCOJ algorithms into its internal language. Finally, the translation result is integrated into the column-store’s query compiler and the resulting system is evaluated against the original one. The evaluation showed that our system outperforms the original one on a given set of natural join queries in settings with skewed data and underlines the practical potential of WCOJ algorithms for fighting skew in input data. This thesis further showed that the column-store with WCOJ algorithm integrated can evaluate natural join queries with big input relation sizes which cause memory allocation errors for the original systems.