Mayerhofer, R. (2023). Reinforcement-learning-based, application-agnostic, and explainable auto-scaling in the cloud utilizing high-level SLOs [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.106505
E194 - Institut für Information Systems Engineering
Date (published):
2023
Number of Pages:
92
Keywords:
Auto scaling; Service Level Objective; Reinforcement Learning; Q-learning; Q-Threshold; Polaris framework; Cloud Elasticity
Abstract:
Cloud computing is a widely adopted paradigm in the software industry. The ability to adapt the resources provisioned for an application to its actual demand is called auto-scaling, and it is crucial for keeping costs within limits while ensuring sufficient performance. Effective auto-scaling is a multi-dimensional problem and an active area of research. The industry standard for auto-scaling relies on static thresholds over low-level metrics such as CPU utilization, while researchers are experimenting with applying Machine Learning techniques to auto-scaling. Static thresholds are hard to set up and need manual correction, and low-level metrics are disconnected from business goals. Reinforcement Learning (RL), on the other hand, is a popular approach to autonomously learning an auto-scaling policy. While promising, RL introduces new problems to the auto-scaling domain, such as a lack of explainability and interpretability, high complexity, long learning phases, and poor worst-case performance. We aim to find ways to auto-scale efficiently while bridging the gap between auto-scaling and business goals without the undesirable properties of RL solutions.

This thesis presents two approaches to auto-scaling, Extended-Q-Threshold and HPA-Q-Threshold, both building upon Q-Threshold, an auto-scaling system from the literature. Our auto-scalers are integrated into the Polaris framework and built with a flexible architecture in mind. We extend and adapt Q-Threshold, an approach in which the RL agent controls the usually static threshold, effectively making it dynamic. Our adaptations tackle experimentally identified shortcomings of the Q-Threshold approach. Furthermore, we generalize the approach and apply different scaling metrics and rewards, thus enabling the further development and evaluation of this promising approach.

We show how a modern, flexible auto-scaler can be integrated with the Polaris framework and run in a Kubernetes cluster. Our experiments evaluate the effectiveness of the proposed adaptations and show that some are necessary to prevent the identified issues of Q-Threshold, while others, such as different scaling metrics and reward definitions, must be carefully assessed and tested. Overall, the adaptations help ensure the interpretability of the auto-scaler through a very lightweight implementation of RL. Furthermore, our auto-scaler possesses other desirable characteristics, such as bounding worst-case behavior and guaranteeing acceptable early-stage performance.
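
To illustrate the core idea behind Q-Threshold as summarized above, the following is a minimal, hypothetical Q-learning sketch in which an agent adjusts an otherwise static scaling threshold. The state encoding, action set, reward shape, hyperparameters, and all names are assumptions made for illustration only; they do not reflect the thesis implementation or any Polaris framework API.

```python
# Illustrative sketch (assumed design): a tabular Q-learning agent that nudges
# a CPU-utilization scaling threshold up or down instead of scaling directly.
import random
from collections import defaultdict

ACTIONS = (-5, 0, +5)          # lower, keep, or raise the threshold (percentage points)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

q_table = defaultdict(float)   # (state, action) -> estimated value
threshold = 70                 # initial CPU-utilization threshold in percent

def state(cpu_utilization, slo_violated):
    """Discretize observations into a small state space (assumed encoding)."""
    return (round(cpu_utilization / 10) * 10, slo_violated)

def reward(slo_violated, replicas):
    """Assumed reward: penalize SLO violations, mildly penalize provisioned replicas."""
    return (-10 if slo_violated else 1) - 0.1 * replicas

def choose_action(s):
    """Epsilon-greedy selection over threshold adjustments."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(s, a)])

def update(s, a, r, s_next):
    """Standard tabular Q-learning update."""
    best_next = max(q_table[(s_next, a2)] for a2 in ACTIONS)
    q_table[(s, a)] += ALPHA * (r + GAMMA * best_next - q_table[(s, a)])

# One control-loop iteration; monitoring values are placeholders.
cpu, violated, replicas = 85.0, True, 4
s = state(cpu, violated)
a = choose_action(s)
threshold = min(95, max(5, threshold + a))       # threshold a conventional scaler would use
cpu2, violated2, replicas2 = 72.0, False, 5      # next observation after scaling
update(s, a, reward(violated2, replicas2), state(cpu2, violated2))
```

Because the agent only tunes a threshold consumed by an otherwise conventional scaler, the resulting policy remains small and inspectable, which is one way to read the interpretability and bounded worst-case claims in the abstract.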