<div class="csl-bib-body">
<div class="csl-entry">Mayerhofer, R. (2023). <i>Reinforcement-learning-based, application-agnostic, and explainable auto-scaling in the cloud utilizing high-level SLOs</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.106505</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2023.106505
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/188210
-
dc.description.abstract
Cloud computing is a widely adopted paradigm in the software industry. The ability to adapt the resources provisioned for an application to the actual demand is called auto-scaling. Auto-scaling is crucial for keeping costs within limits while ensuring sufficient performance. Effective auto-scaling is a multi-dimensional problem and an active area of research. The industry standard for auto-scaling is static thresholds based on low-level metrics such as CPU utilization, while researchers are experimenting with applying Machine Learning techniques to auto-scaling. Static thresholds are hard to set up and need to be corrected manually, and low-level metrics are disconnected from business goals. Reinforcement Learning (RL), on the other hand, is a popular approach to autonomously learning an auto-scaling policy. While promising, RL introduces new problems to the auto-scaling domain, such as a lack of explainability and interpretability, complexity, long learning phases, and bad worst-case performance. We aim to find ways to auto-scale efficiently while bridging the gap between auto-scaling and business goals, without the undesirable properties of RL solutions.

This thesis presents two approaches to auto-scaling, Extended-Q-Threshold and HPA-Q-Threshold, both building upon Q-Threshold, an auto-scaling system from the literature. Our auto-scalers are integrated into the Polaris framework and built with a flexible architecture in mind. We extend and adapt Q-Threshold, an approach to auto-scaling in which the RL agent controls the usually static threshold, effectively making it dynamic. Our adaptations tackle experimentally identified shortcomings of the Q-Threshold approach. Furthermore, we generalize the approach and apply different scaling metrics and rewards, enabling the further development and evaluation of this promising approach. We show how a modern, flexible auto-scaler can be integrated with the Polaris framework and run in a Kubernetes cluster.

Our experiments evaluate the effectiveness of our proposed adaptations and show that some are necessary to prevent the identified issues of Q-Threshold, while others, such as different scaling metrics and reward definitions, must be carefully assessed and tested. Overall, the adaptations help ensure the interpretability of the auto-scaler by utilizing a very lightweight implementation of RL. Furthermore, our auto-scaler possesses other positive characteristics, such as limiting worst-case behavior and guaranteeing acceptable early-stage performance.
en
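The core idea summarized in the abstract — an RL agent that adjusts the (normally static) scaling threshold rather than the replica count itself — can be sketched with a tabular Q-learning loop. This is a minimal illustration only: the state buckets, action set, reward shape, and all identifiers below are assumptions for the sketch, not the thesis's actual Q-Threshold, Extended-Q-Threshold, or HPA-Q-Threshold implementation.

```python
import random

# Threshold adjustments (in percentage points): lower, keep, or raise.
ACTIONS = (-5, 0, +5)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def discretize(cpu_util):
    """Map continuous CPU utilization (0-100) to one of 5 coarse state buckets."""
    return min(int(cpu_util // 20), 4)

# Tabular Q-values for every (state, action) pair.
q_table = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

def choose_action(state):
    """Epsilon-greedy selection over threshold adjustments."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (
        reward + GAMMA * best_next - q_table[(state, action)]
    )

def reward_for(slo_violated, overprovisioned):
    """Toy reward tied to a high-level SLO: penalize SLO violations most,
    then wasted resources; otherwise reward steady operation."""
    if slo_violated:
        return -10.0
    return -1.0 if overprovisioned else 1.0

# One illustrative control step: the agent nudges the dynamic threshold,
# which a conventional threshold-based scaler would then act on.
threshold = 80.0
state = discretize(65.0)
action = choose_action(state)
threshold = min(max(threshold + action, 10.0), 95.0)   # clamp to a sane range
update(state, action, reward_for(False, False), discretize(70.0))
```

Because the learned quantity is a single human-readable threshold, the resulting policy stays inspectable — which is the interpretability argument the abstract makes for this lightweight use of RL.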
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Auto scaling
en
dc.subject
Service Level Objective
en
dc.subject
Reinforcement Learning
en
dc.subject
Q-learning
en
dc.subject
Q-Threshold
en
dc.subject
Polaris framework
en
dc.subject
Cloud Elasticity
en
dc.title
Reinforcement-learning-based, application-agnostic, and explainable auto-scaling in the cloud utilizing high-level SLOs
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2023.106505
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Robin Mayerhofer
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Morichetta, Andrea
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering
-
dc.type.qualificationlevel
Diploma
-
dc.identifier.libraryid
AC16940127
-
dc.description.numberOfPages
92
-
dc.thesistype
Diplomarbeit
de
dc.thesistype
Diploma Thesis
en
dc.rights.identifier
In Copyright
en
dc.rights.identifier
Urheberrechtsschutz
de
tuw.advisor.staffStatus
staff
-
tuw.assistant.staffStatus
staff
-
tuw.advisor.orcid
0000-0001-6872-8821
-
tuw.assistant.orcid
0000-0003-3765-3067
-
item.cerifentitytype
Publications
-
item.openairetype
master thesis
-
item.mimetype
application/pdf
-
item.fulltext
with Fulltext
-
item.languageiso639-1
en
-
item.openairecristype
http://purl.org/coar/resource_type/c_bdcc
-
item.grantfulltext
open
-
item.openaccessfulltext
Open Access
-
crisitem.author.dept
E194 - Institut für Information Systems Engineering