Ensuring service level objective adherence in the edge-cloud continuum

Pusztai, Thomas Werner

doi:10.34726/hss.2025.127440

Record link:

https://doi.org/10.34726/hss.2025.127440
http://hdl.handle.net/20.500.12708/218810

Title:

Ensuring service level objective adherence in the edge-cloud continuum

Citation:

Pusztai, T. W. (2025). Ensuring service level objective adherence in the edge-cloud continuum [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.127440

reposiTUm DOI:

10.34726/hss.2025.127440

CatalogPlus:

AC17629326

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Pusztai, Thomas Werner

Advisor:

Dustdar, Schahram

Co-advisor:

Nastic, Stefan

Organisational Unit:

E194 - Institut für Information Systems Engineering

Date (published):

2025

Number of Pages:

225

Keywords:

Edge-Cloud Continuum; Service Level Objectives; Elasticity; Scheduling; Serverless Computing

Abstract:

The Edge-Cloud Continuum has attracted great interest over the last decade. The ability to use low-latency computing power where users generate data, and to offload complex computations to the Cloud, enables a wide range of applications, such as smart vehicles, augmented reality, or collaboration between humans and robots in factories. To ensure that such applications function properly, it is necessary to define and enforce corresponding quality characteristics, i.e., Service Level Objectives (SLOs). In this dissertation we investigate the definition and enforcement of SLOs in the Edge-Cloud Continuum. First, we introduce abstractions that enable the definition and configuration of complex, workload-specific SLOs in a type-safe manner. Building on these, we present a middleware for developing orchestrator-independent SLO controllers that monitor and enforce SLOs. Strongly typed metric querying abstractions enable the querying and aggregation of workload metrics and their reuse as composed metrics. Next, we address SLO-aware scheduling of long-lived microservices in the Edge-Cloud Continuum. We present a scheduler for the SLO-aware placement of asynchronous applications that communicate through a message broker. We then present an extensible scheduler for the SLO-aware placement of synchronous applications with complex communication dependencies, which we model with a service graph in order to map them onto the current network topology. Since the Edge-Cloud Continuum can comprise multiple clusters with tens of thousands of nodes, we present a distributed scheduler that scales with cluster size and keeps the scheduling conflicts that normally occur in distributed schedulers low.
To meet end-to-end response time SLOs in serverless workflows, we present a resource optimizer for serverless workflows that takes into account the sizes of function inputs and their effects on performance. Based on function performance profiles and the current inputs, the optimizer selects resource profiles that fulfill the workflow's response time SLO and minimize costs. Finally, we extend the Edge-Cloud Continuum with Low Earth Orbit satellites into an Edge-Cloud-Space 3D Continuum that can provide computing power to applications anywhere on Earth and to Earth observation satellites. We present an architecture and a scheduler for seamlessly executing serverless workflows in the 3D Continuum while meeting end-to-end response time SLOs.

The Edge-Cloud Continuum has received enormous interest from academia and industry over the last decade. The ability to leverage low-latency computing power close to where users generate data, and to offload complex computations to Cloud datacenters, has enabled a large variety of applications, such as smart vehicles, augmented reality, or human-robot collaboration in factories. To ensure that these applications function properly, it is imperative that suitable quality characteristics, i.e., Service Level Objectives (SLOs), are defined and enforced. In this thesis, we explore the definition and enforcement of SLOs in the Edge-Cloud Continuum. First, we propose a set of abstractions that enable the definition and configuration of complex workload-specific SLOs in a type-safe manner. Based on these abstractions, we present a middleware for the creation of orchestrator-independent SLO controllers that monitor and enforce these SLOs. Strongly typed metrics querying abstractions enable the retrieval and aggregation of workload metrics and the reuse of these aggregations in the form of composed metrics. Next, we discuss SLO-aware scheduling of long-lived microservices in the Edge-Cloud Continuum. We present a scheduler that is specifically designed for the SLO-aware placement of asynchronous applications that communicate through a message broker. Subsequently, we present an extensible scheduler that performs SLO-aware placement of synchronous applications with complex communication dependencies, which we model in a service graph abstraction for mapping onto the current network topology. Since the Edge-Cloud Continuum may encompass multiple clusters with tens of thousands of compute nodes, we also introduce a distributed scheduler that is designed to scale to these cluster sizes while keeping the scheduling conflicts that typically occur in distributed schedulers low.
To enable adherence to end-to-end response time SLOs in serverless workflows, we present a resource optimizer for serverless workflows that is aware of function input sizes and their effects on the performance of the functions. Based on function performance profiles and the current inputs, the optimizer assigns resource profiles that meet the end-to-end response time SLO of the workflow while minimizing costs. Finally, we extend the Edge-Cloud Continuum with Low Earth Orbit satellites to an Edge-Cloud-Space 3D Continuum, which can deliver computational capacity to applications anywhere on Earth and to Earth Observation satellites in space. We present an architecture and a scheduler for seamlessly executing serverless workflows across this 3D Continuum while fulfilling end-to-end response time SLOs.

License:

In Copyright

Appears in Collections:

Thesis