A big Data analytics framework for evaluating automated elastic scalability of the SMACK-stack

Wedenik, Benedikt Peter

doi:10.34726/hss.2018.46400

DC Field

Value

Language

dc.contributor.advisor

Dustdar, Schahram

dc.contributor.author

Wedenik, Benedikt Peter

dc.date.accessioned

2020-06-29T15:02:13Z

dc.date.issued

2018

dc.date.submitted

2018-10

dc.identifier.citation

Wedenik, B. P. (2018). A big Data analytics framework for evaluating automated elastic scalability of the SMACK-stack [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2018.46400

dc.identifier.uri

https://doi.org/10.34726/hss.2018.46400

dc.identifier.uri

http://hdl.handle.net/20.500.12708/7041

dc.description.abstract

In recent years, the demand for fast availability of information and for short response times has grown. The requirements placed on today's business concepts are changing: waiting hours or even days for the results of a query is simply no longer acceptable in many industries. The answer arrives immediately or the request is discarded, and this is exactly where the term "Fast Data" comes in. The SMACK Stack, consisting of Spark, Mesos, Akka, Cassandra and Kafka, provides a robust and versatile data processing platform on which Fast Data applications can be run. This thesis presents a framework that makes it easy to scale services and resources within the stack. The main contributions can be summarized as follows: 1) Development and evaluation of this framework, including the extraction and aggregation of monitoring metrics as well as the scaling service itself. 2) Implementation of two real-world reference applications. 3) Provision of infrastructure management tools that make it easy to deploy the stack in the cloud. 4) Provision of deployment blueprints in the form of recommendations on how best to configure and start the stack initially for the available resources. The two real-world applications are used to evaluate the framework. The first application is based on processing IoT data and is heavily I/O-bound, while the second processes smaller amounts of data but performs more expensive computations in order to make predictions from the IoT data. The results show that the framework is able to detect which part of the system is currently under high load and then scale it automatically. For the IoT application, data throughput was increased by up to 73%, while the prediction application was able to process up to 169% more messages when the framework was enabled. Although the results look promising, there is still potential for further improvement, such as using machine learning to adapt thresholds intelligently, or a broader and extended REST API.

dc.description.abstract

In recent years, the demand for information availability and shorter response times has been increasing. Today's business requirements are changing: waiting hours or even days for the result of a query is no longer acceptable in many sectors. The response needs to be immediate, or the query is discarded. This is where "Fast Data" begins. The SMACK Stack, consisting of Spark, Mesos, Akka, Cassandra and Kafka, provides a robust and versatile platform and toolset for running Fast Data applications successfully. This thesis introduces a framework to correctly scale services and distribute resources within the stack. The main contributions of this thesis are: 1) Development and evaluation of the framework, including monitoring metrics extraction and aggregation, as well as the scaling service itself. 2) Implementation of two real-world reference applications. 3) Infrastructure management tools to easily deploy the stack in the cloud. 4) Deployment blueprints in the form of recommendations on how to initially set up and configure the available resources. To evaluate the framework, the two real-world applications are used for benchmarking. One application is based on IoT data and is mainly I/O-bound, while the other is computationally bound and provides predictions based on IoT data. The results indicate that the framework performs well at identifying which component is under heavy stress and scaling it automatically. This leads to an increase in throughput of up to 73% in the IoT application, while the prediction application is able to handle up to 169% more messages when using the supervising framework. While the results look promising, there is still potential for future work, such as using machine learning to better handle thresholds, or an extended REST API.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Big Data

dc.subject

Cloud Computing

dc.subject

SMACK-Stack

dc.subject

Scalability

dc.subject

Data Analytics

dc.title

A big Data analytics framework for evaluating automated elastic scalability of the SMACK-stack

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2018.46400

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Benedikt Peter Wedenik

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Nastic, Stefan

tuw.publication.orgunit

E194 - Institut für Information Systems Engineering

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC15187070

dc.description.numberOfPages

104

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-116384

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0001-6872-8821

tuw.assistant.orcid

0000-0003-0410-6315

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

TU Wien

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(7.11 MB)

In Copyright

Page view(s)

422

checked on Nov 22, 2023

Download(s)

174

checked on Nov 22, 2023
