Tiny deep reinforcement learning for compute constrained agents : solving the inverted pendulum problem on less than 520kB SRAM using skill-oriented autonomous real-world E2E deep reinforcement learning

Tayari, Hakim

doi:10.34726/hss.2026.128166

DC Field

Value

Language

dc.contributor.advisor

Jantsch, Axel

dc.contributor.author

Tayari, Hakim

dc.date.accessioned

2026-04-08T11:15:05Z

dc.date.issued

2026

dc.date.submitted

2026-02

dc.identifier.citation

Tayari, H. (2026). Tiny deep reinforcement learning for compute constrained agents : solving the inverted pendulum problem on less than 520kB SRAM using skill-oriented autonomous real-world E2E deep reinforcement learning [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.128166

dc.identifier.uri

https://doi.org/10.34726/hss.2026.128166

dc.identifier.uri

http://hdl.handle.net/20.500.12708/227472

dc.description.abstract

Deep Reinforcement Learning (DRL) solves the scalability problem of Reinforcement Learning by using Deep Artificial Neural Networks (DNNs) to represent the learned behavior. To date, DRL is the most stable and most thoroughly researched machine learning problem formulation for autonomous, continuous, and lifelong learning. Because of their compute- and memory-intensive design, most DRL approaches depend on high-performance computing architectures (such as high-end GPUs). By using DRL to solve a representative real-time problem, the inverted pendulum, this thesis offers a perspective on DRL with respect to memory efficiency, autonomy during the learning phase, and partial observability of the environment.

dc.description.abstract

Deep Reinforcement Learning (DRL) solves the scalability problem of Reinforcement Learning (RL) by employing a Deep Artificial Neural Network (DNN) as a function approximator to represent the learned policy. To date, DRL provides the most robust and well-established machine learning paradigm for enabling autonomous, continuous, and open-ended learning. Due to its compute- and memory-intensive nature, DRL is limited to high-power platforms such as high-end GPUs. In this work, we examine the DRL paradigm with respect to its potential in low-power environments, full learning autonomy, and partial observability, using a representative real-time control problem, the inverted pendulum, as the example.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Deep Reinforcement Learning

dc.subject

Deep Neural Networks

dc.subject

Reinforcement learning

dc.subject

Inverted Pendulum Problem

dc.subject

Resource Efficient Learning

dc.subject

Embedded Systems

dc.subject

Autonomous Learning

dc.title

Tiny deep reinforcement learning for compute constrained agents : solving the inverted pendulum problem on less than 520kB SRAM using skill-oriented autonomous real-world E2E deep reinforcement learning

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2026.128166

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Hakim Tayari

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Kobelrausch, Markus Daniel

tuw.publication.orgunit

E384 - Institut für Computertechnik

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC17833882

dc.description.numberOfPages

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0003-2251-0004

item.fulltext

with Fulltext

item.grantfulltext

open

item.cerifentitytype

Publications

item.openairetype

master thesis

item.openaccessfulltext

Open Access

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.mimetype

application/pdf

item.languageiso639-1

crisitem.author.dept

E384-02 - Forschungsbereich Systems on Chip

crisitem.author.parentorg

E384 - Institut für Computertechnik

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(4.87 MB)

In Copyright
