Tiny deep reinforcement learning for compute constrained agents : solving the inverted pendulum problem on less than 520kB SRAM using skill-oriented autonomous real-world E2E deep reinforcement learning

Tayari, Hakim

doi:10.34726/hss.2026.128166

Record link:

https://doi.org/10.34726/hss.2026.128166
http://hdl.handle.net/20.500.12708/227472

Title:

Tiny deep reinforcement learning for compute constrained agents : solving the inverted pendulum problem on less than 520kB SRAM using skill-oriented autonomous real-world E2E deep reinforcement learning

Citation:

Tayari, H. (2026). Tiny deep reinforcement learning for compute constrained agents : solving the inverted pendulum problem on less than 520kB SRAM using skill-oriented autonomous real-world E2E deep reinforcement learning [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.128166

reposiTUm DOI:

10.34726/hss.2026.128166

CatalogPlus:

AC17833882

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Tayari, Hakim

Advisor:

Jantsch, Axel

Co-advisor:

Kobelrausch, Markus Daniel

Organisational Unit:

E384 - Institut für Computertechnik

Date (published):

2026

Number of Pages:

Keywords:

Deep Reinforcement Learning; Deep Neural Networks; Reinforcement learning; Inverted Pendulum Problem; Resource Efficient Learning; Embedded Systems; Autonomous Learning

Abstract:

Deep Reinforcement Learning (DRL) solves the scalability problem of Reinforcement Learning by using Artificial Deep Neural Networks (DNN) to represent the learned behavior. To date, DRL is the most stable and most thoroughly researched problem formulation in machine learning for autonomous, continuous, and lifelong learning. Because of their compute- and memory-intensive design, most DRL approaches depend on high-performance computing architectures (e.g. high-end GPUs). By using DRL to solve a representative real-time problem, the inverted pendulum, this thesis offers a perspective on DRL with respect to memory efficiency, autonomy during the learning phase, and partial observability of the environment.

Deep Reinforcement Learning (DRL) solves the scalability problem of Reinforcement Learning (RL) by employing a Deep Artificial Neural Network (DNN) as a function approximator to represent the learned policy. To date, DRL is the most robust and well-established machine learning paradigm for autonomous, continuous, and open-ended learning. Because it is compute- and memory-intensive, most DRL approaches depend on high-power platforms such as high-end GPUs. In this work, we examine the DRL paradigm with respect to its potential for low-power environments, full learning autonomy, and partial observability, using a representative real-time control problem, the inverted pendulum, as the example.

License:

In Copyright

Appears in Collections:

Thesis