This thesis has made contributions toward building a structured design and implementation flow for DNNs, addressing key challenges in estimation, quantization, and profiling. Through the development of new methodologies and frameworks such as ANNETTE, this work has introduced solutions that enhance the efficiency of DNN deployment on a range of hardware platforms, while also highlighting the areas that require further development to achieve the overall goal of a generalizable design and implementation flow.

In the area of estimation, this thesis successfully implemented latency estimation [Paper I] [1] to create a more holistic understanding of DNN behavior on embedded hardware. The combination of analytical and stochastic models proved to be crucial in this field. The introduction of a confidence framework [Paper II] [2] further improves the usability of the prediction results, allowing for more informed performance tuning across a wide range of architectures.

The next logical step is to connect latency estimation with resource utilization and power consumption models, bringing them together into a unified framework. Such a model would not only offer more comprehensive performance predictions but also make better use of benchmarking across diverse hardware settings. This could also enable accurate predictions for upcoming hardware generations by leveraging architectural insights. Moreover, refining the confidence methods to include optimizations that occur during graph compilation will further increase the trustworthiness of the predictions.

An additional focus will need to be on automated analysis and estimation of whether a network can be inferred on a target device at all. This way, developers can ensure that the selected or designed network contains only supported layers and fits within the memory constraints of the target hardware.

Quantization plays a crucial role in optimizing DNNs for deployment on resource-constrained devices.
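As a minimal illustration of such precision reduction, a uniform symmetric post-training INT8 quantization of a single weight tensor can be sketched as follows. This is a generic textbook scheme, not the per-layer method developed in this thesis; the tensor shape and values are illustrative:

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = max(float(np.max(np.abs(weights))) / 127.0, 1e-12)  # avoid div by zero
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Example: a random FP32 tensor shrinks 4x (FP32 -> INT8); the worst-case
# rounding error is bounded by half the quantization step.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
max_error = float(np.abs(w - dequantize(q, s)).max())
```

Post-training schemes of this kind need no retraining, which is why the text above expects them to remain the default; the accuracy loss they introduce is exactly what QAT and criticality-aware bit-width selection try to contain.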
The development of WQR provides a method for dynamically adjusting layer precision based on its criticality. However, quantization remains a challenging and evolving area. No single method is ideal for all networks and hardware platforms, and given the rapid pace at which DNN architectures evolve, this is unlikely to change. The field is moving toward smaller integer datatypes, such as INT4 and INT2 [78], but striking the right balance between compression and accuracy will continue to be a key focus.

As the landscape of DNN architectures shifts, future research should aim to improve the flexibility of quantization techniques, ensuring that they can adapt to the demands of new architectures. Although post-training quantization will likely remain the standard due to its simplicity, methods for QAT will be essential in cases where precision must be maintained despite aggressive quantization. Standards and support for different quantization types will also need to evolve, providing developers with more tools to handle the increasing complexity of DNNs.

Profiling is another area where this thesis has made progress. The use of power side-channel analysis provided valuable insights into the power and latency performance of DNNs at a granular level [Paper IV] [4]. This allows for a detailed understanding of how individual operations impact overall performance, particularly when deployed on embedded platforms. However, despite these advancements, the complexity of the post-processing required for power side-channel analysis prevented the full automation of these measurements. This presents a challenge for scaling the methodology to broader use cases. Additionally, the introduction of the smart padding technique enabled accurate layer isolation for latency profiling, even in the absence of detailed hardware transparency.

Future efforts should aim to combine both approaches to enable joint power consumption and resource estimation.
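A first-order combination of the two profiling outputs could multiply per-layer latency by per-layer average power to obtain energy estimates. The following sketch uses purely hypothetical layer names and numbers, not measured values from this work:

```python
# Hypothetical per-layer measurements: latency as obtained from
# layer-isolated profiling, average power as obtained from
# side-channel traces (all values illustrative).
layers = [
    # (name, latency_ms, avg_power_mW)
    ("conv1", 1.8, 950.0),
    ("conv2", 3.2, 1100.0),
    ("fc",    0.6, 700.0),
]

def energy_mj(latency_ms: float, power_mw: float) -> float:
    """Energy E = P * t; mW * ms = microjoules, so divide by 1000 for mJ."""
    return power_mw * latency_ms / 1000.0

total_latency_ms = sum(lat for _, lat, _ in layers)
total_energy_mj = sum(energy_mj(lat, p) for _, lat, p in layers)
per_layer = {name: energy_mj(lat, p) for name, lat, p in layers}
```

Even this trivial aggregation shows why layer isolation matters: without clean per-layer boundaries in both the latency and the power trace, the per-layer energy attribution cannot be computed at all.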
The smart padding technique could potentially increase the reliability of the power measurements. Automation will also be key to making profiling more scalable, enabling real-time assessments across different hardware platforms and streamlining the benchmarking process. As hardware architectures continue to diversify, automated profiling tools will be crucial for ensuring that performance predictions remain accurate and consistent across platforms.

As the focus of AI development increasingly shifts toward Large Language Models (LLMs) such as GPT and BERT, the methodologies developed in this thesis take on new relevance. While this work primarily addressed Convolutional Neural Networks (CNNs), the growing prominence of LLMs presents a set of unique challenges that must be addressed in future research. LLMs are typically much larger than traditional CNNs for vision applications, and their resource requirements, particularly in terms of memory and power consumption, are substantially higher. The resource and power estimation techniques developed here provide a strong foundation, but they will need to be adapted to the specific needs of LLMs.

The primary difficulty lies in the fact that many LLM implementations are still highly customized, relying on hand-optimized code to achieve maximum performance. Unlike CNNs, where inference frameworks have become widely adopted, LLMs lack such standardization, making their optimization more complex.

Looking ahead, it will be essential to develop more sophisticated resource and power estimation models tailored specifically to LLMs.
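As a starting point, even a first-order LLM memory model must separate static weight storage from the KV cache, which grows with sequence length and batch size, a dependence CNN estimators do not have to capture. A rough sketch under assumed, illustrative model dimensions (the configuration below is generic and does not describe any specific model):

```python
def llm_memory_gib(n_params: float, n_layers: int, n_kv_heads: int,
                   head_dim: int, seq_len: int, batch: int,
                   bytes_per_value: int = 2) -> tuple[float, float]:
    """First-order footprint: weights plus KV cache (two tensors: K and V)."""
    weight_bytes = n_params * bytes_per_value
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * seq_len * batch * bytes_per_value)
    gib = 1024 ** 3
    return weight_bytes / gib, kv_bytes / gib

# Illustrative 7B-parameter configuration in 16-bit precision:
w_gib, kv_gib = llm_memory_gib(7e9, n_layers=32, n_kv_heads=32,
                               head_dim=128, seq_len=4096, batch=1)
```

Such a closed-form model ignores activation buffers and runtime overhead, but it already exposes the sequence-length term that a unified estimation framework would have to benchmark and calibrate per platform.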
Hardware support for LLMs is also still evolving, and future research should explore how to optimize these models for next-generation hardware platforms, ensuring that both efficiency and performance are maximized. Standards for LLM optimization will need to mature, and tools that can automate the benchmarking and profiling of these large-scale models will be crucial for their successful deployment.

As AI models and hardware continue to evolve, the need for adaptable, flexible, and efficient design methodologies will only grow. The contributions made in this thesis provide a strong foundation for this evolution, offering valuable insights and tools that will help drive further advancements in DNN optimization. By continuing to refine these methods and adapting them to new technologies, the field will move closer to realizing a fully automated, generalizable DNN design and implementation flow.