<div class="csl-bib-body">
<div class="csl-entry">Kitzberger, G. (2026). <i>Optimizing Distributed LLM Inference for Heterogeneous Workers through Dynamic Graph Partitioning</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.138984</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2026.138984
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/227225
-
dc.description
Work not yet received at the library - data not verified
-
dc.description
Title differs according to the author's own translation
-
dc.description.abstract
High parameter counts of frontier large language models (LLMs) restrict inference to well-resourced institutions with server-grade hardware. Distributed inference using heterogeneous consumer devices offers a path forward; however, existing systems require manual configuration and setup, limiting accessibility to expert users. We propose a dynamic worker-aware graph partitioning algorithm for ONNX-based LLMs that reduces model partitioning to a variant of the Ordered Partition Problem, solvable in O(n²m) time for n layers and m workers. The algorithm jointly optimizes worker memory, execution speed, network conditions, and cached model weights to minimize end-to-end inference latency. We integrate this algorithm into a distributed inference server implemented in over 5,500 lines of Rust. The server dynamically repartitions the model at runtime in response to workers joining or leaving the system. Using the browser as a distribution mechanism enables zero-setup participation, eliminating the need for manual worker configuration. Empirical evaluation shows that our cost model achieves a mean absolute percentage error (MAPE) of 8.4% overall and 4.4% for large models. Dynamic partitioning consistently outperforms static equal-layer splitting, and an ablation study confirms that each worker metric contributes meaningfully to assignment quality. We demonstrate the system recovering from unexpected worker disconnects and distributing a model totaling 60 GB of weights across heterogeneous devices.
en
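The abstract's reduction to an Ordered Partition Problem can be illustrated with a minimal dynamic-programming sketch in Rust. The cost model below (per-layer compute cost divided by a scalar worker speed, with contiguous layer ranges assigned to workers in a fixed order) is a deliberate simplification for illustration; the thesis's actual cost model additionally accounts for worker memory, network conditions, and cached weights. All identifiers here are hypothetical, not taken from the thesis code.

```rust
/// Minimal O(n²m) ordered-partition DP sketch (illustrative only).
/// `layer_cost[i]`: hypothetical compute cost of layer i.
/// `worker_speed[j]`: hypothetical relative speed of worker j.
/// Returns the minimum total latency when layers are split into
/// contiguous (possibly empty) ranges assigned to workers in order.
fn min_cost_partition(layer_cost: &[f64], worker_speed: &[f64]) -> f64 {
    let n = layer_cost.len();
    let m = worker_speed.len();

    // Prefix sums so any contiguous segment cost is O(1) to evaluate.
    let mut prefix = vec![0.0; n + 1];
    for i in 0..n {
        prefix[i + 1] = prefix[i] + layer_cost[i];
    }

    // dp[j][i]: min latency assigning the first i layers to the first j workers.
    let mut dp = vec![vec![f64::INFINITY; n + 1]; m + 1];
    dp[0][0] = 0.0;

    for j in 1..=m {
        for i in 0..=n {
            // Worker j takes layers k..i (k == i means an empty assignment).
            for k in 0..=i {
                let seg = (prefix[i] - prefix[k]) / worker_speed[j - 1];
                let cand = dp[j - 1][k] + seg;
                if cand < dp[j][i] {
                    dp[j][i] = cand;
                }
            }
        }
    }
    dp[m][n]
}
```

With four unit-cost layers and two workers of speeds 1.0 and 2.0, the DP assigns all layers to the faster worker, since splitting only adds slower-worker time under this additive cost model.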
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Distributed LLM Inference
en
dc.subject
LLM Partitioning
en
dc.subject
Dynamic Graph Partitioning
en
dc.subject
Dynamic Programming
en
dc.subject
Web-Based LLM Inference
en
dc.subject
ONNX
en
dc.title
Optimizing Distributed LLM Inference for Heterogeneous Workers through Dynamic Graph Partitioning
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2026.138984
-
dc.contributor.affiliation
TU Wien, Austria
-
dc.rights.holder
Gabriel Kitzberger
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
dc.contributor.assistant
Furutanpey, Alireza
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering