Reisinger, M. (2022). System support & orchestration mechanisms for distributed DNN inference [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.87400
E194 - Institut für Information Systems Engineering
-
Date (published):
2022
-
Number of Pages:
112
-
Keywords:
Artificial Intelligence; Deep Neural Networks; Edge Computing; Fog Computing
Abstract:
The use of edge computing as a platform for distributed DNN inference is an active area of research. Recent work proposes new neural network architectures that facilitate the distribution of DNN workloads in such environments. In addition to the classifier at a DNN's final layer, these architectures introduce side-exit classifiers at intermediate layers. This makes it possible to obtain inference results at earlier points in the network and thereby reduce compute overhead, which is critical for operation on resource-constrained devices. This thesis follows a recent line of research that uses this architecture to shift DNN computations towards less powerful devices at the edge of the network in order to improve user experience.

In contrast to related work, which concentrates on algorithmic aspects of optimizing the distributed execution of DNNs, this thesis focuses on the design aspects that enable the implementation of an extensible orchestration framework for distributing inference of feed-forward DNN models. Each host in the compute hierarchy operates a runtime environment that offers APIs for the orchestration and execution of DNNs, as well as a component for monitoring the node's resource levels and network conditions. Compute nodes are required to register with a central controller, which maintains a global view of the compute hierarchy. Finally, a scheduler decides on the deployment and orchestration of a given DNN model over the available compute resources. From a software architecture perspective, the scheduler offers a plugin framework that allows system users to implement and apply their own algorithms for custom placement policies. The system also ships with a number of strategies that aim to minimize the end-to-end latency of DNN inference.

We show that the optimal placement of layers in the described system landscapes, with respect to minimizing latency, is an NP-hard combinatorial optimization problem. We therefore provide an exact algorithm, in the form of an integer linear program, that solves the placement problem to optimality, as well as heuristic approaches for larger problem instances. Finally, experimental studies evaluate a prototypical system implementation in simulation-based scenarios and on a physical test-bed. On simulated compute hierarchies, the exact placement clearly outperforms the traditional cloud-centric placement. A feasibility study on a physical test-bed confirms that the system is able to identify efficient placements based on monitored environmental conditions.
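To make the side-exit idea concrete, the following is a minimal PyTorch sketch of an early-exit network. The stage sizes, the entropy-based confidence criterion, and the threshold are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal early-exit (side-exit) DNN sketch in PyTorch. All shapes, the
# entropy criterion, and the threshold are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, num_classes: int = 10, exit_threshold: float = 0.5):
        super().__init__()
        # Backbone split into stages; a side-exit classifier sits after each.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(16, num_classes))
        self.exit2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(32, num_classes))
        self.exit_threshold = exit_threshold

    @staticmethod
    def _entropy(logits: torch.Tensor) -> torch.Tensor:
        # Per-sample prediction entropy; low entropy = confident prediction.
        p = F.softmax(logits, dim=1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=1)

    def forward(self, x: torch.Tensor):
        h = self.stage1(x)
        logits1 = self.exit1(h)
        # Take the side exit when the whole batch is already confident;
        # otherwise pay for the deeper stage. (Per-sample routing omitted.)
        if not self.training and self._entropy(logits1).max() < self.exit_threshold:
            return logits1, "exit1"
        h = self.stage2(h)
        return self.exit2(h), "exit2"
```

At inference time, confident inputs leave the network at the first exit, so only hard inputs pay for the deeper stages. In a distributed deployment, the stages before and after a side exit can be placed on different hosts of the compute hierarchy.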
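The abstract states that compute nodes register with a central controller that maintains a global view of the hierarchy. A registration payload might plausibly look as follows; the endpoint, field names, and values are purely hypothetical.

```python
# Hypothetical node-registration message sent to the central controller.
# Endpoint URL and field names are assumptions modelled on the abstract.
import json
import urllib.request

registration = {
    "node": "edge-device-1",
    "parent": "fog-node-1",          # position in the compute hierarchy
    "resources": {"cpu_cores": 4, "memory_mb": 2048},
    "network": {"uplink_latency_ms": 4.2, "uplink_bandwidth_mbps": 95.0},
}
req = urllib.request.Request(
    "http://controller.example:8080/nodes",   # placeholder controller address
    data=json.dumps(registration).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment when a live controller is reachable
```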
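The scheduler's plugin framework for custom placement policies suggests an extension point along the following lines. The interface, data model, and baseline policy below are hypothetical names, not the thesis's actual API.

```python
# Hypothetical plugin interface for custom placement policies; names and
# data model are assumptions based on the abstract.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    compute_capacity: float   # e.g. normalized throughput from the monitor
    uplink_latency_ms: float  # measured latency towards the parent node

@dataclass
class Placement:
    # Maps each DNN layer index to the node that executes it.
    layer_to_node: dict[int, str]

class PlacementPolicy(ABC):
    """Extension point: users register subclasses with the scheduler."""

    @abstractmethod
    def place(self, layers: list[int], nodes: list[Node]) -> Placement:
        ...

class CloudOnlyPolicy(PlacementPolicy):
    """Baseline: run every layer on the most powerful node (the cloud)."""

    def place(self, layers: list[int], nodes: list[Node]) -> Placement:
        cloud = max(nodes, key=lambda n: n.compute_capacity)
        return Placement({layer: cloud.name for layer in layers})
```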
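The exact algorithm is described as an integer linear program. The sketch below shows what a latency-minimizing placement ILP for a small feed-forward chain could look like, using PuLP and a standard linearization of the transfer terms. The topology, the latency numbers, and the assumption that input data originates at the edge node are invented for illustration; the thesis's formulation may add further constraints, such as memory capacities.

```python
# Sketch of an exact layer-placement ILP with PuLP; all numbers are invented.
import pulp

layers = range(3)                      # feed-forward chain of 3 layers
nodes = ["edge", "fog", "cloud"]
compute = {                            # per-layer compute latency (ms) per node
    (0, "edge"): 5, (0, "fog"): 3, (0, "cloud"): 1,
    (1, "edge"): 9, (1, "fog"): 5, (1, "cloud"): 2,
    (2, "edge"): 9, (2, "fog"): 5, (2, "cloud"): 2,
}
link = {("edge", "edge"): 0, ("edge", "fog"): 4, ("edge", "cloud"): 20,
        ("fog", "edge"): 4, ("fog", "fog"): 0, ("fog", "cloud"): 16,
        ("cloud", "edge"): 20, ("cloud", "fog"): 16, ("cloud", "cloud"): 0}

prob = pulp.LpProblem("layer_placement", pulp.LpMinimize)
# x[l, n] = 1 iff layer l runs on node n
x = pulp.LpVariable.dicts("x", [(l, n) for l in layers for n in nodes],
                          cat="Binary")
# z[l, m, n] linearizes the product x[l, m] * x[l + 1, n]
z = pulp.LpVariable.dicts(
    "z", [(l, m, n) for l in list(layers)[:-1] for m in nodes for n in nodes],
    lowBound=0)

# Each layer is placed on exactly one node.
for l in layers:
    prob += pulp.lpSum(x[(l, n)] for n in nodes) == 1
# Linearization: z is forced to 1 when both adjacent placements hold.
for l in list(layers)[:-1]:
    for m in nodes:
        for n in nodes:
            prob += z[(l, m, n)] >= x[(l, m)] + x[(l + 1, n)] - 1

# End-to-end latency: one hop from the edge (where input originates) to
# layer 0, plus compute time of every layer, plus transfer time between
# consecutive layers placed on different nodes.
prob += (pulp.lpSum(link[("edge", n)] * x[(0, n)] for n in nodes)
         + pulp.lpSum(compute[(l, n)] * x[(l, n)]
                      for l in layers for n in nodes)
         + pulp.lpSum(link[(m, n)] * z[(l, m, n)]
                      for l in list(layers)[:-1] for m in nodes for n in nodes))

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in layers:
    for n in nodes:
        if pulp.value(x[(l, n)]) > 0.5:
            print(f"layer {l} -> {n}")
```

On this toy instance, the optimum places all three layers on the fog node for a total of 17 ms, whereas the cloud-centric baseline pays 25 ms, mirroring the abstract's observation that exact placement outperforms cloud-centric placement on simulated hierarchies.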