Reisinger, M. (2022). System support & orchestration mechanisms for distributed DNN inference [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2022.87400
E194 - Institut für Information Systems Engineering
-
Date (published):
2022
-
Number of Pages:
112
-
Keywords:
Artificial Intelligence; Deep Neural Networks; Edge Computing; Fog Computing
Abstract:
The use of edge computing as a platform for distributed DNN inference is an active area of research. Recent work proposes new neural network architectures that facilitate the distribution of DNN workloads in such environments. In addition to the classifier at a DNN's final layer, these architectures introduce side-exit classifiers at intermediate layers. This makes it possible to obtain inference results at earlier points in the network and thereby reduce compute overhead, which is critical for operation on resource-constrained devices. This thesis follows a recent line of research that uses this architecture to shift DNN computations towards less powerful devices at the edge of the network in order to improve user experience.

In contrast to related work, which concentrates on algorithmic aspects of optimizing the distributed execution of DNNs, this thesis focuses on the design aspects that enable the implementation of an extensible orchestration framework for distributing inference of feed-forward DNN models. Each host in the compute hierarchy operates a runtime environment that offers APIs for the orchestration and execution of DNNs, as well as a component for monitoring the node's resource levels and network conditions. Compute nodes are required to register with a central controller, which maintains a global view of the compute hierarchy. Finally, a scheduler decides on the deployment and orchestration of a given DNN model over the available compute resources. From a software architecture perspective, the scheduler offers a plugin framework that allows system users to implement and apply their own algorithms for custom placement policies. The system also ships with a number of strategies that aim to minimize the end-to-end latency of DNN inference.

We show that the optimal placement of layers in the described system landscapes, with respect to minimizing latency, is an NP-hard combinatorial optimization problem. We therefore provide an exact algorithm, in the form of an integer linear program, that solves the placement problem to optimality, as well as heuristic approaches for larger problem instances. Finally, experimental studies evaluate a prototypical system implementation in simulation-based scenarios and on a physical test-bed. On simulated compute hierarchies, the exact placement clearly outperforms the traditional cloud-centric placement. A feasibility study on a physical test-bed confirms that the system is able to identify efficient placements based on monitored environmental conditions.
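To make the side-exit idea concrete, the following is a minimal PyTorch sketch of an early-exit network. The stage sizes, the entropy-based confidence criterion, and the threshold are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal early-exit (side-exit) DNN sketch in PyTorch. All shapes, the
# entropy criterion, and the threshold are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, num_classes: int = 10, exit_threshold: float = 0.5):
        super().__init__()
        # Backbone split into stages; a side-exit classifier sits after each.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(16, num_classes))
        self.exit2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(32, num_classes))
        self.exit_threshold = exit_threshold

    @staticmethod
    def _entropy(logits: torch.Tensor) -> torch.Tensor:
        # Per-sample prediction entropy; low entropy = confident prediction.
        p = F.softmax(logits, dim=1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=1)

    def forward(self, x: torch.Tensor):
        h = self.stage1(x)
        logits1 = self.exit1(h)
        # Take the side exit when the whole batch is already confident;
        # otherwise pay for the deeper stage. (Per-sample routing omitted.)
        if not self.training and self._entropy(logits1).max() < self.exit_threshold:
            return logits1, "exit1"
        h = self.stage2(h)
        return self.exit2(h), "exit2"
```

At inference time, confident inputs leave the network at the first exit, so only hard inputs pay for the deeper stages. In a distributed deployment, the stages before and after a side exit can be placed on different hosts of the compute hierarchy.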
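The abstract states that compute nodes register with a central controller that maintains a global view of the hierarchy. A registration payload might plausibly look as follows; the endpoint, field names, and values are purely hypothetical.

```python
# Hypothetical node-registration message sent to the central controller.
# Endpoint URL and field names are assumptions modelled on the abstract.
import json
import urllib.request

registration = {
    "node": "edge-device-1",
    "parent": "fog-node-1",          # position in the compute hierarchy
    "resources": {"cpu_cores": 4, "memory_mb": 2048},
    "network": {"uplink_latency_ms": 4.2, "uplink_bandwidth_mbps": 95.0},
}
req = urllib.request.Request(
    "http://controller.example:8080/nodes",   # placeholder controller address
    data=json.dumps(registration).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment when a live controller is reachable
```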
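The scheduler's plugin framework for custom placement policies suggests an extension point along the following lines. The interface, data model, and baseline policy below are hypothetical names, not the thesis's actual API.

```python
# Hypothetical plugin interface for custom placement policies; names and
# data model are assumptions based on the abstract.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    compute_capacity: float   # e.g. normalized throughput from the monitor
    uplink_latency_ms: float  # measured latency towards the parent node

@dataclass
class Placement:
    # Maps each DNN layer index to the node that executes it.
    layer_to_node: dict[int, str]

class PlacementPolicy(ABC):
    """Extension point: users register subclasses with the scheduler."""

    @abstractmethod
    def place(self, layers: list[int], nodes: list[Node]) -> Placement:
        ...

class CloudOnlyPolicy(PlacementPolicy):
    """Baseline: run every layer on the most powerful node (the cloud)."""

    def place(self, layers: list[int], nodes: list[Node]) -> Placement:
        cloud = max(nodes, key=lambda n: n.compute_capacity)
        return Placement({layer: cloud.name for layer in layers})
```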
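The exact algorithm is described as an integer linear program. The sketch below shows what a latency-minimizing placement ILP for a small feed-forward chain could look like, using PuLP and a standard linearization of the transfer terms. The topology, the latency numbers, and the assumption that input data originates at the edge node are invented for illustration; the thesis's formulation may add further constraints, such as memory capacities.

```python
# Sketch of an exact layer-placement ILP with PuLP; all numbers are invented.
import pulp

layers = range(3)                      # feed-forward chain of 3 layers
nodes = ["edge", "fog", "cloud"]
compute = {                            # per-layer compute latency (ms) per node
    (0, "edge"): 5, (0, "fog"): 3, (0, "cloud"): 1,
    (1, "edge"): 9, (1, "fog"): 5, (1, "cloud"): 2,
    (2, "edge"): 9, (2, "fog"): 5, (2, "cloud"): 2,
}
link = {("edge", "edge"): 0, ("edge", "fog"): 4, ("edge", "cloud"): 20,
        ("fog", "edge"): 4, ("fog", "fog"): 0, ("fog", "cloud"): 16,
        ("cloud", "edge"): 20, ("cloud", "fog"): 16, ("cloud", "cloud"): 0}

prob = pulp.LpProblem("layer_placement", pulp.LpMinimize)
# x[l, n] = 1 iff layer l runs on node n
x = pulp.LpVariable.dicts("x", [(l, n) for l in layers for n in nodes],
                          cat="Binary")
# z[l, m, n] linearizes the product x[l, m] * x[l + 1, n]
z = pulp.LpVariable.dicts(
    "z", [(l, m, n) for l in list(layers)[:-1] for m in nodes for n in nodes],
    lowBound=0)

# Each layer is placed on exactly one node.
for l in layers:
    prob += pulp.lpSum(x[(l, n)] for n in nodes) == 1
# Linearization: z is forced to 1 when both adjacent placements hold.
for l in list(layers)[:-1]:
    for m in nodes:
        for n in nodes:
            prob += z[(l, m, n)] >= x[(l, m)] + x[(l + 1, n)] - 1

# End-to-end latency: one hop from the edge (where input originates) to
# layer 0, plus compute time of every layer, plus transfer time between
# consecutive layers placed on different nodes.
prob += (pulp.lpSum(link[("edge", n)] * x[(0, n)] for n in nodes)
         + pulp.lpSum(compute[(l, n)] * x[(l, n)]
                      for l in layers for n in nodes)
         + pulp.lpSum(link[(m, n)] * z[(l, m, n)]
                      for l in list(layers)[:-1] for m in nodes for n in nodes))

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in layers:
    for n in nodes:
        if pulp.value(x[(l, n)]) > 0.5:
            print(f"layer {l} -> {n}")
```

On this toy instance, the optimum places all three layers on the fog node for a total of 17 ms, whereas the cloud-centric baseline pays 25 ms, mirroring the abstract's observation that exact placement outperforms cloud-centric placement on simulated hierarchies.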