Luger, D. (2023). Cost-aware neural network splitting and dynamic rescheduling for edge intelligence [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.108325
With the rise of IoT devices and the necessity of intelligent applications, inference tasks are often offloaded to the cloud due to the computational limitations of the end devices. Yet, requests to the cloud are costly in terms of latency. Therefore, a shift of the computation from the cloud to the network's edge is unavoidable for time-sensitive applications. This shift is called edge intelligence and promises lower latency, among other advantages. However, some algorithms, like deep neural networks (DNNs), are computationally intensive, even for local edge servers (ES). To keep latency low, such DNNs can be split into two parts and distributed between the ES and the cloud. We present a dynamic scheduling algorithm that considers real-time parameters, such as the clock speed of the ES, bandwidth, and latency, and predicts the latency-optimal splitting point. Furthermore, we estimate the overall costs for the ES and the cloud at run-time and integrate them into our prediction and decision models. We present a cost-aware prediction of the splitting point, which can be tuned with a parameter toward faster response or lower costs. We tested our rescheduling algorithm on a test bed with a Raspberry Pi as the edge server and an AWS instance as the cloud server. The results demonstrate a 60.84% decrease in cost compared to the latency-optimal splitting point, with a latency increase of only 25.92%, for the AlexNet CNN when the edge server is rented.
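The cost-aware split-point selection described above can be sketched as a weighted scalarization of latency and cost over all candidate layers. The function below is a minimal illustration under assumed inputs: the names, the per-layer latency/cost arrays, and the trade-off parameter `alpha` are hypothetical stand-ins, not the thesis implementation, and latency and cost are assumed to be pre-normalized to comparable scales.

```python
def pick_split_point(edge_lat, transfer_lat, cloud_lat, edge_cost, cloud_cost, alpha):
    """Choose the layer index at which to split a DNN between edge and cloud.

    For split index i:
      edge_lat[i]     - latency of running layers before the split on the edge server
      transfer_lat[i] - latency of shipping the split layer's activations to the cloud
      cloud_lat[i]    - latency of running the remaining layers on the cloud
      edge_cost[i], cloud_cost[i] - (normalized) monetary cost of each part

    alpha in [0, 1] tunes the decision: 1.0 is latency-optimal,
    0.0 is cost-optimal (hypothetical weighting, for illustration only).
    """
    best_i, best_score = 0, float("inf")
    for i in range(len(edge_lat)):
        latency = edge_lat[i] + transfer_lat[i] + cloud_lat[i]
        cost = edge_cost[i] + cloud_cost[i]
        score = alpha * latency + (1 - alpha) * cost
        if score < best_score:
            best_i, best_score = i, score
    return best_i


# Illustrative per-split profiles for a 3-layer toy network (made-up numbers):
edge_lat = [0.0, 10.0, 20.0]      # more edge layers -> more edge latency
transfer_lat = [8.0, 4.0, 2.0]    # deeper splits -> smaller activations to send
cloud_lat = [5.0, 3.0, 1.0]
edge_cost = [0.0, 1.0, 2.0]
cloud_cost = [6.0, 3.0, 0.0]

latency_first = pick_split_point(edge_lat, transfer_lat, cloud_lat,
                                 edge_cost, cloud_cost, alpha=1.0)
cost_first = pick_split_point(edge_lat, transfer_lat, cloud_lat,
                              edge_cost, cloud_cost, alpha=0.0)
print(latency_first, cost_first)  # different alpha values pick different splits
```

In a dynamic setting such as the one the abstract describes, the per-split latency profiles would be re-estimated at run-time from measured clock speed, bandwidth, and round-trip latency, and the rescheduler would re-run this selection when those parameters drift.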