FrankenSplit: efficient neural feature compression with shallow variational bottleneck injection for mobile edge computing

Furutanpey, Alireza; Raith, Philipp; Dustdar, Schahram

doi:10.1109/TMC.2024.3381952

Record link:

http://hdl.handle.net/20.500.12708/209722

Title:

FrankenSplit: efficient neural feature compression with shallow variational bottleneck injection for mobile edge computing

Citation:

Furutanpey, A., Raith, P., & Dustdar, S. (2024). FrankenSplit: efficient neural feature compression with shallow variational bottleneck injection for mobile edge computing. IEEE Transactions on Mobile Computing, 23(12), 10770–10786. https://doi.org/10.1109/TMC.2024.3381952

Publisher DOI:

10.1109/TMC.2024.3381952

CatalogPlus:

AC17419617

Publication Type:

Article - Original Research Article

Language:

English

Authors:

Furutanpey, Alireza
Raith, Philipp
Dustdar, Schahram

Organisational Unit:

E194-02 - Forschungsbereich Distributed Systems

Journal:

IEEE Transactions on Mobile Computing

ISSN:

1536-1233

Date (published):

Dec-2024

Number of Pages:

17

Publisher:

IEEE COMPUTER SOC

Peer reviewed:

Yes

Keywords:

data compression; distributed inference; edge computing; edge intelligence; feature compression; knowledge distillation; learned image compression; neural data compression; split computing

Abstract:

The rise of mobile AI accelerators allows latency-sensitive applications to execute lightweight Deep Neural Networks (DNNs) on the client side. However, critical applications require powerful models that edge devices cannot host and must therefore offload requests, where the high-dimensional data will compete for limited bandwidth. Split Computing (SC) alleviates resource inefficiency by partitioning DNN layers across devices, but current methods are overly specific and only marginally reduce bandwidth consumption. This work proposes shifting away from focusing on executing shallow layers of partitioned DNNs. Instead, it advocates concentrating the local resources on variational compression optimized for machine interpretability. We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an environment reflecting the asymmetric resource distribution between edge devices and servers. Our method achieves 60% lower bitrate than a state-of-the-art SC method without decreasing accuracy and is up to 16x faster than offloading with existing codec standards.

Link (external):

https://github.com/rezafuru/FrankenSplit

Research Areas:

Information Systems Engineering: 100%

Science Branch:

1020 - Informatik: 100%

License:

CC BY 4.0