https://cis.temple.edu/~jiewu/research/publications/Publication_files/ICPP_2021_Duan.pdf
ABSTRACT Reducing the inference time of Deep Neural Networks (DNNs) is critical when running time-sensitive applications on mobile devices. Existing research has shown that partitioning a DNN and offloading a part of its computation to cloud servers can reduce the inference time. The single-DNN partition problem has been extensively investigated recently. However, in real-world applications, a ...
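The partitioning idea the abstract describes can be illustrated with a minimal sketch. This is not the paper's algorithm; it assumes a simple chain-structured DNN with hypothetical per-layer timings: `local_ms[i]` and `server_ms[i]` are the on-device and on-server compute times of layer i, and `transfer_ms[k]` is the time to ship the activation crossing the cut at position k (k = 0 means sending the raw input; k = n means running fully on-device, so `transfer_ms[n]` covers any final transfer, or 0). The best cut minimizes device compute + transfer + server compute:

```python
def best_partition(local_ms, server_ms, transfer_ms):
    """Pick the cut point k for a chain DNN: layers [0, k) run on the
    mobile device, the activation at the cut is uploaded, and layers
    [k, n) run on the cloud server. Returns (k, total_latency_ms).

    local_ms, server_ms: length-n lists of hypothetical per-layer times.
    transfer_ms: length-(n+1) list; entry k is the upload cost at cut k.
    """
    n = len(local_ms)
    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):  # n + 1 possible cut positions, including the ends
        latency = sum(local_ms[:k]) + transfer_ms[k] + sum(server_ms[k:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Example with made-up timings (milliseconds): a mid-network cut wins
# because the layer-1 activation is cheap to upload.
k, latency = best_partition(
    local_ms=[5, 5, 5],
    server_ms=[1, 1, 1],
    transfer_ms=[10, 2, 8, 1],
)
```

With these numbers the search returns k = 1 (run one layer locally, upload, finish on the server) at 9 ms total, versus 13 ms for full offloading (k = 0) and 16 ms for fully local execution (k = 3). The multi-DNN setting the paper targets is harder precisely because concurrent partitions contend for the same device, network, and server resources, so per-layer costs are no longer independent constants as this sketch assumes.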