All-You-Can-Inference: Serverless DNN Model Inference Suite

Presentation: PDF

Serverless computing becomes prevalent and is widely adopted for various applications. Deep learning inference tasks are appropriate to be deployed using a serverless architecture. When serving a Deep Neural Net (DNN) model in a serverless computing environment, there exist many performance optimization opportunities, including various hardware support, model graph optimization, hardware-agnostic model compilation, memory size and batch size configurations, and many others. Although the serverless computing frees users from cloud resource management overhead, it is still very challenging to find an optimal serverless DNN inference environment among a very large optimization opportunities for the configurations. In this work, we propose All-You-Can-Inference (AYCI), which helps users to find an optimally operating DNN inference in a publicly available serverless computing environment. We have built the proposed system as a service using various fully-managed cloud services and open-sourced the system to help DNN application developers to build an optimal serving environment. The prototype implementation and initial experiment reveal promising results with respect to identifying an optimal serving environment and ensuring the practicality of the proposed system.