Illuminating the Hidden Challenges of Serverless LLM Systems
The convergence of large language models (LLMs) and serverless computing presents a unique opportunity to make cutting-edge artificial intelligence (AI) capabilities more widely available. However, this integration poses several challenges stemming from the architectural mismatch between stateless, ephemeral serverless functions and the resource-intensive, stateful nature of modern LLMs. This paper presents a comprehensive vision for efficient serverless LLM systems. We analyze the distinct requirements of serverless LLM serving and identify gaps in existing approaches. We then propose a three-layer architecture comprising a declarative interface, an adaptive orchestration engine, and an efficient serverless runtime. The paper highlights the key challenges of cold start latency, state management, resource provisioning, and performance isolation that must be addressed to realize the full potential of serverless LLMs. This work provides a foundational roadmap for researchers seeking to make LLMs more accessible, cost-effective, and scalable through serverless paradigms.