The advent of ChatGPT, and of Generative AI in general, is a watershed moment in the history of technology, often likened to the dawn of the Internet and the smartphone. Generative AI has shown enormous potential in its ability to hold intelligent conversations, pass exams, generate complex programs and code, and create eye-catching images and video.
Today, most Gen AI models run on GPUs in the cloud, both for training and inference. This is not a scalable long-term solution, especially for inference, owing to factors that include cost, power, latency, privacy, and security. This article examines each of these factors, with motivating examples, and makes the case for moving Gen AI compute workloads to the edge.
Most applications run on high-performance processors, either on device (e.g., smartphones, desktops, laptops) or in data centers. As the share of applications that rely on AI grows, processors equipped with CPUs alone become inadequate.
Furthermore, the rapid expansion of Generative AI workloads is driving exponential demand for AI-enabled servers built around expensive, power-hungry GPUs, which in turn drives up infrastructure costs. An AI-enabled server can cost upwards of 7x the price of a regular server, and GPUs account for 80% of that added cost.
Additionally, a standard cloud server consumes 500W to 2000W, whereas an AI-enabled server consumes between 2000W and 8000W, roughly four times as much. To support these servers, data centers need additional cooling modules and infrastructure upgrades whose cost can exceed the compute investment itself.
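The 4x figure follows directly from the stated ranges. As a quick back-of-envelope check, the minimal Python sketch below uses only the numbers quoted above, not independent measurements:

    # Back-of-envelope check of the server power figures quoted above.
    standard_server_w = (500, 2000)   # typical cloud server draw, in watts
    ai_server_w = (2000, 8000)        # AI-enabled server draw, in watts

    # Compare the low ends and the high ends of the two ranges.
    low_ratio = ai_server_w[0] / standard_server_w[0]    # 2000 / 500  = 4.0
    high_ratio = ai_server_w[1] / standard_server_w[1]   # 8000 / 2000 = 4.0
    print(f"AI-enabled servers draw about {low_ratio:.0f}x the power of a standard server")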
Data centers already consume about 300 TWh per year, almost 1% of worldwide electricity consumption. If current trends in AI adoption continue, data centers could account for as much as 5% of worldwide electricity by 2030.
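To put that projection in perspective, the same figures can be scaled forward. This is a rough illustration derived solely from the numbers above; the implied worldwide total is an inference, not an independent estimate:

    # Rough scaling of the data-center electricity figures quoted above.
    datacenter_twh_today = 300                         # TWh/year, roughly 1% of worldwide use
    implied_global_twh = datacenter_twh_today / 0.01   # implied worldwide total, ~30,000 TWh/year
    projected_share_2030 = 0.05                        # 5% share if AI adoption trends continue
    projected_datacenter_twh = projected_share_2030 * implied_global_twh
    print(f"Implied worldwide consumption: ~{implied_global_twh:,.0f} TWh/year")
    print(f"Data centers at a 5% share: ~{projected_datacenter_twh:,.0f} TWh/year")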
Additionally, investment in Generative AI data centers is unprecedented: capital expenditures on data centers are estimated to reach up to $500 billion by 2027, fueled mainly by AI infrastructure requirements.