Budget flexibility for on-prem AI
Consider an organisation that, understandably, wants to keep its customer data and intellectual property as safe as possible while it assesses how to “do AI”. It is easy to assume everyone is doing AI, but are they actually doing anything useful? And if they are, how are they preventing business-sensitive data from leaking, and stopping their intellectual property being swallowed up by a large language model to improve its understanding of what makes the business unique?
IT leaders in such organisations may well be tempted to deploy something on-premise, in a private cloud, or securely in the public cloud, where they can effectively rent graphics processing units (GPUs) by the hour to run AI inference workloads.
The purchase price of servers capable of running DeepSeek-R1’s 670 billion parameter model on-premise can easily run to hundreds of thousands of pounds, assuming IT buyers can even get hold of the much sought-after GPUs the server needs. Deploying in the public cloud will be cheaper, if the IT department is prepared to pay for reserved GPU instances and commit to a three-year contract.
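The trade-off above is essentially a break-even calculation: how many months of hourly cloud GPU rental add up to the price of buying the hardware outright? The figures below are purely illustrative assumptions, not quotes from any vendor, but the arithmetic can be sketched like this:

```python
# Hypothetical break-even sketch. The capex and hourly rate below are
# illustrative assumptions, not real vendor pricing.

def breakeven_months(capex_gbp: float, cloud_gbp_per_hour: float,
                     hours_per_month: float = 730) -> float:
    """Months of round-the-clock cloud GPU rental that would
    equal the on-prem purchase price."""
    return capex_gbp / (cloud_gbp_per_hour * hours_per_month)

# Assumed: a £300,000 GPU server vs a reserved cloud instance
# at £40/hour, running 24/7 (roughly 730 hours a month).
months = breakeven_months(300_000, 40)
print(f"Break-even after roughly {months:.1f} months")
```

Under those assumptions the cloud overtakes the purchase price within a year of continuous use, which is why the decision hinges less on the raw numbers than on how long today’s hardware and models will stay relevant.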
Things are moving too quickly to lock in a deal
But which head of finance would be prepared to sign off on such a commitment, especially given how quickly AI technology is evolving? Consider the conversation with the boss after signing off on a GPU-powered AI server or a three-year public cloud or hosting agreement, only to discover that a brand new model can achieve spectacular results for a fraction of the financial outlay.
The fact that DeepSeek runs inference on Huawei Ascend 910C chips rather than the latest and greatest GPUs from Nvidia shows there is more than one approach to large language models.
The transcript of Meta’s latest financial results, for the fourth quarter of 2024, reveals that the owner of Facebook, Instagram and WhatsApp is planning to replace older GPU servers that have reached the end of their useful life with a custom chip optimised to run its AI workloads cost-effectively. Given CEO Mark Zuckerberg’s ambition to make Meta AI available to a billion people, doing these things as cheaply as possible is clearly a top priority.
And Meta is not alone. Saudi Telecom Company has just penned a deal with Palo Alto start-up SambaNova to deliver more affordable AI, based on a custom chip that enables memory tiering. SambaNova claims its chips allow DeepSeek-R1 to run in just one datacentre rack, compared with the 40 that would be required if GPUs were deployed.
This is just a snapshot of what has been happening over the last four weeks or so. Committing to a GPU-accelerated AI server purchase or a multi-year reserved GPU instance cloud agreement may well lock you into expensive technical debt.