Nvidia might be on an incredible high right now as the world rushes headlong into generative AI, but it may not last for long, according to Pete Warden.
Warden is a former research engineer at Google, which he joined after the firm he founded was acquired by the company in 2014. He is now the chief executive officer of a firm that is crowdfunding a standalone AI-in-a-box device the size of a Raspberry Pi.
Why Nvidia is winning now
The reason Nvidia is currently at the top of the hill, according to Warden, is that “almost nobody” is running large machine learning (ML) apps. Instead, as businesses rush to figure out what they can do with AI, the focus is on training, which Nvidia’s GPUs are optimal for.
“Outside of a few large tech companies, very few corporations have advanced to actually running large-scale AI models in production,” noted Warden. The main costs right now are around dataset collection, hardware for training, and salaries for model authors; the focus is on training, not inference.
Also, it doesn’t help that alternatives all pale in comparison to Nvidia: “Using an Nvidia GPU is a lot easier and less time consuming than an AMD OpenCL card, Google TPU, a Cerebras system, or any other hardware,” he wrote.
“The software stack is much more mature, there are many more examples, documentation, and other resources, finding engineers experienced with Nvidia is much easier, and integration with all of the major frameworks is better.”
And finally, Nvidia’s CUDA software stack has benefited from well over a decade of investment. Warden says that CUDA paired with Nvidia’s high-performance GPUs lets AI researchers train new models from scratch in “about a week”, a combination he asserts is simply unbeatable right now.
From training to inference
The situation will change over time, though, as generative AI matures and the need for GPUs to train new models dips. This is inevitable, as there is only a finite number of AI researchers to keep coming up with new models to train.
In contrast, as generative AI makes its way into every application out there, the need for inference will eventually match and exceed the need for training. Invariably, priorities will shift towards reducing inference costs.
“There’s some point in the future where the amount of compute any company is using for running models on user requests will exceed the cycles they’re spending on training,” he wrote.
Warden believes the focus will turn towards CPUs for inference, drawing on his experience in 2013, when he used 100 m1.small AWS servers to run inference across millions of images.
While it is very hard to split training workloads across multiple systems, because GPUs must constantly exchange and synchronize model state over fast interconnects, the situation is quite different for inference on CPUs: each incoming request is independent and can be handled by any machine. The result? “This makes an army of commodity PCs very appealing for applications relying on ML inference.”
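To illustrate the point, here is a minimal sketch (not from Warden's post) of how independent inference requests can be fanned out across ordinary CPU workers; the predict() function is a hypothetical stand-in for whatever model runtime is actually in use:

```python
# Minimal sketch: inference requests are independent, so they can be
# fanned out across ordinary CPU cores (or machines) with no coordination.
from multiprocessing import Pool

def predict(image_path: str) -> str:
    # Hypothetical placeholder for real model inference on a single input.
    return f"label-for-{image_path}"

if __name__ == "__main__":
    requests = [f"image_{i}.jpg" for i in range(1000)]
    with Pool(processes=8) as pool:            # one worker per CPU core
        results = pool.map(predict, requests)  # each request handled independently
    print(len(results), "predictions")
```

Because no request depends on another, the same pattern scales from cores on a single machine to a fleet of commodity servers behind a load balancer, with none of the tight coupling that multi-GPU training demands.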
The road ahead
The result is that traditional CPU platforms such as x86 and Arm will be the beneficiaries. Warden thinks specialized accelerator hardware won’t work because of the latency it introduces; instead, AI-centric instructions will find their way into CPUs.
“I expect CPUs to gain much more tightly integrated machine learning support, first as co-processors and eventually as specialized instructions, like the evolution of floating point support.”
Not everyone agrees, though. In response to a LinkedIn post by Warden, a customer engineer named Andre Bossard commented: “I would assume Nvidia stays the leader in hardware acceleration for a long time. Not just because of the hardware they sell now, but also because they outspend and out-hire the competition globally when it comes to R&D.”
Relatedly, a report earlier this month notes that Nvidia has added new software that can double the inference performance of its top-end H100 GPU, which at launch already offered four times the performance of the A100.
Whatever the future brings, it does appear Nvidia is doing all it can to stay in the lead. You can read Warden’s blog here.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/mouu007