NVIDIA’s latest Blackwell Ultra-based GB300 GPUs are starting to show what they can do, and early results point to a massive jump in efficiency over the company’s previous generation. A recent CoreWeave benchmark shows just how far the new chips push AI performance when running one of today’s heaviest workloads.

In tests running the DeepSeek R1 model, just four GB300 GPUs did the work of 16 H100s. CoreWeave puts the gain at roughly six times the throughput per GPU — more than the 4x implied by the GPU counts alone, which means the four GB300s also delivered higher total throughput than the 16-card H100 cluster. The gains come from both raw hardware upgrades and a more efficient parallelization scheme.
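The per-GPU arithmetic is easy to sanity-check. In the sketch below, only the GPU counts (4 vs 16) come from the article; the tokens-per-second totals are illustrative assumptions, not CoreWeave's measured figures:

```python
def per_gpu_speedup(new_total_tps, new_gpus, old_total_tps, old_gpus):
    """Ratio of per-GPU throughput between two deployments."""
    return (new_total_tps / new_gpus) / (old_total_tps / old_gpus)

# If 4 GB300s merely matched the total throughput of 16 H100s,
# the per-GPU gain would be 4x:
print(per_gpu_speedup(1000, 4, 1000, 16))  # 4.0

# A ~6x per-GPU figure implies the 4-GPU setup also pushed about
# 1.5x the total tokens per second of the H100 cluster:
print(per_gpu_speedup(1500, 4, 1000, 16))  # 6.0
```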

Source: CoreWeave

The GB300 NVL72 platform offers 37TB of memory (expandable to 40TB) with 130TB/s of memory bandwidth, letting it hold extremely large AI models without the bottlenecks seen in older hardware. It also runs the model with 4-way tensor parallelism (TP4), where the H100 required a 16-way split (TP16) to reach similar scale. Fewer splits mean fewer GPU-to-GPU synchronizations and less communication overhead, and NVIDIA’s fifth-generation NVLink and NVSwitch interconnects keep the remaining traffic fast.
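A quick way to see why fewer tensor-parallel splits mean less overhead is the textbook ring all-reduce cost model — a back-of-the-envelope sketch, not CoreWeave's methodology, and the tensor size here is arbitrary:

```python
# Standard ring all-reduce cost model for tensor parallelism.
# All numbers are illustrative; nothing here is from NVIDIA or CoreWeave.

def allreduce_bytes_per_gpu(tensor_bytes, tp_degree):
    """Data each GPU sends in one ring all-reduce of a tensor."""
    return 2 * (tp_degree - 1) / tp_degree * tensor_bytes

def allreduce_steps(tp_degree):
    """Sequential, latency-bound communication steps per all-reduce."""
    return 2 * (tp_degree - 1)

# Per-GPU traffic volume is similar at TP4 and TP16...
print(allreduce_bytes_per_gpu(100.0, 4))   # 150.0
print(allreduce_bytes_per_gpu(100.0, 16))  # 187.5

# ...but the number of sequential steps, and with it the latency paid
# on every layer, grows linearly with the split:
print(allreduce_steps(4))   # 6
print(allreduce_steps(16))  # 30
```

Under this model, TP16 pays five times as many latency-bound steps per layer as TP4, which is why collapsing the split matters even when raw link bandwidth is plentiful.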

Source: CoreWeave

For businesses running large AI services, that means faster token generation, lower costs per inference, and a more straightforward path to scaling. CoreWeave noted that the efficiency gains translate directly into practical benefits for workloads such as reasoning models, where performance-per-watt and latency are critical.

Of course, the hardware won’t come cheap. Systems built on the GB300 NVL72 are expected to start at around $300,000, putting them firmly in enterprise territory. But for companies already investing heavily in AI infrastructure, the performance leap could justify the cost.

If these early numbers hold true as deployments expand, the GB300 could mark one of the biggest generational jumps NVIDIA has delivered yet — and a new baseline for high-end AI computing.
