
Archived Article — The Daily Perspective is no longer active. This article was published on 17 March 2026 and is preserved as part of the archive.

Technology

Nvidia's Next Memory Breakthrough: 1TB Chips and the Race for AI Dominance

The Rubin Ultra GPU signals both accelerating innovation and a new bottleneck: the scramble to supply enough memory

Image: Tom's Hardware
Key Points
  • Nvidia demonstrated the Rubin Ultra, featuring four compute chiplets and 1TB of HBM4E memory per GPU package, arriving in 2027
  • The Kyber rack will house 144 of these GPUs vertically, delivering 14 times the performance of current systems while requiring revolutionary power infrastructure
  • Memory bandwidth has become the critical constraint in AI systems; HBM4E technology doubles previous bandwidth with a wider 2048-bit interface
  • Major DRAM manufacturers including SK Hynix, Micron, and Samsung are racing to meet anticipated demand, signalling healthy competition

Nvidia on Monday demonstrated the compute tray for its next-generation data centre GPU, Rubin Ultra, which is due to arrive sometime in 2027. In the semiconductor business, such demonstrations carry weight: they signal not merely intent but engineering maturity. The spec sheet itself tells you something crucial about where AI infrastructure is headed.

The Rubin Ultra features four compute chiplets and 1TB of HBM4E memory, making it the industry's first AI accelerator equipped with a terabyte of memory. That terabyte sits on a single GPU package. To contextualise: memory capacity has exploded from the A100's 80 GB of HBM2E to 1,024 GB of HBM4E for Rubin Ultra. The trajectory is unmistakable. AI models are growing faster than Moore's Law can accommodate. The only answer is to stuff more memory onto single chips.
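A quick back-of-envelope sketch puts that trajectory in perspective. The 2020 starting point for the A100 is an assumption for illustration, not a figure from the article:

```python
import math

# Implied doubling period for per-GPU HBM capacity, from the A100's
# 80 GB (assumed 2020 launch) to Rubin Ultra's 1,024 GB in 2027.
growth = 1024 / 80                        # 12.8x capacity growth
years = 2027 - 2020
doubling_years = years / math.log2(growth)
print(round(doubling_years, 2))           # roughly 1.9 years per doubling
```

A doubling period just under two years means per-GPU memory is now being scaled at the classic Moore's Law cadence itself, even as transistor scaling slows.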

The real innovation lies not in the terabyte figure itself, which is more milestone than breakthrough, but in how Nvidia achieved it. HBM4 doubles the interface width from 1,024 bits to 2,048 bits, meaning that even at a modest 8 Gb/s per-pin data rate, a single HBM4 stack can deliver 2.048 TB/s of bandwidth. The memory bottleneck that choked earlier systems has clearly been front of mind for designers: across the full package, Rubin Ultra's 1 TB of HBM4E is expected to deliver approximately 32 TB/s of aggregate bandwidth.
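The arithmetic behind those figures is easy to check. The stack count below is an assumption for illustration (16 stacks of 64 GB would give the stated 1 TB), not a number from the article:

```python
INTERFACE_WIDTH_BITS = 2048   # HBM4 doubles HBM3's 1,024-bit interface
DATA_RATE_GBPS = 8            # modest per-pin data rate, Gb/s

# Per-stack bandwidth: width (bits) * rate (Gb/s) / 8 bits per byte -> GB/s
per_stack_gb_s = INTERFACE_WIDTH_BITS * DATA_RATE_GBPS / 8
print(per_stack_gb_s / 1000)  # 2.048 TB/s per stack

ASSUMED_STACKS = 16           # hypothetical: 16 x 64 GB = 1 TB
total_tb_s = per_stack_gb_s * ASSUMED_STACKS / 1000
print(round(total_tb_s, 1))   # ~32.8 TB/s, in line with the ~32 TB/s figure
```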

The architectural implications extend beyond the single GPU. Rubin Ultra will use the company's new rack-scale design, known as Kyber, which integrates 144 GPU packages in a single rack, up from 72 in the current NVL72 design. Because Rubin Ultra also doubles the number of compute tiles per package, Kyber NVL144 systems based on Rubin Ultra GPUs should offer at least four times the performance of an Oberon NVL72 rack built on 72 Rubin GPUs.
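The four-fold claim falls out directly from those two doublings:

```python
# Raw scaling: Oberon NVL72 (Rubin) vs Kyber NVL144 (Rubin Ultra).
oberon_packages, oberon_tiles = 72, 2    # 2 compute chiplets per package
kyber_packages, kyber_tiles = 144, 4     # 4 compute chiplets per package

speedup = (kyber_packages * kyber_tiles) / (oberon_packages * oberon_tiles)
print(speedup)  # 4.0 -> "at least four times the performance"
```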

That four-fold gain masks a deeper challenge: power and cooling. Each Rubin Ultra package is projected to consume 3.6 kW, requiring an all-new cooling system for GPU packages and the all-new Kyber rack itself. Nvidia's answer is not just liquid cooling but a fundamental redesign of data centre infrastructure. By rotating compute blades vertically, like books on a shelf, Kyber fits up to 18 compute blades per chassis, while purpose-built Nvidia NVLink switch blades are integrated at the back via a cable-free midplane for seamless scale-up networking.

More striking still is the power delivery architecture. At GTC 2025, Nvidia exhibited an 800 V sidecar to power 576 of the Rubin Ultra GPUs in a single Kyber rack. This shift from traditional 54 VDC in-rack systems reflects a deliberate move toward efficiency at gigawatt scale. Over 150% more power is transmitted through the same copper with 800 VDC, eliminating the need for 200-kg copper busbars to feed a single rack.
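The copper argument comes down to current. A rough, illustrative calculation (GPU compute power only, assuming all 144 packages at the projected 3.6 kW and ignoring conversion losses) shows why low-voltage busbars become untenable at this scale:

```python
# Current needed to feed a Kyber-class rack at two bus voltages.
# Assumes 144 packages x 3.6 kW each; GPU compute power only.
rack_power_w = 144 * 3_600               # ~518 kW

for volts in (54, 800):
    amps = rack_power_w / volts
    print(f"{volts} V -> {amps:,.0f} A")  # 54 V -> 9,600 A; 800 V -> 648 A
```

Since resistive loss in a conductor scales with the square of the current, the 800 V bus moves far more power through the same copper, which is the logic behind the efficiency figure Nvidia cites.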

Behind every chip sits a supply chain. Nvidia will still command the lion's share of HBM demand in 2027, driven by an aggressive roadmap in which Rubin Ultra alone pushes per-GPU capacity to 1 TB. The vendors supplying this memory are positioning accordingly. SK Hynix became Nvidia's primary HBM supplier, a relationship that drove the company's market-share gains. Micron Technology has made bold moves in the HBM race, achieving record-breaking performance with its 12-Hi HBM4 samples: 2.8 TB/s of bandwidth and 11 Gb/s pin speeds. Samsung, too, is competing aggressively. In late 2025, Samsung shipped HBM4 samples with 11 Gb/s pin speeds, matching Micron's performance records, and plans to begin mass production by the end of the year.

This is healthy. Competition between major DRAM manufacturers has historically driven innovation and prevented any single supplier from holding the market hostage. Yet the scale is formidable. The high bandwidth memory market is projected to expand from $4 billion in 2023 to $130 billion by 2033, and by 2033 HBM memory is expected to represent over 50% of the global DRAM market value.
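Those projections imply a striking compound growth rate:

```python
# Implied compound annual growth of the HBM market:
# $4B (2023) -> $130B (2033), per the projection in the text.
start_bn, end_bn, years = 4.0, 130.0, 10
cagr = (end_bn / start_bn) ** (1 / years) - 1
print(f"{cagr:.1%}")  # ~41.6% per year, sustained for a decade
```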

Nvidia's Rubin Ultra demonstration is strategically timed. The company is signalling to customers, investors, and competitors what the post-Blackwell era will look like. It is also, implicitly, signalling to governments that semiconductor supply chains matter. The memory vendors racing to support these architectures are largely Korean and American manufacturers, with Taiwan's TSMC in the packaging and base-die role. This concentration of critical capability deserves scrutiny.

For customers weighing the decision to build or upgrade data centre infrastructure, the timeline matters. Rubin is targeted for 2027 and aims to double performance by moving from two compute chiplets to four, which is expected to raise FP4 inference performance to around 100 PFLOPS per GPU package. That is roughly two years away; for a hyperscaler, two years is tomorrow. The capital allocation decisions are being made today.

The core trade-off is simple: greater memory capacity per chip allows larger models and longer context windows, which matter as AI moves beyond simple inference toward multi-step reasoning and multi-turn interaction. But the power, cooling, and power-delivery infrastructure required to support it scales non-linearly. Nvidia is betting that the performance gains justify the cost. The supply-chain resilience questions remain unanswered.

Yuki Tamura

Yuki Tamura is an AI editorial persona created by The Daily Perspective, covering the cultural, political, and technological currents shaping the Asia-Pacific region, from Japanese innovation to Pacific Island climate concerns. As an AI persona, Yuki Tamura's articles are generated using artificial intelligence with editorial quality controls.