
Archived Article — The Daily Perspective is no longer active. This article was published on 19 March 2026 and is preserved as part of the archive.

Technology

Nvidia absorbs Groq's inference chip, raising the stakes in the AI chip race

The first product from Nvidia's $20 billion deal arrives at GTC 2026, signalling the shift from AI training to real-world deployment

Image: Tom's Hardware
Key Points
  • Nvidia showcased the Groq 3 LPU at GTC 2026, the first chip from its December $20 billion deal with Groq
  • The chip specialises in AI inference, the computational step that turns trained models into real-time applications like chatbots
  • Paired with Nvidia's Rubin GPU, it promises 35 times higher throughput per megawatt and is expected to ship in Q3 2026
  • The move signals the growing importance of inference optimisation as AI shifts from research to production deployment across industries

Nvidia unveiled the Groq 3 language processing unit at GTC 2026 in San Jose on Monday, marking the first chip to emerge from its $20 billion licensing and talent deal with AI inference startup Groq, which was struck on Christmas Eve last year.

The speed matters. Just twelve weeks passed between deal and product announcement, proof that Nvidia can move fast when it needs to. The real question is not how quickly Nvidia moved, but why it felt compelled to spend more on Groq's assets than on any previous deal in company history.

The answer lies in a fundamental shift happening in artificial intelligence. Groq's processors focus on AI inferencing, or running AI models; it's what happens when you type something into OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini and get a response. That distinction matters. Nvidia's graphics processing units are multipurpose and can both train and run AI models, but as the AI market moves toward running models, ensuring the company has a dedicated inferencing chip has become paramount.
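For readers who want that distinction made concrete, here is a minimal sketch in PyTorch contrasting a training step with an inference step. The toy model and every name in it are illustrative, not Nvidia's or Groq's code:

```python
import torch
import torch.nn as nn

# Toy stand-in for a language model; the training/inference split is the
# point here, not the architecture. Illustrative only -- not vendor code.
model = nn.Linear(16, 16)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

# Training: forward pass, loss, backward pass, weight update. Batched and
# compute-heavy -- the workload where Nvidia's GPUs already dominate.
x, target = torch.randn(32, 16), torch.randn(32, 16)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
opt.step()

# Inference: forward pass only, no gradients, one request at a time and
# latency-sensitive -- the workload a dedicated inference chip targets.
with torch.no_grad():
    reply = model(torch.randn(1, 16))
```

Training runs once, at enormous scale, inside a lab; the inference loop runs billions of times a day in production, which is why its economics now drive chip design.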

The challenge Nvidia faced was real. While its GPUs dominate the training phase of AI development, competing architectures have been gaining ground in inference, where speed and cost efficiency determine whether AI systems can be deployed profitably at scale. Nvidia's Groq deal is the largest in a wave of inference-focused acquisitions that swept through the semiconductor industry in 2025. In June, AMD acquired the engineering team from Untether AI, a RISC-V inference chip developer, after the startup shut down, and Nvidia itself paid over $900 million for networking startup Enfabrica's team and IP in September. Meta acquired custom-chip startup Rivos in October, and Intel attempted to buy SambaNova for a reported $1.6 billion, but the talks collapsed.

What makes the Groq 3 LPU different comes down to chip architecture. Groq's approach to accelerating inference relies on interleaving processing units with memory units on the chip. Instead of relying on high-bandwidth memory situated next to GPUs, it leans on SRAM integrated within the processor itself. This design greatly simplifies the flow of data through the chip, allowing it to proceed in a streamlined, linear fashion. The practical upshot: while the Rubin GPU has a memory bandwidth of 22 terabytes per second, at 150 TB/s the Groq 3 LPU is roughly seven times as fast.
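Why bandwidth is the number that matters: generating each token requires streaming roughly all of a model's weights through the chip once, so memory bandwidth sets a hard ceiling on tokens per second. A back-of-envelope sketch using the article's bandwidth figures; the 70-billion-parameter model at one byte per weight is our assumption, not from the source:

```python
# Decoding one token streams (roughly) every model weight through the
# chip once, so memory bandwidth caps tokens/second.
# Bandwidths are from the article; the model size is an assumption.

MODEL_BYTES = 70e9  # 70B parameters x 1 byte each (8-bit weights, assumed)

for name, bw_tb_s in [("Rubin GPU (HBM)", 22), ("Groq 3 LPU (SRAM)", 150)]:
    ceiling = (bw_tb_s * 1e12) / MODEL_BYTES
    print(f"{name}: ~{ceiling:,.0f} tokens/sec ceiling per chip")
```

The ratio of the two ceilings, about 6.8, is the sevenfold gap the article cites. In practice a model that size would not fit in a single chip's on-chip SRAM, which is one reason designs like this ship as racks of many chips.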

Nvidia is not deploying the chip as a standalone product. Instead, it is launching the Groq 3 LPX platform, a server rack powered by 128 individual Groq 3 LPUs. When used together with Nvidia's Vera Rubin NVL72 rack, the company says, customers could see 35x higher throughput per megawatt of power and a 10x greater revenue opportunity. The two systems work together: Rubin GPUs handle the computational heavy lifting of processing input prompts, while Groq LPUs take over the latency-sensitive work of generating responses token by token.
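That division of labour maps onto the two phases of serving a request, often called prefill and decode. A minimal, hypothetical sketch of the split; the function names and toy logic are ours, not Nvidia's API:

```python
# Prefill (the GPU's job) is compute-bound: the whole prompt is processed
# in one batched pass. Decode (the LPU's job) is bandwidth-bound: the
# reply is emitted one token at a time. Illustrative names and logic only.

def prefill(prompt_tokens: list[int]) -> dict:
    """GPU-side: one pass over the full prompt, producing the attention
    cache that the decode stage will extend."""
    return {"kv_cache": list(prompt_tokens)}

def decode(state: dict, max_new_tokens: int) -> list[int]:
    """LPU-side: sequential, latency-sensitive loop; each step reads the
    weights and cache again, so memory bandwidth sets the pace."""
    reply = []
    for step in range(max_new_tokens):
        token = (sum(state["kv_cache"]) + step) % 50_257  # toy next-token rule
        reply.append(token)
        state["kv_cache"].append(token)
    return reply

state = prefill([101, 2023, 2003, 1037, 3231])
print(decode(state, max_new_tokens=8))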

The manufacturing partnership with Samsung is crucial to scaling this. Nvidia is working with Samsung to bring the third-generation LP30 chips to market, which Nvidia CEO Jensen Huang said in his opening GTC 2026 keynote would happen in the second half of this year, and very likely in the third quarter. Samsung has ramped production from roughly 9,000 wafers to about 15,000 as output shifts from samples to commercial manufacturing. AWS, meanwhile, announced at GTC that it will deploy Groq 3 LPUs alongside more than one million Nvidia GPUs as part of an expanded partnership.

The strategic rationale extends beyond simple competitive advantage. Huang drew a direct parallel to Nvidia's 2020 Mellanox acquisition. Mellanox built the InfiniBand networking technology that became the backbone of Nvidia's data-centre interconnects. Groq, in Huang's framing, extends Nvidia's architecture the same way: an acquired company's technology is absorbed into the platform as a core component rather than being operated as a stand-alone product.

Whether Nvidia's infrastructure plays dominate inference as thoroughly as they do training remains an open question. AMD and Intel have both been investing in AI accelerators, while cloud hyperscalers such as Google, Amazon, and Microsoft have been ramping up internal chip development. Competition is real. But by integrating Groq's technology into its own stack, Nvidia has reduced the risk that inference becomes a place where rivals gain ground.

Deployment matters now more than benchmarks. Organisations building AI systems will make choices based on what's available, cost-effective, and proven in production. Nvidia has just signalled it will not cede that ground without a fight.

Tom Whitfield

Tom Whitfield is an AI editorial persona created by The Daily Perspective, covering AI, cybersecurity, startups, and digital policy with a sharp voice and dry wit that cuts through tech hype. Articles under this byline are generated using artificial intelligence with editorial quality controls.