
Archived Article — The Daily Perspective is no longer active. This article was published on 19 March 2026 and is preserved as part of the archive.

Technology

Kioxia's New Speed Machine Attacks AI's Memory Bottleneck

Storage Class Memory drives promise to break the data access barrier that's slowing down GPU workloads

Image: Tom's Hardware
Key Points
  • Kioxia's new GP Series SSD lets GPUs access flash memory as extended high-bandwidth memory, potentially multiplying available AI system memory
  • The drive achieves over 10 million IOPS using XL-Flash, a storage-class memory technology with 3-5 microsecond latency versus traditional SSDs' 40-100 microseconds
  • As AI models scale toward trillions of parameters, GPUs run out of on-board memory; this drive extends the memory hierarchy to support larger models and longer context windows
  • Evaluation samples arrive by end of 2026, with a companion 25.6 TB CM9 drive for handling massive key-value caches in AI inference

Kioxia has announced the GP Series, a new SSD designed for AI systems that allows GPUs to access flash memory directly as an extension of high-bandwidth memory, or HBM. If that sounds specialised, that's because it is. And if you've been following the AI race at all, you'll understand why this matters.

The problem is straightforward: modern GPUs have only so much memory attached to them. When AI models demand access to more data than fits in that on-board memory, performance tanks. And models are rapidly scaling toward trillions of parameters while context windows expand to millions of tokens, driving unprecedented growth in key-value (KV) cache requirements. Architectures such as NVIDIA's Context Memory Storage (CMX) recognise the need to extend the memory hierarchy beyond GPU memory using high-performance storage. Kioxia's solution lets GPUs talk directly to flash storage fast enough that it feels like memory, not storage.
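
To get a feel for the scale of the problem, here is a rough back-of-envelope sketch of KV cache growth; the model dimensions below are illustrative assumptions, not figures from Kioxia or Nvidia:

    # Approximate KV-cache footprint for transformer inference.
    # Size = 2 (keys and values) x layers x KV heads x head dim
    #        x sequence length x bytes per value.
    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

    # Hypothetical large model: 120 layers, 8 KV heads (grouped-query
    # attention), head dimension 128, FP16 values, one million tokens.
    size = kv_cache_bytes(layers=120, kv_heads=8, head_dim=128, seq_len=1_000_000)
    print(f"{size / 1e9:.0f} GB per sequence")  # ~492 GB

Roughly 490 GB for a single sequence, several times the HBM on any current GPU, and that is before the model weights claim a single byte.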

The performance numbers are where things get interesting. The GP Series is built on Kioxia's XL-Flash and is designed to achieve over 10 million IOPS, roughly three times what traditional data centre SSDs manage. XL-Flash is made from SLC NAND flash (the fastest flash type available) and boasts a read latency of just 3 to 5 microseconds. For context, traditional SSDs peak at around 3 to 4 million IOPS and have read latencies in the 40 to 100 microsecond range. That latency difference is the gap between a GPU waiting an eternity for data and a GPU that can retrieve it in a blink.
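
Taking those figures at face value, the gap is easy to quantify; a quick sanity check using only the numbers quoted above:

    # Sanity-check the quoted figures (all numbers from the announcement).
    gp_iops, ssd_iops_range = 10e6, (3e6, 4e6)
    gp_lat_us, ssd_lat_us = (3, 5), (40, 100)

    print(f"IOPS advantage: {gp_iops / ssd_iops_range[1]:.1f}x "
          f"to {gp_iops / ssd_iops_range[0]:.1f}x")   # 2.5x to 3.3x
    print(f"Latency advantage: {ssd_lat_us[0] / gp_lat_us[1]:.0f}x "
          f"to {ssd_lat_us[1] / gp_lat_us[0]:.0f}x")  # 8x to 33x

The IOPS gain is respectable; the latency gain is the headline act.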

The technology sits at the intersection of two trends. First, Nvidia's Storage-Next initiative aims to address growing demand for GPU-accessible memory as AI workloads become more data-intensive. Second, the industry is bifurcating: some SSDs are optimised for capacity and throughput, while drives like the GP Series are optimised purely for speed and random access patterns, with higher IOPS, finer-grained data access (512 bytes), and lower power consumption per IO than Kioxia's conventional TLC SSDs.
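
That 512-byte figure matters more than it looks. A minimal sketch of why, assuming a conventional drive fetches a full 4 KB page even when only 512 bytes are wanted (a typical indirection unit, and an assumption on my part, not a figure from the announcement):

    # Effective bandwidth and read amplification at small access sizes.
    iops, access_bytes, page_bytes = 10e6, 512, 4096  # 4 KB page is assumed

    useful_gb_s = iops * access_bytes / 1e9
    print(f"{useful_gb_s:.1f} GB/s of useful data")   # 5.1 GB/s
    print(f"{page_bytes // access_bytes}x read amplification avoided")  # 8x

Small random reads are roughly the access pattern KV-cache lookups tend to produce, which is why granularity sits next to IOPS on the spec sheet.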

Kioxia is tackling the problem in two parts. The GP Series handles the general memory extension problem. Alongside it sits the CM9 Series, a PCIe 5.0 E3.S SSD offering 25.6 TB of TLC capacity with 3 DWPD endurance, which supplies the performance, capacity, and endurance that large-scale inference environments need. This bigger drive targets the key-value cache, the specialised data structure that large language models use during inference.
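
That endurance figure is worth unpacking: 3 DWPD means the drive is rated for three full overwrites per day. A quick sketch of what that implies, assuming the usual five-year warranty period (the announcement does not state one):

    # Lifetime writes implied by the CM9's endurance rating.
    capacity_tb, dwpd, warranty_years = 25.6, 3, 5  # warranty is an assumption

    daily_tb = capacity_tb * dwpd
    lifetime_pb = daily_tb * 365 * warranty_years / 1000
    print(f"{daily_tb:.1f} TB/day, ~{lifetime_pb:.0f} PB total")  # 76.8 TB/day, ~140 PB

The headroom matters because a KV cache churns constantly during inference, exactly the kind of workload that burns through write endurance.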

The real-world implication is cost and efficiency. Expanding the GPU's usable memory space allows access to larger data sets and improves GPU utilisation by moving more data closer to compute resources. Instead of buying more GPU memory, which is expensive and space-constrained, data centre operators can add high-performance storage. It is a different trade-off, one that works better the further you push AI models toward their limits.

The catch is availability. Evaluation samples of the GP Series will reach select customers by the end of 2026; this is not a product you can buy today. It is an architecture that Kioxia and Nvidia have jointly specified, and it exists in prototype form right now. Kioxia will demonstrate its Super High IOPS SSD emulator, among other technologies, at NVIDIA GTC, booth 3522.

The technical question is whether these drives can actually sustain 10 million IOPS in real workloads, or whether, like so many storage marketing claims, the numbers are achieved only under laboratory conditions. The market question is whether data centre builders will trust a new storage category. But the strategic question is settled: as AI models get bigger and memory constraints get tighter, the architecture itself has to change. Kioxia is betting this is how it changes.

Tom Whitfield

Tom Whitfield is an AI editorial persona created by The Daily Perspective. Covering AI, cybersecurity, startups, and digital policy with a sharp voice and dry wit that cuts through tech hype. As an AI persona, articles are generated using artificial intelligence with editorial quality controls.