
Archived Article — The Daily Perspective is no longer active. This article was published on 19 March 2026 and is preserved as part of the archive.

Technology

Kioxia's New Speed Machine Attacks AI's Memory Bottleneck

Storage Class Memory drives promise to break the data access barrier that's slowing down GPU workloads

Image: Tom's Hardware
Key Points
  • Kioxia's new GP Series SSD lets GPUs access flash memory as extended high-bandwidth memory, potentially multiplying available AI system memory
  • The drive achieves over 10 million IOPS using XL-Flash, a storage-class memory technology with 3-5 microsecond latency versus traditional SSDs' 40-100 microseconds
  • As AI models scale toward trillions of parameters, GPUs run out of on-board memory; this drive extends the memory hierarchy to support larger models and longer context windows
  • Evaluation samples arrive by end of 2026, with a companion 25.6 TB CM9 drive for handling massive key-value caches in AI inference

Kioxia has announced the GP Series, a new SSD designed for AI systems that allows GPUs to access flash memory directly as an extension of high-bandwidth memory, or HBM. If that sounds specialised, that's because it is. And if you've been following the AI race at all, you'll understand why this matters.

The problem is straightforward: modern GPUs have only so much memory attached to them. When AI models demand access to more data than fits in that on-board memory, performance tanks. And models are rapidly scaling toward trillions of parameters while context windows expand to millions of tokens, driving unprecedented growth in key-value (KV) cache requirements. Architectures such as NVIDIA's Context Memory Storage (CMX) recognise the need to extend the memory hierarchy beyond GPU memory using high-performance storage. Kioxia's solution lets GPUs talk directly to flash storage fast enough that it feels like memory, not storage.
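
To get a feel for the scale of the problem, here is a rough back-of-envelope sketch of KV cache growth; the model dimensions below are illustrative assumptions, not figures from Kioxia or Nvidia:

    # Approximate KV-cache footprint for transformer inference.
    # Size = 2 (keys and values) x layers x KV heads x head dim
    #        x sequence length x bytes per value.
    def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

    # Hypothetical large model: 120 layers, 8 KV heads (grouped-query
    # attention), head dimension 128, FP16 values, one million tokens.
    size = kv_cache_bytes(layers=120, kv_heads=8, head_dim=128, seq_len=1_000_000)
    print(f"{size / 1e9:.0f} GB per sequence")  # ~492 GB

Roughly 490 GB for a single sequence, several times the HBM on any current GPU, and that is before the model weights claim a single byte.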

The performance numbers are where things get interesting. The GP Series is built on Kioxia's XL-Flash and is designed to achieve over 10 million IOPS, roughly three times what traditional data centre SSDs manage. XL-Flash is made from SLC NAND flash (the fastest flash type available) and boasts a read latency of just 3 to 5 microseconds. For context, traditional SSDs peak at around 3 to 4 million IOPS and have read latencies in the 40 to 100 microsecond range. That latency difference is the gap between a GPU waiting an eternity for data and a GPU that can retrieve it in a blink.
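
Taking those figures at face value, the gap is easy to quantify; a quick sanity check using only the numbers quoted above:

    # Sanity-check the quoted figures (all numbers from the announcement).
    gp_iops, ssd_iops_range = 10e6, (3e6, 4e6)
    gp_lat_us, ssd_lat_us = (3, 5), (40, 100)

    print(f"IOPS advantage: {gp_iops / ssd_iops_range[1]:.1f}x "
          f"to {gp_iops / ssd_iops_range[0]:.1f}x")   # 2.5x to 3.3x
    print(f"Latency advantage: {ssd_lat_us[0] / gp_lat_us[1]:.0f}x "
          f"to {ssd_lat_us[1] / gp_lat_us[0]:.0f}x")  # 8x to 33x

The IOPS gain is respectable; the latency gain is the headline act.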

The technology sits at the intersection of two trends. First, Nvidia's Storage-Next initiative aims to address growing demand for GPU-accessible memory as AI workloads become more data-intensive. Second, the industry is bifurcating: some SSDs are optimised for capacity and throughput, while drives like the GP Series are optimised purely for speed and random access patterns, with higher IOPS, finer-grained data access (512 bytes), and lower power consumption per IO than Kioxia's conventional TLC SSDs.
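
That 512-byte figure matters more than it looks. A minimal sketch of why, assuming a conventional drive fetches a full 4 KB page even when only 512 bytes are wanted (a typical indirection unit, and an assumption on my part, not a figure from the announcement):

    # Effective bandwidth and read amplification at small access sizes.
    iops, access_bytes, page_bytes = 10e6, 512, 4096  # 4 KB page is assumed

    useful_gb_s = iops * access_bytes / 1e9
    print(f"{useful_gb_s:.1f} GB/s of useful data")   # 5.1 GB/s
    print(f"{page_bytes // access_bytes}x read amplification avoided")  # 8x

Small random reads are roughly the access pattern KV-cache lookups tend to produce, which is why granularity sits next to IOPS on the spec sheet.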

Kioxia is tackling the problem in two parts. The GP Series handles the general memory extension problem. Alongside it sits the CM9 Series, a PCIe 5.0 E3.S SSD offering 25.6 TB of TLC capacity with 3 DWPD endurance, which supplies the performance, capacity, and endurance that large-scale inference environments need. This bigger drive targets the key-value cache, the specialised data structure that large language models use during inference.
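
That endurance figure is worth unpacking: 3 DWPD means the drive is rated for three full overwrites per day. A quick sketch of what that implies, assuming the usual five-year warranty period (the announcement does not state one):

    # Lifetime writes implied by the CM9's endurance rating.
    capacity_tb, dwpd, warranty_years = 25.6, 3, 5  # warranty is an assumption

    daily_tb = capacity_tb * dwpd
    lifetime_pb = daily_tb * 365 * warranty_years / 1000
    print(f"{daily_tb:.1f} TB/day, ~{lifetime_pb:.0f} PB total")  # 76.8 TB/day, ~140 PB

The headroom matters because a KV cache churns constantly during inference, exactly the kind of workload that burns through write endurance.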

The real-world implication is cost and efficiency. Expanding the GPU's usable memory space allows access to larger data sets and improves GPU utilisation by moving more data closer to compute resources. Instead of buying more GPU memory, which is expensive and space-constrained, data centre operators can add high-performance storage. It is a different trade-off, one that works better the further you push AI models toward their limits.

The catch is availability. Evaluation samples of the GP Series will reach select customers by the end of 2026; this is not a product you can buy today. It is an architecture that Kioxia and Nvidia have jointly specified, and it exists in prototype form right now. Kioxia will demonstrate its Super High IOPS SSD emulator, among other technologies, at NVIDIA GTC, booth 3522.

The technical question is whether these drives can actually sustain 10 million IOPS in real workloads, or whether, like so many storage marketing claims, the numbers are achieved only under laboratory conditions. The market question is whether data centre builders will trust a new storage category. But the strategic question is settled: as AI models get bigger and memory constraints get tighter, the architecture itself has to change. Kioxia is betting this is how it changes.

Tom Whitfield

Tom Whitfield is an AI editorial persona created by The Daily Perspective. Covering AI, cybersecurity, startups, and digital policy with a sharp voice and dry wit that cuts through tech hype. As an AI persona, articles are generated using artificial intelligence with editorial quality controls.