A new compiler patch offers a rare glimpse into how AMD plans to squeeze more efficiency from its next-generation graphics architecture. The update to LLVM, the open-source compiler infrastructure used by GPU makers, introduces two changes that hint at architectural improvements for RDNA 5: dual-issue handling for the V_FMA_F32 (32-bit fused multiply-add) instruction and a revised instruction format called VOPD3.
The problem AMD is addressing is not new. The next generation of Radeon GPUs is expected to be a significant upgrade over RDNA 4, and one of the issues Team Red seems to be tackling is dual-issue execution, the GPU's ability to execute two instructions in the same cycle. AMD's cards have had this feature since RDNA 3, but strict pairing rules meant that compilers couldn't always take advantage of it, so real workloads often fell short of theoretical peak performance.
AMD is apparently adding a new instruction format called "VOPD3" that is designed to work better with the dual-issue VALU (Vector Arithmetic Logic Unit, the shader's math unit). It should be more lenient, making it easier for the compiler to use dual-issue execution. The existing format worked only with simpler 2-operand instructions, which made it difficult for the compiler to find compatible pairs. VOPD3 expands this to 3-operand instructions, so it should be able to support operations like fused multiply-add (FMA), which multiplies two inputs and adds a third in a single instruction.
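To make the "3-operand" point concrete, here is a minimal Python sketch of what a fused multiply-add computes. It is an illustration only: it works on Python's 64-bit floats rather than the 32-bit values V_FMA_F32 operates on, and the helper names `fused_multiply_add` and `separate_multiply_add` are made up for this example. The fused version takes three inputs and rounds once at the end, while the separate version rounds after the multiply and again after the add.

```python
from fractions import Fraction


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (two instructions)."""
    product = a * b      # rounded to the nearest float here
    return product + c   # rounded a second time here


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a * b + c exactly, then round once (one 3-operand instruction).

    Hypothetical helper: exact rational arithmetic stands in for the
    hardware's wide internal result.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single, correctly rounded conversion


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(separate_multiply_add(a, b, c))  # the tiny product term is rounded away
print(fused_multiply_add(a, b, c))     # the tiny term survives
```

The key point for the compiler is the operand count: an FMA needs three source values, which is why a pairing format limited to 2-operand instructions could not carry it.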
Why does this matter to gamers and developers? FMA instructions are central to neural rendering, so features like upscaling and frame generation could also get a boost. That holds even if the hardware's raw throughput does not increase, because better use of dual-issue execution means more of the existing math units are busy each cycle. This approach represents a shift in thinking. Rather than cramming more transistors onto a chip, AMD is trying to make existing silicon work smarter.
The dual-issue VALU has two ALU lanes, allowing the GPU to execute two instructions per clock cycle. In recent generations, however, shader compilers had no effective way to arrange code so that it made full use of both lanes. This meant that, even though the hardware capability was present, RDNA 3 and RDNA 4 often could not pair instructions. In the worst case, when no pairs are found, a GPU delivers only half the theoretical throughput its spec sheet suggests.
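The arithmetic behind that "half the peak" figure can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a measurement: it assumes each instruction takes one cycle, that a dual-issued pair also takes one cycle, and it ignores memory stalls and everything else. The function names are hypothetical.

```python
def cycles_needed(num_instructions: int, num_pairs: int) -> int:
    """Cycles to issue the stream when num_pairs pairs are dual-issued.

    Every pair packs two instructions into one cycle, saving one cycle.
    """
    if num_pairs < 0 or 2 * num_pairs > num_instructions:
        raise ValueError("num_pairs must be between 0 and num_instructions // 2")
    return num_instructions - num_pairs


def fraction_of_peak(num_instructions: int, num_pairs: int) -> float:
    """Achieved throughput divided by the 2-per-cycle theoretical peak."""
    cycles = cycles_needed(num_instructions, num_pairs)
    return num_instructions / (2 * cycles)


print(fraction_of_peak(100, 0))   # no pairs found: 0.5 of peak
print(fraction_of_peak(100, 25))  # half the instructions paired
print(fraction_of_peak(100, 50))  # everything paired: 1.0 of peak
```

In this model, every additional pair the compiler finds moves the GPU closer to its advertised peak without any change to the hardware.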
Solving this requires coordination between hardware and software. For two instructions to be executed in parallel, they must be in an exact matching format and avoid certain dependencies. Many shader programs and compiler outputs have not been able to reliably meet these requirements until now. VOPD3 loosens those constraints, giving compiler software more flexibility to find instruction pairs that can run in parallel.
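A toy Python model shows why relaxing the operand limit helps a compiler find pairs. Everything here is a simplifying assumption for illustration: the `Instr` type, the `can_pair` rules, and the greedy adjacent-pair search are invented for this sketch and are not AMD's actual pairing rules or LLVM's algorithm, both of which involve more constraints. The model pairs two neighbouring instructions only if both fit the format's source-operand limit and the second does not depend on the first.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    dest: str
    sources: tuple[str, ...]


def can_pair(first: Instr, second: Instr, max_sources: int) -> bool:
    """Toy pairing rule: operand limit plus simple dependency checks."""
    if len(first.sources) > max_sources or len(second.sources) > max_sources:
        return False  # instruction does not fit the pairing format
    if first.dest in second.sources:
        return False  # second instruction needs the first one's result
    if first.dest == second.dest:
        return False  # both would write the same register
    return True


def count_pairs(stream: list[Instr], max_sources: int) -> int:
    """Greedily pair adjacent instructions, scanning left to right."""
    pairs = 0
    i = 0
    while i + 1 < len(stream):
        if can_pair(stream[i], stream[i + 1], max_sources):
            pairs += 1
            i += 2  # both instructions consumed by this pair
        else:
            i += 1
    return pairs


stream = [
    Instr("mul", "v0", ("v8", "v9")),
    Instr("add", "v1", ("v10", "v11")),
    Instr("fma", "v2", ("v0", "v1", "v12")),
    Instr("fma", "v3", ("v13", "v14", "v15")),
    Instr("fma", "v4", ("v2", "v3", "v16")),
    Instr("add", "v5", ("v4", "v9")),
]

print(count_pairs(stream, max_sources=2))  # 2-operand-only format
print(count_pairs(stream, max_sources=3))  # 3-operand format
```

With the 2-operand limit only the leading `mul`/`add` can be paired; allowing three operands lets the two independent `fma` instructions pair as well, while the final `fma`/`add` still cannot pair because the `add` reads the `fma` result.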
The engineering challenge here is fundamental. GPUs do enormous amounts of arithmetic, and every clock cycle counts. With VOPD3, RDNA 5 aims to get closer to its theoretical peak performance by using the dual-issue VALU as originally intended. If this effort succeeds, it could improve gaming framerates and AI workload performance without requiring a more advanced manufacturing process or higher power consumption.
When will consumers see RDNA 5? According to a new rumor, AMD plans to release its next-gen RDNA 5 GPUs by H2 2027. If that is accurate, 2026 would pass without any new consumer GPUs from Team Red, which would be a major disappointment for gamers. The architecture will likely debut first in professional and semi-custom products, reportedly including the "Orion" APU for the next-gen PlayStation 6 console and the "Magnus" APU for the next-gen Xbox console, with a new family of RDNA 5-based Radeon RX series graphics cards also expected.
This compiler patch is a signal that AMD's engineers are serious about extracting value from architectural features that competitors like NVIDIA have implemented differently. It is not flashy marketing material, but it demonstrates the kind of unglamorous work required to compete at the high end of GPU design. Whether it translates into real-world gains depends on whether game engines and software vendors actually use the new capabilities that VOPD3 enables.