When a benchmarking tool loses faith in the integrity of its own test results, something has gone seriously wrong. This week, that something arrived in the form of Intel's new Binary Optimisation Tool, a piece of software that Geekbench now views with such suspicion it is flagging entire categories of results as potentially worthless.
Primate Labs, the company behind Geekbench, has stated that all Geekbench 6 results from Intel's new CPUs "may be invalid" due to Intel's new iBOT technology. The warning applies to Intel's desktop Core Ultra 5 250K Plus, 250KF Plus, and Core Ultra 7 270K Plus, as well as several Panther Lake-based Core Ultra Series 3 chips.
Let's be real: this is a mess nobody wanted. Intel's iBOT restructures and streamlines compatible software to improve performance on Intel's newest CPUs, letting code run at higher IPC (instructions per cycle) and make better use of caches, prefetchers, and the CPU pipeline. On the surface, that sounds reasonable. The problem is what happens next.
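Intel has not disclosed which transformations iBOT actually applies, so any example here is necessarily generic. But reordering memory accesses for cache locality is a textbook instance of the category of optimisation that description points at. A minimal C sketch, purely illustrative and not based on anything iBOT is documented to do:

```c
#include <stdio.h>
#include <stddef.h>

#define N 1024

/* 8 MB matrix: file-scope so it doesn't overflow the stack. */
static double a[N][N];
static double out[N];

/* Cache-hostile traversal: each inner step strides N doubles,
 * touching a new cache line on almost every access. */
static void sum_cols_naive(void) {
    for (size_t j = 0; j < N; j++) {
        out[j] = 0.0;
        for (size_t i = 0; i < N; i++)
            out[j] += a[i][j];
    }
}

/* Restructured version: the loops are interchanged so memory is
 * read sequentially, letting caches and hardware prefetchers do
 * their job. Same answer, typically much faster on big arrays. */
static void sum_cols_restructured(void) {
    for (size_t j = 0; j < N; j++)
        out[j] = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            out[j] += a[i][j];
}

int main(void) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] = (double)(i + j);

    sum_cols_naive();
    double check = out[0];
    sum_cols_restructured();
    printf("same result: %s\n", out[0] == check ? "yes" : "no");
    return 0;
}
```

Both functions compute the same column sums; the second simply walks memory in the order the hardware likes. An optimiser that performs this kind of rewrite changes how the work is done without changing what is computed, which is exactly Intel's claim for iBOT.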
According to Geekbench founder John Poole, Geekbench 6 workload scores on the chips increase by up to 40 percent with iBOT enabled, while overall scores improve by up to 8 percent. Those are not trivial numbers. They are the kind of performance gains that can flip the perceived winner in a direct CPU comparison.
Here is where the core problem emerges: Intel has published no documentation on the techniques the Binary Optimisation Tool uses to optimise code. That makes it difficult to determine how effective those techniques are across different applications, and impossible for Primate Labs and its users to understand how iBOT boosts performance relative to benchmarks run without it.
This is not a technical quibble. Benchmarking only works if you are comparing apples to apples. If an Intel CPU hits a 3,700 single-core score using iBOT while an AMD chip hits 3,600 natively, the numbers on the screen suggest Intel is the winner. But it is no longer a fair comparison: one chip is running the 'standard' test, while the other is running a 'tailored' version of that test.
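To put numbers on that, here is a quick back-of-the-envelope check in C. The 3,700 and 3,600 scores are the hypothetical ones above, and the 8 percent figure is the reported upper bound for iBOT's overall gain, so treat the output as a rough illustration rather than a measurement:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical scores from the comparison above. */
    const double intel_with_ibot = 3700.0;
    const double amd_native      = 3600.0;

    /* Geekbench reports overall gains of up to 8 percent with
     * iBOT; backing that out gives the implied native score. */
    const double ibot_gain = 0.08;
    const double intel_native = intel_with_ibot / (1.0 + ibot_gain);

    printf("Implied native Intel score: %.0f\n", intel_native); /* ~3426 */
    printf("Winner flips to AMD: %s\n",
           intel_native < amd_native ? "yes" : "no");
    return 0;
}
```

On these numbers, a 100-point lead on screen conceals a roughly 170-point deficit once the boost is backed out.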
The catch that really matters to consumers: Geekbench has warned its users that Intel's latest tool cannot be trusted at this time, because there is no way to identify whether iBOT was enabled or disabled during a benchmark run. Primate Labs cannot detect its use, so every result from these processors now carries a validity warning.
To Intel's credit, the company is acutely aware of the optics. Intel told Tom's Hardware that it is cautious about rolling out the feature and wants to avoid any claims of playing dirty tricks to look better on benchmarks. That caution might be justified. In 2024, SPEC invalidated Intel benchmark results produced with Intel's oneAPI compiler due to "unfair" benchmark-specific optimisations, and in 2009, Intel's ICC compiler was found to be crippling performance on AMD CPUs by deliberately removing optimisations for the competing CPU architecture.
None of this means the Core Ultra 200S Plus chips are actually slower than Intel claims. Intel's tech delivers higher performance without skipping work or changing the quality of the end result of the workloads completed. What it means is that consumers and reviewers cannot currently trust a direct Geekbench comparison between these new Intel processors and anything else. That uncertainty is exactly what Intel did not need heading into a competitive refresh cycle.
The path forward is straightforward, if politically uncomfortable for Intel. Geekbench is seeking deeper and more technical insight from Intel, with the goal of making sure these optimisations are general-purpose and well documented. Transparency is the only move that restores credibility.