DeepMind's AlphaGo and its generalised successor AlphaZero seemed to have cracked a fundamental problem in artificial intelligence. By playing games against itself, AlphaZero taught itself to defeat world-champion programs at chess, Go, and shogi. Each victory appeared to herald an era in which self-play training could unlock superhuman performance in any domain.
Then researchers discovered something unsettling: a game so simple that children can learn it in minutes routinely defeats AlphaZero. That game is Nim, an ancient pastime in which two players take turns removing one or more objects from a single pile until no objects remain. The player forced to take the last object loses.
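The rules really are that simple. A complete move function fits in a few lines of Python (a sketch; the names are illustrative):

```python
def apply_move(piles, row, take):
    """Remove `take` objects (at least one) from the pile at index `row`."""
    if not 1 <= take <= piles[row]:
        raise ValueError("must take between 1 and the pile's size")
    piles = list(piles)  # copy, so the original position is untouched
    piles[row] -= take
    return piles
```

The game ends when every pile is empty.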

Research published in Machine Learning by Bei Zhou and Søren Riis examined what happens when you train an AlphaZero-style system to play Nim. On a five-row board, the AI improved steadily over hundreds of training games. Adding just one row slowed learning dramatically. By seven rows, the system had essentially stopped learning: when the researchers replaced the trained move-selection network with uniformly random moves, performance on the seven-row board was indistinguishable. Training had gained nothing over random play.
The reason exposes a crack in how AlphaZero actually works. Nim's winning strategy depends entirely on a mathematical parity function, the nim-sum: write each pile size in binary and XOR them together, and the result instantly reveals which player can force a win. This is not the sort of pattern you can absorb through observation and association. It requires symbolic reasoning; it requires discovering a rule.
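The rule is just as easy to state in code. A minimal Python sketch (illustrative names; the first branch handles the misère twist that arises when only single-object piles remain):

```python
from functools import reduce
from operator import xor

def nim_sum(piles):
    """Bouton's parity function: the bitwise XOR of all pile sizes."""
    return reduce(xor, piles, 0)

def first_player_wins(piles):
    """Winner under optimal play when taking the last object loses."""
    if all(p <= 1 for p in piles):
        # Only single objects remain, so move counting alone decides it:
        # an odd number of objects forces the mover to take the last one.
        return sum(piles) % 2 == 0
    # Everywhere else the nim-sum decides: nonzero means the player to
    # move can force a win, zero means they are already lost.
    return nim_sum(piles) != 0
```

The well-known position with piles of 1, 3, 5, and 7 objects has nim-sum zero, so `first_player_wins([1, 3, 5, 7])` returns `False`: whoever moves first loses against perfect play.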
AlphaZero learns by playing thousands of games and training its neural network to associate board positions with winning probability. This approach is devastatingly effective when the connection between board state and victory emerges naturally from patterns. But Nim forces the system to extract and apply an abstract mathematical principle, and parity is a notoriously hostile target for learning by association: change a single bit of the input and the answer flips, so no partial pattern in a position correlates with the outcome. The neural network, no matter how many games it plays, cannot reliably compute the parity function.
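The associative objective itself can be caricatured in a few lines. In self-play training, every position from a finished game is labelled with that game's outcome, and the value network is regressed toward those labels (a toy sketch, not DeepMind's code; the names are illustrative):

```python
def value_targets(game, winner):
    """Label each position in one finished self-play game with its outcome.

    `game` is a list of (position, player_to_move) pairs recorded during a
    single self-play game; `winner` is whoever won it. The value network is
    then trained to predict these +1/-1 labels from raw positions, which is
    learning by association in its purest form.
    """
    return [(position, 1.0 if player == winner else -1.0)
            for position, player in game]
```

Nothing in that loop ever represents a rule like "XOR the pile sizes"; the network can only soak up whatever statistical regularities connect positions to labels.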
The implications extend beyond a children's game. The researchers identified similar issues lurking in chess and Go. Certain endgame positions in chess require long chains of precise moves whose value only becomes apparent at the very end. Go, too, likely contains board configurations where the winning strategy cannot be discovered through association. These situations are rare enough in chess and Go that AlphaZero's other strengths compensate. In Nim, there is no hiding.
What makes this discovery particularly significant is its relevance to real-world applications. Researchers increasingly use AI systems for mathematics, where symbolic reasoning dominates. A system that cannot learn to apply a parity function may struggle with more sophisticated mathematical reasoning, which is why Nim's mathematical elegance has made it such a useful testbed for understanding fundamental limits.
Zhou and Riis conclude that "AlphaZero excels at learning through association but fails when a problem requires symbolic reasoning that cannot be implicitly learned from correlation between game states and outcomes." This distinction matters. The gap between association and symbolic reasoning may mark a boundary that self-play training alone cannot cross.
The discovery does not diminish AlphaZero's achievements. Instead, it clarifies what those achievements represent. The system has not achieved general game-playing ability; it has mastered games where strategy emerges from patterns. For problems that turn on an abstract principle, a different approach may be necessary.