
Archived Article — The Daily Perspective is no longer active. This article was published on 19 March 2026 and is preserved as part of the archive.

Technology

UK's AI chatbot proves smarter but hits a fundamental tech wall

As accuracy climbs to 90%, the government's GOV.UK Chat reveals why faster isn't always better in public services

Key Points
  • GOV.UK Chat accuracy improved from 76% to 90% through more powerful AI models and better data science.
  • More accurate models generate slower responses; the chatbot now takes 10.7 seconds per answer on average.
  • The Government Digital Service is exploring answer streaming to deliver partial results while working on full responses.
  • All 508 attempts to trick the chatbot into giving harmful responses were blocked, showing strong safeguards.

The faster your AI chatbot runs, the less accurate it tends to become. This familiar trade-off in machine learning is now playing out in real public infrastructure, and it's forcing the UK government to make uncomfortable choices about what citizens actually want.

The Government Digital Service has reported answer accuracy increasing from 76 percent to 90 percent, a significant jump since early trials. More than 10,000 participants took part in two public pilots over the past 18 months, asking around 26,000 questions about government services, from tax to benefits to visas. On the surface, this looks like unambiguous progress. The chatbot, called GOV.UK Chat, is now scoring more highly than mass-market AI assistants when answering government-related questions, a meaningful achievement for a high-stakes service.

The catch: reliability came at a cost. The chatbot now takes an average 10.7 seconds for an answer, which may not sound catastrophic until you consider that users browsing government services expect the kind of responsiveness they get from Google. The latest versions of frontier models have been more powerful but slower than previous versions, according to the team's own analysis. The more sophisticated the AI, the more computation it demands.

Why does this matter? Because public services aren't optional customer experiences where users will tolerate friction if the answer is good enough. Citizens often don't have the luxury of waiting while a chatbot deliberates; they're trying to find out if they qualify for a benefit, what documents they need for a passport renewal, or whether their business problem fits into a government programme. At nearly 11 seconds per answer, some users will bail before the response appears.

Here's where the story gets interesting, though. The government team openly acknowledges the dilemma and is not pretending accuracy and speed aren't in tension. "For us, accuracy is the most important thing, and consequently GOV.UK Chat responses are slower than we'd ideally like," the team wrote. That's a choice to privilege correctness over responsiveness, which is defensible in government. Getting tax advice 10 seconds slower is better than getting it wrong fast.

The proposed solution is answer streaming, where the first part of an answer appears to users before the answer is fully written. Users would see something appear almost immediately and then watch it complete, creating the perception of speed while the system still works through its full reasoning. The trade-off is substantial: streaming requires redesigning how some of the safety guardrails work, which the team describes as a significant piece of engineering.
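The basic mechanics of answer streaming can be sketched in a few lines. This is a minimal illustration, not GDS's implementation: `generate_answer` is a hypothetical stand-in for a slow language model, and the names are invented for the example.

```python
import time
from typing import Iterator

def generate_answer(question: str) -> Iterator[str]:
    """Hypothetical stand-in for a slow model that yields tokens as they are produced."""
    for token in ["You", " may", " be", " eligible", "."]:
        time.sleep(0.01)  # simulate per-token generation latency
        yield token

def stream_response(question: str) -> list[str]:
    """Forward each chunk to the user as soon as it is available,
    rather than waiting for the complete answer."""
    chunks = []
    for chunk in generate_answer(question):
        # In a real service this would be written to the open HTTP
        # response (e.g. via server-sent events); the user sees partial
        # text long before generation finishes.
        chunks.append(chunk)
    return chunks

answer = "".join(stream_response("Am I eligible for this benefit?"))
```

The catch the article alludes to is visible even in this toy version: a safety check that inspects the *whole* answer has nowhere to run before the first chunk has already reached the user, which is why guardrails must be redesigned to work incrementally.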

On the security front, the system performed well. During the two pilots, there were 508 attempts to trick the AI system into providing an inappropriate or harmful response, and all of these jailbreak attempts were blocked by existing safeguards. The chatbot also knows its limits: the team introduced the ability for Chat to ask users clarifying questions when the original question is ambiguous, and the answer rate for in-scope questions is now 88 percent.

The rollout timeline suggests the government is confident enough to proceed despite the speed issue. GOV.UK Chat will begin rolling out within the GOV.UK mobile app, with website integration planned for later in 2026. At a deeper level, this project reveals something important about public-sector AI deployment: the constraints are different from commercial services. When a customer service chatbot takes five seconds, you lose sales. When a government chatbot takes ten seconds but gets the answer right, you might prevent a citizen from making a costly mistake.

Whether 10.7 seconds is the right balance remains an empirical question that only live deployment will answer. In the pilots, roughly 73 percent of users reportedly found the assistant useful and 64 percent were satisfied, figures that suggest room for improvement. But those numbers don't tell us whether faster answers at 85 percent accuracy would have scored higher overall, or whether users would have preferred speed over correctness.

The deeper lesson is that no chatbot, no matter how sophisticated, can escape the fundamental trade-off between depth of reasoning and response time. The UK government has chosen to optimise for accuracy in public services, at least for now. Whether that calculus holds once real users are waiting for answers in high-pressure moments remains to be seen.

Nina Papadopoulos

Nina Papadopoulos is an AI editorial persona created by The Daily Perspective, offering sharp, sardonic culture criticism spanning arts, entertainment, media, and the cultural moment. As an AI persona, her articles are generated using artificial intelligence with editorial quality controls.