At QCon London this week, Anthropic's AI reliability engineering team explained why Claude excels at finding issues but still makes a poor substitute for a site reliability engineer, repeatedly mistaking correlation for causation.
Alex Palcuie is a Member of Technical Staff on Anthropic's AI Reliability Engineering team who previously spent eight years as an SRE on Google Compute Engine, including a stint on the incident response team for Google-wide outages. His talk illustrated both where artificial intelligence shines in operations and where human expertise remains irreplaceable.

Where AI succeeds: observation at scale
Claude excels at the observation part of incident response: it reads logs at the speed of I/O and never gets bored, something no human can match at scale. In a real incident on New Year's Eve, when Claude Opus 4.5 was returning HTTP 500 errors, Palcuie opened Claude Code and asked it to look; the AI wrote a SQL query and within seconds had the answer: an unhandled exception in the image processing class.
But Claude's analysis did not stop there. It identified the failing requests, checked the accounts that sent them, and discovered 200 accounts all sending 22 images simultaneously. The AI then found 4,000 accounts created at the same time, most of them dormant. Without Claude's rapid pattern recognition, Palcuie said, he would have marked the issue as a bug and missed the fraud entirely. He reflected that having engineers on call is "a tax on humans because our systems are not good enough to look after themselves."
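The kind of pattern Claude surfaced here can be sketched in a few lines: group failing requests by account, count identical payload shapes, and flag accounts that share a creation timestamp. The record fields and values below are illustrative assumptions, not Anthropic's actual schema.

```python
from collections import Counter, defaultdict

# Hypothetical failing-request records; field names are illustrative only.
requests = [
    {"account_id": "acct-1", "account_created": "2024-12-31T10:00", "image_count": 22},
    {"account_id": "acct-2", "account_created": "2024-12-31T10:00", "image_count": 22},
    {"account_id": "acct-3", "account_created": "2024-12-30T09:00", "image_count": 1},
]

# Count identical payload shapes: many accounts sending the exact same
# number of images is suspicious.
images_per_account = {r["account_id"]: r["image_count"] for r in requests}
shape_counts = Counter(images_per_account.values())

# Accounts sharing a creation timestamp are another coordination signal.
created_at = defaultdict(list)
for r in requests:
    created_at[r["account_created"]].append(r["account_id"])

suspicious = {ts: accts for ts, accts in created_at.items() if len(accts) > 1}
print(shape_counts)  # which payload shapes repeat, and how often
print(suspicious)    # accounts that were created in the same instant
```

The value of the model is not this trivial aggregation but deciding, unprompted, which aggregations are worth running across millions of log lines.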
Where AI fails: root cause analysis
The story takes a darker turn with the second anecdote. Anthropic's key-value cache, used for performance, is gigabytes in size and fragile. When it breaks, monitoring shows a surge in requests, and for a long time, whenever Palcuie asked Claude what had happened, it blamed capacity: "request volume increase, this is a capacity problem, you need to add more servers."
The answer was wrong. Claude "will get wrong correlation versus causation. It's like a new joiner on the team; they will think it's a capacity problem, when actually you lost your cache." This, Palcuie argued, is why "we can't trust LLMs for incident response."
When Claude produces postmortem reports, it delivers "an 80 percent story that's pretty, it's readable and convincing," but "it's really bad at root causes" and "doesn't know the history of your system, especially if your system has been there for ten years." Real incidents rarely stem from a single cause. They emerge from layers of decisions, processes, and system history. Claude lacks the context and experience to see those connections.
The knowledge tax
Palcuie emphasised the importance of SREs "that have been burnt before, they have the scar tissue," and worried about skills atrophying as AI takes on more of the work: "will we have our skills atrophy?" This concern mirrors what software developers already face as AI generates code at high velocity.
There is a broader economic principle at play. The Jevons Paradox, "the favourite paradox in the AI industry," explains how technological improvements that increase efficiency can paradoxically lead to greater overall consumption. In software, easier tooling means more code gets written, complexity grows, and systems fail in more sophisticated ways. Tooling that promises fewer incidents ends up producing more of them.
The reality on hiring
If Anthropic truly believed Claude could handle incident response, the company would not be actively hiring for reliability engineering roles. The AIRE (AI Reliability Engineering) team partners with teams across Anthropic to improve reliability across the most critical serving paths: every hop from the SDK through the network, API layers, serving infrastructure, and accelerators, and back. The company is recruiting for multiple positions, suggesting it sees no near-term path to automating away the work.
The broader context matters. Enterprise demand for Claude has surged in recent months as developers realised the promise of AI agents was becoming real, but those models are only useful when available, and despite Anthropic acknowledging the need to improve reliability, outages remain frequent.
Pragmatism, not hype
Palcuie ended on an optimistic note, saying "The models are the worst today that they'll ever be," but the overall message is clear: do not leave SRE to AI, and keep training reliability engineers because they will be needed in the future. He suggested that maybe "AI agents can simplify and manage the complexity, maybe do what we've collectively learned in our industry, but that's a big if."
This is honest engineering leadership. Rather than promoting automation hype, Palcuie and Anthropic are acknowledging that Claude is a tool for specific tasks, not a replacement for human expertise. You can hand a log to Claude. You cannot hand it your business. The systems keeping the internet running still need humans who have been burnt before, who remember why decisions were made, and who understand whether a rising error rate is a problem in itself or a symptom of something deeper.