If you've been following the AI coding boom, you've probably noticed the narrative getting more complicated lately. On one hand, AI systems are performing almost superhuman feats of code archaeology—finding security vulnerabilities that have lurked undetected in software for decades. On the other, the code that AI generates is proving to be a quality disaster. The contradiction reveals something important about where we actually are with AI development.
Start with the good news. A significant share of the world's code will likely be scanned by AI in the near future, given how effective models have become at this kind of archaeology. Models like Anthropic's Claude Opus 4.6 have become markedly better at finding new, high-severity vulnerabilities across vast amounts of code, including flaws in open-source software that had gone undetected for decades. Anthropic's red team used Claude Opus 4.6 to find over 500 vulnerabilities in production open-source codebases. This is genuinely impressive, especially for security teams working with stretched resources.
AI systems can now continuously analyse source code repositories to identify vulnerabilities and propose targeted patches by monitoring commits and changes to codebases. The systems work at a speed humans simply cannot match. But here's where the conversation gets messier.
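To make the commit-monitoring idea concrete, here is a deliberately simplified sketch in Python that assumes nothing about any vendor's pipeline: a regex pass over the added lines of a unified diff. Real AI scanners reason about semantics across whole codebases rather than matching patterns line by line; the names `RISKY_PATTERNS` and `scan_diff` are illustrative inventions.

```python
import re

# A few well-known risky patterns as a stand-in for what a real scanner
# would detect through deeper semantic analysis.
RISKY_PATTERNS = {
    "eval-on-input": re.compile(r"\beval\s*\("),
    "hardcoded-secret": re.compile(r"(password|secret|api_key)\s*=\s*['\"]"),
    "yaml-unsafe-load": re.compile(r"yaml\.load\s*\("),
}

def scan_diff(diff_text: str) -> list[tuple[int, str]]:
    """Return (line_number, finding) pairs for added lines in a unified diff."""
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect lines the commit adds ('+' prefix), skipping the
        # '+++' file header that unified diffs also emit.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Hooked into a repository's commit stream, even a crude filter like this runs on every change at a cadence no human reviewer could sustain, which is the workflow advantage the vendors are selling.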
When researchers at CodeRabbit analysed 470 real-world open-source pull requests in late 2025, they found something sobering. AI-generated code creates 1.7 times more issues than human code. And these aren't all minor nitpicks. AI pull requests show 1.4–1.7 times more critical and major findings, including business logic mistakes, incorrect dependencies, flawed control flow, and misconfigurations.
The security vulnerabilities are particularly concerning. Compared with human-written code, AI-generated code was 1.88 times more likely to introduce improper password handling, 1.91 times more likely to make insecure object references, 2.74 times more likely to add XSS vulnerabilities, and 1.82 times more likely to implement insecure deserialisation. These are not theoretical problems; they're the kinds of vulnerabilities that create real exploits.
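To show what one of these vulnerability classes looks like in practice, here is a hypothetical insecure-deserialisation example in Python (not drawn from the CodeRabbit dataset): the unsafe version hands attacker-controlled bytes to `pickle.loads`, which can execute arbitrary code during unpickling, while the safer version sticks to a data-only format and validates the result. The function names are illustrative.

```python
import json
import pickle  # shown only to contrast with the safe path

def load_profile_unsafe(blob: bytes):
    # Insecure deserialisation: unpickling attacker-controlled bytes can
    # run arbitrary code, so this must never receive untrusted input.
    return pickle.loads(blob)

def load_profile(blob: bytes) -> dict:
    # Safer alternative: a data-only format plus explicit validation.
    data = json.loads(blob)
    if not isinstance(data, dict) or "name" not in data:
        raise ValueError("malformed profile")
    return {"name": str(data["name"])}
```

The two functions look almost interchangeable at a glance, which is exactly why this class of flaw slips through review when a generated snippet "works" on the happy path.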
The irony is sharp. AI excels at finding bugs it—or other AI—might create. But when developers actually use these tools to generate code, they're introducing problems at scale. AI-authored changes produced 10.83 issues per pull request, compared to 6.45 for human-only pull requests. That's not a rounding error; it's a fundamental quality gap.
So what accounts for the paradox? Part of it is misplaced trust: teams develop false confidence in AI tools and skip human oversight for the complex business logic, architectural decisions, and contextual understanding that AI cannot fully grasp, letting through subtle bugs and design flaws only experienced developers can catch. AI can pattern-match and autocomplete brilliantly, but it struggles with domain-specific knowledge and the non-obvious constraints that make code actually work in production.
AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organisations must actively mitigate. That last part is key: this is not an unsolvable problem, just an unacknowledged one. Teams deploying AI code generation need guardrails, not just speed.
The meta-lesson here is that AI capability isn't binary. These systems are powerful at specific, constrained tasks like scanning for known patterns of vulnerability. They're weaker at the open-ended creative work of writing code that needs to handle edge cases, integrate with business logic, and anticipate failure modes. The best current approach pairs both: use AI to find vulnerabilities and accelerate initial drafting, but treat the output as a draft that needs serious human review before production.
Tools like Aardvark will surface an increasing number of bugs, and absorbing that volume of findings will require sustainable collaboration between automated discovery and human remediation. That sustainability means accepting that faster isn't automatically better if the code that reaches production is buggier. For Australian teams adopting these tools, the pressure to keep pace with international development needs to be balanced against the engineering discipline required to maintain code quality.