Enterprise software teams are discovering a painful truth: the speed gains from AI-generated code come with a steep hidden cost. As organisations have raced to deploy AI coding tools over the past 18 months, they have measured success by the wrong benchmarks, leaving them exposed to failures that are now surfacing in production systems.
AI-assisted code generation introduces roughly 1.7 times more issues than traditional development methods. But the problem runs deeper than bug counts. By 2026, 75% of technology decision-makers are projected to face moderate to severe technical debt from AI-speed practices. This debt compounds silently: code that passes unit tests and looks correct can fail catastrophically in production.
Dorian Smiley and Connor Deeks, founders of AI advisory firm Codestrap, have spent the last year watching organisations grapple with this reality. "No one knows right now what the right reference architectures or use cases are for their institution," Smiley said. "A lot of people are pretending that they know. But there's no playbook to pull from." The pair worked at global consultancy PwC before launching their own practice to guide companies toward a realistic AI strategy.
The measurement problem is critical. AI-generated code introduces 1.7 times more total issues than human-written code across production systems, and logic and correctness errors appear 1.75 times more often than in human code. Yet many organisations still track success using metrics like lines of code and pull requests, which tell them nothing about quality. "Lines of code, number of pull requests, these are liabilities," Smiley said. "These are not measures of engineering excellence."
One example drives the point home. When developers attempted to rewrite SQLite in Rust using AI, the code passed all unit tests and had the correct structural shape. Yet the AI-generated version was 3.7 times larger, and the code only looked right: it skipped control-flow protections and misused dependency ordering. The resulting performance penalty made it unusable.
The liability problem is reaching critical mass. Deloitte's member firm in Australia paid the government a partial refund for a $290,000 report that contained alleged AI-generated errors, including references to non-existent academic research papers and a fabricated quote from a federal court judgment. This incident signals a broader risk: if consulting firms using AI at scale produce flawed work, clients will seek damages. Deeks noted the incentive misalignment is built in. "The partner wants more revenue and higher margin. You give them AI, what are they going to do? More work, less human work." That logic encourages cutting corners on quality assurance.
Insurance underwriters are responding by withdrawing coverage. Both Deeks and Smiley report that major insurers are actively lobbying state-level regulators to carve AI-related workflows out of business liability policies. "Insurance underwriters are seriously trying now to remove coverage in policies where AI is applied and there's no clear chain of responsibility," said Smiley. One senior insurance executive confirmed this is real and widespread, yet few are discussing it publicly.
The market is tightening already. Smiley predicts quality failures will surface within eight to nine months for heavy AI users. Deeks sees a wave of lawsuits coming. Organisations are also applying pricing pressure: companies are demanding discounts from service firms when they learn AI tools were used in the work. AI tools amplify both the strengths and weaknesses of an engineering culture; with strong processes, clear coding patterns, and well-defined best practices, these tools can shine. Without those safeguards, the problems multiply.
The broader issue is that many organisations still believe AI coding "just works." It does not. As models improve, the code they produce is becoming increasingly verbose and complex, driving down the number of obvious bugs and security vulnerabilities at the cost of increasing the number of code smells: harder-to-pinpoint flaws that lead to maintenance problems and technical debt. These hidden defects are harder to spot in code review.
Deeks and Smiley argue the solution begins with honest conversation. Organisations need new metrics that track deployment frequency, lead time to production, change failure rate, and incident severity: the real markers of engineering excellence. They need to measure tokens burned per approved change and understand the true cost of AI-assisted development. Most critically, they need to stop pretending the technology solves problems it does not.
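The metrics above are straightforward to compute once deployment records are captured. The sketch below, with entirely hypothetical data and field names (the article does not specify a schema or tooling), shows how change failure rate, average lead time, and tokens per approved change might be derived:

```python
# Minimal sketch: deriving delivery-quality metrics from deployment records.
# All records, field names, and figures below are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class Deployment:
    failed: bool             # did this change cause a production incident?
    lead_time_hours: float   # commit-to-production time
    tokens_used: int         # AI tokens consumed producing the change
    approved: bool           # did the change pass review and ship?

deployments = [
    Deployment(False, 6.0, 12_000, True),
    Deployment(True, 30.0, 48_000, True),
    Deployment(False, 4.5, 9_000, True),
    Deployment(False, 52.0, 30_000, False),  # rejected in review
]

shipped = [d for d in deployments if d.approved]

# Share of shipped changes that caused a production incident.
change_failure_rate = sum(d.failed for d in shipped) / len(shipped)

# Average commit-to-production time for shipped changes.
avg_lead_time = sum(d.lead_time_hours for d in shipped) / len(shipped)

# Total token spend (including rejected attempts) per approved change.
tokens_per_approved_change = sum(d.tokens_used for d in deployments) / len(shipped)

print(f"Change failure rate:        {change_failure_rate:.0%}")
print(f"Avg lead time (hours):      {avg_lead_time:.1f}")
print(f"Tokens per approved change: {tokens_per_approved_change:,.0f}")
```

Note that the token denominator counts only approved changes while the numerator includes rejected work: rework burned on changes that never ship is part of the true cost the article describes.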
The next phase of AI in enterprise software will separate organisations that approached it with discipline from those that chased speed. The cost of the latter choice is becoming visible, and it is not cheap.