
Archived article: The Daily Perspective is no longer active. This article was published on 18 March 2026 and is preserved as part of the archive.

Technology

Water startup forced to build own AI after models cost $200k in bad advice

Rozum, a system designed to verify multiple AI models in parallel, emerges from a desalination company's expensive lesson on AI hallucination

Image: The Register
Key Points
  • Waterline Development spent $200,000 and four months on an electrode material, chosen with input from ChatGPT and Grok, that did not work beyond pilot scale.
  • The startup created Rozum, a system that runs multiple AI models in parallel and verifies outputs before providing answers.
  • On 1,000 PhD-level benchmark questions, Rozum's verification layer flagged unsupported claims in 76.2% of frontier model responses and source errors in 21.3% of responses.
  • The system targets high-stakes decisions where accuracy matters more than speed, such as a $3 million solar investment or allocating months of engineering time.

When Waterline Development tried to use large language models for materials science research, the models were "confidently wrong in ways that cost us months," according to founder Derek Bednarski. The desalination startup learned this the hard way while designing a new desalination cell that uses electrochemical technology to remove salt and other ions from water.

The company was building a product it called a "water battery." While choosing between carbon cloth and cast carbon electrodes, it relied on academic papers and AI systems such as Grok and ChatGPT to validate its findings. It selected carbon cloth partly because the material appeared frequently in academic papers, including a Stanford dissertation the team had used as the basis for its initial prototypes.

The choice proved catastrophic at scale. The material had conductivity issues, retained water in ways that affected ion removal, and degraded faster than the alternative. The company spent four months and $200,000 validating that carbon cloth would not work past pilot scale, only to discover that cast carbon electrodes would have been superior from the start.

Chart: Rozum's performance on the Humanity's Last Exam benchmark, showing improvement over leading frontier models across multiple scientific domains.

The problem, the company concluded, was that commercial AI models are ill-suited to multidisciplinary research. "No single AI model does this reliably," it said. "Frontier language models hallucinate under extended multi-step reasoning. They produce plausible answers that silently break when a problem crosses domain boundaries. At best this wastes time; at worst, it poisons critical decision making."

Rather than abandon AI entirely, Waterline built Rozum, a multi-model reasoning system that runs several AI models in parallel and synthesizes their answers through a verification layer. Rozum is a model orchestration system that operates at inference time. Its name comes from the Slavic word for "reason."
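The article does not describe Rozum's internals. As a rough illustration of the general pattern it names (send one question to several models in parallel, then synthesize only when the answers hold up), here is a minimal Python sketch. The model functions are toy stubs standing in for real API clients, and `ask_models`, `synthesize`, and the strict-majority vote are hypothetical choices, not Waterline's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical model client type: takes a question, returns an answer string.
Model = Callable[[str], str]


def ask_models(question: str, models: dict[str, Model]) -> dict[str, str]:
    """Send the same question to every model in parallel and collect the answers."""
    if not models:
        raise ValueError("need at least one model")
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


def synthesize(answers: dict[str, str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the agreed answer if more than `min_agreement` of models give it.

    A simple vote stands in for a real verification layer; None means the
    models disagree and the question needs further cross-checking.
    """
    normalized = [a.strip().lower() for a in answers.values()]
    if not normalized:
        return None
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count / len(normalized) > min_agreement else None


if __name__ == "__main__":
    # Toy stubs in place of real frontier-model API calls.
    models = {
        "model_a": lambda q: "Option B",
        "model_b": lambda q: "option b ",
        "model_c": lambda q: "Option A",
    }
    answers = ask_models("Which electrode option scales better?", models)
    print(synthesize(answers))  # option b
```

Returning None instead of a best guess reflects the article's emphasis on accuracy over speed: disagreement triggers more checking rather than a confident answer.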

The verification layer cross-checks claims across models before generating a final response. In testing on 1,000 PhD-level benchmark questions, it flagged unsupported claims in 76.2 percent of individual frontier model responses and caught source errors in 21.3 percent of responses. Only 5.5 percent of the questions (55 of the 1,000) produced clean consensus across all models.
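The article does not say how the verification layer decides that a claim is unsupported. One simple way to tally rates like those above is to reduce each model's response to a set of claims, treat a claim as unsupported when no other model asserts it, and count a question as clean consensus when every model makes exactly the same claims. The sketch below assumes claim extraction has already happened; `cross_check`, `summarize`, and the `min_support` threshold are hypothetical, and source errors are not modelled.

```python
from dataclasses import dataclass

# Each model's response, reduced to a set of normalized claim strings.
# Extracting claims from free text is assumed to have happened already.
Responses = dict[str, frozenset[str]]


@dataclass
class CrossCheck:
    unsupported: dict[str, set[str]]  # model name -> claims too few other models made
    clean_consensus: bool             # True if every model made exactly the same claims


def cross_check(responses: Responses, min_support: int = 1) -> CrossCheck:
    """Flag each claim that fewer than `min_support` other models also assert."""
    unsupported: dict[str, set[str]] = {}
    for name, claims in responses.items():
        others = [c for other, c in responses.items() if other != name]
        unsupported[name] = {
            claim
            for claim in claims
            if sum(claim in other_claims for other_claims in others) < min_support
        }
    claim_sets = list(responses.values())
    clean = all(c == claim_sets[0] for c in claim_sets)
    return CrossCheck(unsupported, clean)


def summarize(batch: list[Responses]) -> tuple[float, float]:
    """Return (share of responses with any unsupported claim,
    share of questions with clean consensus) over a batch of questions."""
    results = [cross_check(r) for r in batch]
    flagged = sum(bool(c) for res in results for c in res.unsupported.values())
    total = sum(len(r) for r in batch)
    clean = sum(res.clean_consensus for res in results)
    return flagged / total, clean / len(batch)


if __name__ == "__main__":
    batch = [
        {"a": frozenset({"x", "y"}), "b": frozenset({"x", "y"})},  # models agree
        {"a": frozenset({"x"}), "b": frozenset({"x", "z"})},       # "z" uncorroborated
    ]
    print(summarize(batch))  # (0.25, 0.5)
```

Real verification needs more than exact string matching (paraphrased claims, checking cited sources), so this only shows how such rates could be counted, not how Rozum's 76.2 and 21.3 percent figures were produced.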

The system is not designed for speed. Rozum can spend minutes or even hours on a single response, far longer than commercial AI models take, so it is poorly suited to real-time conversations, high-volume commodity queries, or tasks where current frontier models already perform adequately. Bednarski argues the trade-off is worthwhile for specific applications: early customers are using Rozum for high-stakes questions and decisions, such as a $3 million solar investment or allocating months of engineering time to one R&D priority or another.

Waterline did not set out to integrate domain-specific tools or to make human expert teams more efficient; it built Rozum to help engineers, scientists, and analysts do their jobs better. Every query runs across multiple frontier models in parallel, and the outputs are evaluated, cross-checked, and verified before being synthesized into a final response.

Rozum launched in March 2026 and is currently available through a limited early access programme. The company is headquartered in San Mateo, California.

Oliver Pemberton

Oliver Pemberton is an AI editorial persona created by The Daily Perspective, covering European politics, the UK economy, and transatlantic affairs with the dual perspective of an Australian abroad. Articles under this persona are generated using artificial intelligence with editorial quality controls.