
Archived Article — The Daily Perspective is no longer active. This article was published on 10 March 2026 and is preserved as part of the archive.

Opinion | Technology

Why Tech Writers Keep Getting Voice AI Wrong

ChatGPT's voice mode and Adobe's creative AI tools reveal a larger pattern of overstated failures and undersold capabilities

Image: ZDNet
Key Points
  • ChatGPT voice mode initially developed a poor reputation for fabricating answers but has significantly improved in reliability and usefulness
  • Voice interactions feel more natural and authentic than text-based equivalents, reducing latency and misunderstanding between user and AI
  • Adobe and OpenAI are expanding conversational AI across creative tools, from Photoshop to chat interfaces, making design accessible without technical expertise
  • The technology reveals a wider pattern of rushed dismissal followed by grudging recognition among early adopters and commentators

There is something almost predictable about the rhythm of technology criticism. A new feature launches. Journalists test it early, find it broken, and declare it useless. Months pass. The feature improves quietly. Then, almost sheepishly, the same critics admit it actually works.

ChatGPT's voice mode has followed this exact arc.

When OpenAI first deployed voice conversations, the feature had a serious problem: it fabricated answers. Users reported exchanges that felt plausible but contained facts pulled from thin air, the AI confidently stating things it had no basis for knowing. The verdict was swift: voice mode was a gimmick, unreliable, a feature to avoid. Some technology writers put it in the "interesting but broken" category and moved on.

Yet the feature did not disappear. OpenAI kept refining the underlying technology. The "Advanced" label refers to the more conversational voice experience that began rolling out to users in July 2024, designed to feel near real-time. The difference between the older voice mode and the new version is architectural: instead of converting speech to text, processing it through the language model, and then converting the response back to speech, the newer system handles audio natively, understanding spoken words directly without an intermediate text conversion.
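The difference between the two architectures can be sketched in a few lines. This is a toy illustration, not OpenAI's actual implementation: the `Audio` class, the `tone` field, and both reply functions are invented here purely to show why a text-only intermediate step discards information that a speech-native model can keep.

```python
# Toy sketch (hypothetical, no real APIs): why a cascaded pipeline loses
# paralinguistic signal while a speech-native model can retain it.
from dataclasses import dataclass

@dataclass
class Audio:
    words: str
    tone: str  # paralinguistic signal, e.g. "sarcastic" or "neutral"

def speech_to_text(audio: Audio) -> str:
    # Transcription keeps only the words; the tone is dropped right here.
    return audio.words

def cascaded_reply(audio: Audio) -> str:
    # Old architecture: audio -> text -> language model -> speech.
    text = speech_to_text(audio)
    return f"model saw: '{text}', tone: unknown"

def native_reply(audio: Audio) -> str:
    # New architecture: the model consumes audio directly, so tone survives.
    return f"model saw: '{audio.words}', tone: {audio.tone}"

utterance = Audio(words="oh, great idea", tone="sarcastic")
print(cascaded_reply(utterance))  # tone: unknown
print(native_reply(utterance))    # tone: sarcastic
```

The point of the sketch is the lossy boundary: once speech becomes a transcript, whatever the transcript cannot represent is gone for every later stage.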

The practical result is striking. A conversation with ChatGPT now feels far more natural and responsive. Users report that the stilted, turn-taking pressure of the old mode is gone; exchanges feel relaxed and expansive. Remove the translation steps and you remove the places where nuance gets lost. Sarcasm no longer lands as a literal statement. The exchange stops feeling like a game of broken telephone.

But the real story is not just ChatGPT. This pattern of initial dismissal followed by grudging admission of utility is happening across the AI landscape. Adobe announced on Tuesday that its AI assistant for Photoshop is becoming available to users in beta on the web and in the mobile apps. Like voice mode before it, this represents a shift toward conversational interfaces for tools that traditionally required technical expertise. Rather than clicking menus and dragging sliders, users can describe what they want in natural language.

Consider what this actually means for creative work. Adobe's AI Assistant is a natural-language helper inside Photoshop: users can tell it to organise layers, mask regions, or modify the document structure, and it performs the steps automatically. The technology does not replace creative skill; it handles the tedious parts. A designer can focus on ideas instead of technical execution.
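In outline, such an assistant is a translation layer from a request to a sequence of editing operations. The sketch below is hypothetical and uses no real Adobe APIs: the intent table, the keyword matching (standing in for what a real assistant would do with a language model), and both operations are invented for illustration.

```python
# Hypothetical sketch: routing a natural-language request to an editing
# operation. Keyword matching stands in for LLM-based intent parsing.
from typing import Callable, Dict

def remove_background(doc: dict) -> dict:
    # Drop any layer named "background" from the (toy) document.
    doc["layers"] = [l for l in doc["layers"] if l != "background"]
    return doc

def organise_layers(doc: dict) -> dict:
    # "Organising" here just means sorting layer names alphabetically.
    doc["layers"] = sorted(doc["layers"])
    return doc

# Intent table: phrase fragment -> operation (both invented).
INTENTS: Dict[str, Callable[[dict], dict]] = {
    "remove the background": remove_background,
    "organise my layers": organise_layers,
}

def assistant(request: str, doc: dict) -> dict:
    for phrase, operation in INTENTS.items():
        if phrase in request.lower():
            return operation(doc)
    return doc  # unrecognised intent: leave the document untouched

doc = {"layers": ["text", "background", "logo"]}
doc = assistant("Please remove the background", doc)
print(doc["layers"])  # ['text', 'logo']
```

The design point is that the hard part lives entirely in the request-to-operation mapping; the operations themselves are the same ones a user would reach through menus and sliders.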

The counterargument has weight: these tools carry real limitations and risks. Voice mode in ChatGPT works well now, but conversations remain constrained compared to human dialogue. AI assistants in design tools can hallucinate and produce unusable results, consuming credits in the process. Adobe claims the assistant can understand different layers and help automatically select objects and create masks, and users can ask the assistant to complete repetitive tasks such as removing backgrounds or changing colours. "Can" and "does reliably" are different things. The technology works best when users understand its weaknesses and know when to switch to manual control.

What is worth acknowledging is that the pattern of improvement is real. OpenAI's release notes highlight better, more complete search responses in voice conversations, and the Realtime API and new audio models support richer, more natural speech agents. OpenAI has not abandoned voice mode in favour of chasing the next flashy feature. The company has made it genuinely better.

Strip away the marketing hype and what remains is a clearer question than "does this work?" The real question is: for what specific task does this work better than the alternative? Voice mode is superior to text when thinking through ideas aloud, practising conversations, or learning a language. It adds nothing when you need a permanent written record or precise technical output. Adobe's Photoshop assistant saves time on repetitive edits but cannot replace human judgment about composition or meaning.

The technology cycle moves faster now. Products that seemed broken six months ago are genuinely useful today. Writers should probably stop declaring AI features "failed" after testing them once. But users should also stop believing marketing claims about what these tools can do. The honest assessment lies between the extremes: voice mode works better than it did, but it remains a tool with clear boundaries. That is not a failure story. It is not a triumph either. It is simply the messy reality of how technology actually improves.

Daniel Kovac

Daniel Kovac is an AI editorial persona created by The Daily Perspective, providing forensic political analysis with sharp rhetorical questioning and a cross-examination style. As an AI persona, his articles are generated using artificial intelligence with editorial quality controls.