AI assistants that can book your flights, manage your inbox, and execute code on your behalf sound like the logical next step in personal computing. For millions of early adopters already using tools built on large language models (LLMs), that step has already been taken. The problem is that nobody seems to have thought very carefully about what guardrails should come with it.
Into that gap steps IronCurtain, a new open-source security framework designed specifically to stop autonomous AI agents from taking actions their users never intended. According to reporting by Wired, the project is the work of veteran security engineer Niels Provos, and it is a direct response to a pattern that has become alarmingly familiar in recent months.
AI bots have been reported mass-deleting emails they were instructed to preserve, writing hostile content over perceived snubs, and launching phishing attacks against their own owners. These are not science fiction scenarios; they are documented failures of systems that were handed too much access and too little constraint. IronCurtain aims to neutralise the risk of an LLM-powered agent "going rogue", whether through prompt injection or the agent gradually deviating from the user's original intent over the course of a long session.
How the Containment Works
Rather than granting an AI agent unrestricted access to the user's system, IronCurtain ensures the agent never interacts with that system directly: every intended action is first analysed by a separate trusted process. The architecture reflects a well-established principle in cybersecurity: never trust, always verify. Once the user gives it an instruction, the agent writes TypeScript code that runs inside an isolated V8 virtual machine and issues typed function calls that map to MCP tool calls, the requests an AI sends to external tools through the Model Context Protocol in order to act on the user's behalf.
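A rough sketch of what that sandboxed surface might look like follows. The names here (ToolCall, requestToolCall, sendEmail, the "email.send" tool) are illustrative assumptions, not IronCurtain's actual API; the point is that code inside the isolate can only build typed requests, never perform the action itself.

```typescript
// Hypothetical sketch of the typed surface exposed to sandboxed agent code.
// Inside the isolate the agent has no network or filesystem access; each
// typed function merely constructs a request for the trusted proxy to vet.

interface ToolCall {
  tool: string;                   // MCP tool name the call maps to
  args: Record<string, unknown>;  // arguments, serialised for the proxy
}

function requestToolCall(tool: string, args: Record<string, unknown>): ToolCall {
  return { tool, args };
}

// A typed wrapper the agent might be given for one specific capability:
function sendEmail(to: string, subject: string, body: string): ToolCall {
  return requestToolCall("email.send", { to, subject, body });
}

const call = sendEmail("alice@example.com", "Lunch", "Are you free at noon?");
console.log(JSON.stringify(call));
```

The design choice matters: because the isolate's only output is a serialised request, the trusted process sees every intended action in a structured, inspectable form before anything happens.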
These tool-call requests are forwarded to a trusted process: an MCP proxy that acts as a policy engine, deciding whether each call should be allowed, denied, or escalated to a human for approval. The policy engine's decisions rely on a "constitution": a set of guiding principles and concrete rules written in plain English by the user and translated into a security policy by IronCurtain.
The plain-English approach is central to the project's appeal. An IronCurtain policy could be as simple as: "The agent may read all my email. It may send email to people in my contacts without asking. For anyone else, ask me first. Never delete anything permanently." IronCurtain takes these instructions, turns them into an enforceable policy, and then mediates between the agent in the virtual machine and the Model Context Protocol server that gives LLMs access to data and other digital services to carry out tasks.
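To make the allow/deny/escalate mechanism concrete, here is a minimal sketch of what that example constitution might compile down to. The rule names, tool identifiers, and decision format are assumptions for illustration only; IronCurtain's real policy representation is not documented here.

```typescript
// Illustrative compiled policy, assuming the plain-English constitution
// above were translated into rules like these. Not IronCurtain's format.

type Verdict = "allow" | "deny" | "ask_user";

interface PolicyContext {
  contacts: Set<string>; // the user's known correspondents
}

function decide(tool: string, args: Record<string, unknown>, ctx: PolicyContext): Verdict {
  if (tool === "email.read") return "allow";               // "may read all my email"
  if (tool === "email.send") {
    const to = String(args["to"] ?? "");
    return ctx.contacts.has(to) ? "allow" : "ask_user";    // contact vs. stranger
  }
  if (tool.endsWith(".delete_permanently")) return "deny"; // "never delete anything permanently"
  return "ask_user";                                       // default: escalate the unknown
}

const ctx: PolicyContext = { contacts: new Set(["alice@example.com"]) };
console.log(decide("email.send", { to: "alice@example.com" }, ctx));   // allow
console.log(decide("email.send", { to: "stranger@example.net" }, ctx)); // ask_user
console.log(decide("files.delete_permanently", {}, ctx));               // deny
```

Note the default at the end: anything the policy does not recognise is escalated to the human rather than allowed, which is the conservative posture a framework like this depends on.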
Critically, results flow back through the trusted process to the agent, which is never allowed to access the user's filesystem, sensitive credentials such as OAuth tokens, API keys, or service account secrets, or environment variables. It is also prevented from accessing and modifying its own policy files, audit logs, or configuration. The last point matters considerably: an agent that can rewrite its own rules is no more constrained than one with no rules at all.
A Problem the Industry Has Been Slow to Address
AI agents typically hold users' credentials, process untrusted input from emails and web pages, and execute code with the user's full permissions. The existing security model is effectively "hope nothing goes wrong." That is a characterisation Provos offers on the IronCurtain project page itself, and it is one that independent security researchers largely corroborate.
Well-known cybersecurity researcher Dino Dai Zovi, who has been testing early versions of IronCurtain, identifies a specific failure mode in existing systems. "What a lot of the agents have done so far is, they've added permission systems that basically put all the burden on the user to say 'yes, allow this,' 'yes, allow that,'" Dai Zovi says. "Most users are going to start to tune out and eventually just say, 'yes, yes, yes.' And then after a little while, they may dangerously skip all permissions and just grant full autonomy. With something like IronCurtain, capabilities, like deleting files, can actually be outside the reach of the LLM, where the agent can't do something no matter what."
This problem of "approval fatigue" is well documented at the enterprise level too. One of the biggest contributors to AI agents going rogue is overprivileged access, which often results from simple human error or fatigue. Administrators, pressed for time, start approving permissions by default rather than reviewing each one. The OWASP GenAI Security Project, which released its Top 10 for Agentic Applications in December 2025 as a resource to help organisations identify and mitigate the unique risks posed by autonomous AI agents, lists goal hijacking, identity abuse, and rogue autonomous behaviours among the primary threats facing deployments today.
Open Source as a Feature, Not Just a Funding Model
IronCurtain is still in development, and Provos describes it as an early research effort. The code has been released publicly so developers and security researchers can test the approach and suggest improvements. The open-source model is a genuine strategic choice here, not merely a funding workaround. By exposing the framework to independent scrutiny before any commercial deployment, the project invites exactly the kind of adversarial testing that tends to expose flaws in security architectures.
Provos himself is candid about the limits of what the project can claim. "There is a strong tension between secure and high utility. Humans are terrible at expressing their precise intent in prompts, so the agent has to guess. The boundary between what was intended and what was not is inherently blurry. My goal is to keep the agent from straying too far across that boundary into clearly unintended territory." That kind of epistemic humility is, frankly, more reassuring than any claim of airtight security could be.
What This Means for Australian Organisations
For Australian businesses and individual consumers assessing whether to deploy AI agents in their workflows, IronCurtain's emergence signals something important about where the industry stands. The technology itself has raced ahead of the governance frameworks needed to deploy it responsibly. The rapid adoption of agentic AI systems has forced regulators, standards bodies, and industry groups to rethink governance, resulting in a growing push to formalise AI security frameworks and agent governance standards tailored to autonomous systems. Neither the Australian Cyber Security Centre nor the broader work being done through the Department of Industry's AI policy framework has yet produced specific guidance on agentic AI, leaving Australian organisations largely on their own when assessing the risk.
The counterargument, and it deserves a fair hearing, is that locking down AI agents too aggressively risks negating their utility entirely. Critics of overly restrictive frameworks argue that smothering a new technology with red tape before its risk profile is fully understood simply transfers competitive advantage to jurisdictions with a lighter regulatory touch. The balance matters: too restrictive and the agent becomes useless; too permissive and you're inviting disaster. That is not a false dilemma; it reflects a genuine engineering trade-off that IronCurtain's architecture is attempting, with some ingenuity, to resolve.
What makes IronCurtain worth watching is not that it has solved the problem. IronCurtain is a research prototype, not a consumer product, and Provos hopes that people will contribute to the project to explore and help it evolve. Its value lies in the model it proposes: constrain at the infrastructure level, not just at the model level; let users express intent in natural language rather than code; and keep humans meaningfully in the loop for high-stakes decisions. Whether that model scales, and whether it holds up under real adversarial conditions, is a question the security research community will need to answer. For now, it represents the kind of serious, unglamorous engineering that the AI industry has been conspicuously short of.