Meta's systems for detecting and labelling deepfakes are failing at their core purpose. During times of armed conflict, when synthetic content threatens public safety most acutely, Facebook and Instagram cannot respond quickly enough to protect users from misinformation.
That damning assessment comes from the Meta Oversight Board, the semi-independent body that guides the company's content policies. The board reached its conclusion after examining a specific case: an AI-generated video of alleged damage in Haifa during the 2025 Israel-Iran war. According to the board, Meta took no action despite the video being flagged.
The board overturned Meta's decision to leave the video online without a "High Risk AI" label, noting that while the content did not warrant removal for imminent physical harm, its inauthenticity should have been clearly flagged.
The gap between promise and performance reveals a fundamental challenge in platform governance: synthetic media now spreads faster than the defences built to contain it. Such content can be generated in minutes, is increasingly indistinguishable from authentic footage at a glance, and can be translated and redistributed across language barriers within hours. For a platform operating at Meta's scale, with billions of daily active users, the lag between how quickly false content spreads and how quickly moderation systems respond is not a technical footnote; it is a public safety gap.
The problem intensifies during active conflict. Meta has invested heavily in AI-based detection tools and expanded its network of third-party fact-checkers, but the Oversight Board's assessment suggests these investments have not produced systems capable of operating effectively under wartime conditions, when the volume and velocity of synthetic content spikes sharply.
The stakes extend beyond this single conflict. AI-generated content related to the Israel-Iran war has pushed disinformation to an industrial scale, with human rights researchers noting that while other conflicts have seen recycled images and fake live streams, the sophistication of the synthetic content in this one is unprecedented.
Meta's response has centred on the C2PA standard, which embeds cryptographically signed provenance metadata identifying how an image was created, including whether it was generated by AI. The company joined the Coalition for Content Provenance and Authenticity, the body behind the standard, and began rolling out labels across Facebook, Instagram and Threads. Yet independent investigations suggest the approach is breaking down in practice: when users upload images to Instagram, LinkedIn or Threads, the platforms strip out the C2PA metadata. Even OpenAI concedes the metadata is easy to remove, to the point that online platforms may do it accidentally.
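To see why the standard is so fragile in practice, consider a minimal sketch of an ordinary upload pipeline. The example below is illustrative, not Meta's actual code: it assumes the Pillow imaging library and hypothetical filenames, and shows how a routine resize-and-re-encode step writes a fresh file that no longer carries the original's embedded segments, which is where a C2PA manifest lives.

```python
# Illustrative sketch only, not any platform's real pipeline:
# re-encoding an uploaded image with Pillow, a common server-side
# step, discards the original file's embedded metadata.
from PIL import Image

# "ai_generated.jpg" is a hypothetical upload carrying a C2PA manifest.
original = Image.open("ai_generated.jpg")

# A routine resize step, typical of image-upload pipelines.
resized = original.resize((1080, 1080))

# save() builds a fresh JPEG from pixel data alone; it does not copy
# the JUMBF application segments where C2PA credentials are stored,
# so the served copy silently loses its provenance record.
resized.save("served_copy.jpg", format="JPEG", quality=85)
```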
There is a case for maintaining measured scepticism here. Labelling systems remain preferable to removal, which risks suppressing legitimate speech under the guise of fighting misinformation. Yet the Oversight Board's findings raise a harder question: if Meta cannot label synthetic content reliably during active conflicts, when the stakes for public safety are highest, what confidence should users place in these systems during calmer periods?
For content moderation at scale, the board suggests a shift toward what might be called structural responsibility. Rather than relying on detection and labelling alone, platforms may need to address the underlying conditions that allow synthetic media to propagate so rapidly during conflict.
The Oversight Board itself may evolve accordingly. One board member has suggested its mandate could shift away from individual cases and toward broad-based reforms and recommendations as AI-generated content proliferates. That shift reflects a recognition that labelling alone is no longer sufficient; what is required is institutional change at the platform level.
For Australian users consuming international news during global crises, the implications are clear. Trust in what appears on Meta's platforms cannot depend on algorithmic detection or user-facing labels. Instead, readers must develop their own critical filters, seek verification from authoritative sources, and recognise that no single platform can protect them from synthetic content at scale. Meta has acknowledged the problem. Whether it can solve it is another matter entirely.