Google DeepMind has moved to solve a problem that has bedevilled the artificial intelligence industry for years: nobody actually knows how to measure progress toward artificial general intelligence, and everyone is using different yardsticks.
DeepMind wants to put that measurement on firmer ground. Its paper, "Measuring Progress Toward AGI: A Cognitive Taxonomy," lays out a scientific approach to evaluating an AI system's general intelligence. The framework draws on psychology and neuroscience to identify 10 core cognitive abilities crucial for AGI: perception, generation, attention, learning, memory, reasoning, metacognition, executive functions, problem-solving, and social cognition.

As OpenAI pushes toward AGI with its next-generation models, Microsoft pours billions into AI infrastructure, and Meta open-sources increasingly capable systems, the industry lacks consensus on what these milestones actually mean. Everyone is racing toward AGI, but there's no agreed-upon finish line. This lack of standardisation has real consequences. Right now, every lab uses different tests, making it nearly impossible to compare capabilities objectively. If major AI labs adopted DeepMind's cognitive framework for their own progress reports, it would create comparable metrics across OpenAI's GPT series, Anthropic's Claude, and Google's own models.
Google is partnering with Kaggle to launch a hackathon, inviting the research community to help build the evaluations needed to put this framework into practice. The hackathon asks participants to design evaluations for the five cognitive abilities where the evaluation gap is largest: learning, metacognition, attention, executive functions and social cognition. A total prize pool of $200,000 is on offer: $10,000 awards for the top two submissions in each of the five tracks, and $25,000 grand prizes for the four best submissions overall. Submissions are open March 17 through April 16, and results will be announced June 1.
The hackathon approach reflects a pragmatic recognition of a fundamental challenge in AI research: people strongly disagree on what AGI even means. Some define it by performance on benchmarks, others by a system's internal workings, its economic impact, or simply vibes. By crowdsourcing benchmark development through Kaggle, DeepMind gains practical expertise and diverse perspectives on what AGI capabilities should include. It also ensures the framework isn't just a set of internal metrics tailored to make Google's own models look good; tests built by independent developers carry a credibility that in-house evaluations lack.
Whether other major labs adopt these standards remains to be seen, but the conversation around how we measure AGI progress just got a lot more concrete. As AI capabilities accelerate, having agreed-upon metrics for what progress actually looks like isn't just academically interesting; it's essential for safety, governance, and understanding what we're actually building. DeepMind hopes its framework can "move the conversation around AGI from one of subjective claims and speculation toward a grounded, measurable scientific endeavor."
The framework's emergence reflects broader shifts in how AI development is being evaluated. The European Union's AI Act, California's proposed legislation, and federal efforts all struggle with the same question: how do you regulate systems when you can't objectively measure their capabilities? A credible measurement framework could become useful not only for researchers but also for policymakers grappling with oversight.
For more information, see Google's official announcement and the Kaggle competition page.