Measuring the Road to AGI: What Google DeepMind Is Really Building

Everyone talks about AGI, but nobody can really define it. Google DeepMind just dropped a cognitive framework to measure progress toward so‑called “artificial general intelligence”.

Here’s the interesting part: it’s not just another corporate slide deck full of buzzwords. It’s a reasonably solid grid for tracking, step by step, how we go from “nice chatbot” to “system that beats top humans on most useful tasks”.

And if you’re a founder, this is not just research candy. It’s a tool to understand where we actually are, what’s likely to show up in the next 3–5 years, and how to prepare so you can automate as much of your operations as possible before your competitors do.

In this article, we’ll:

break down DeepMind’s framework without the academic fluff,
see where current models really stand (GPT‑4, Gemini, Claude…),
look at the limits and blind spots of the framework,
and most importantly: turn it into a concrete strategy for your business.

---

1. The core problem: everyone says “AGI”, nobody measures it

You’ve heard it all:

“AGI is already here”
“AGI is 50 years away”
“AGI is just a marketing term”

The real issue behind these hot takes is simple: no shared metric. If we don’t know what we’re measuring, anyone can say anything.

DeepMind tries to fix that with a simple idea:

> Measure progress toward AGI like you’d measure a human’s cognitive abilities:
> - their performance on tasks,
> - the generality of what they can do,
> - their autonomy in the real world.

This is a very “cognitive science” approach: treat AI as an agent that perceives, reasons, acts and learns — not just a token‑predicting machine.

---

2. DeepMind’s framework in plain English: performance × generality × autonomy

The paper (“Levels of AGI for Operationalizing Progress on the Path to AGI”) defines three key axes:

Performance (depth) – how does the AI compare to humans?

- below human,
- average human,
- expert human,
- above the best humans.

Generality – is the AI strong on one narrow task (e.g. chess) or on a wide spectrum of tasks (coding, contracts, strategy, learning new domains, etc.)?

Autonomy – does the AI:

- just answer prompts,
- or can it plan, act, correct itself, and learn with minimal supervision?

Based on this, DeepMind defines levels of AGI‑ness.

The 6 performance levels

Simplified:

Level 0 – No AI

Conventional software with no AI in it (the paper cites calculators and compilers as examples).

Level 1 – Emerging

Roughly non‑specialist human level on some tasks. This is where most big models sit today: they can do a bit of everything, but they’re not yet reliable or expert.

Level 2 – Competent

At or above the median skilled adult on many tasks.

Level 3 – Expert

At least the 90th percentile of skilled adults (roughly the top 10%) on a broad set of tasks.

Level 4 – Virtuoso

At least the 99th percentile of skilled adults (roughly the top 1%), consistently.

Level 5 – Superhuman

Beats all humans in a given domain.

Then you cross that with generality: are you superhuman at one game, or expert across 50 types of real‑world tasks?

Finally, you add autonomy: is this just a model you query, or an agent that can chain actions, call tools, trigger workflows, and learn over time?

That combination gives you a way to say: “we’re at this level on the path to AGI”.
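To make the grid concrete, here is a minimal Python sketch that encodes it as data. It assumes a system’s performance on a task can be summarized as the percentile of skilled adults it outperforms, using the cut-offs above (median, top 10%, top 1%, all humans), and it leaves out Level 0 because that level describes non-AI software rather than a score. The names `Performance`, `Autonomy`, `performance_level` and `Rating` are our own; the autonomy labels follow our reading of the paper’s separate autonomy table, so check them against the original before relying on them.

```python
from dataclasses import dataclass
from enum import IntEnum


class Performance(IntEnum):
    # Level 0 ("No AI") is omitted: it describes conventional software, not a score.
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4
    SUPERHUMAN = 5


class Autonomy(IntEnum):
    # Our reading of the paper's autonomy levels (AI as a tool ... AI as an agent).
    TOOL = 1
    CONSULTANT = 2
    COLLABORATOR = 3
    EXPERT = 4
    AGENT = 5


def performance_level(percentile: float) -> Performance:
    """Map 'share of skilled adults outperformed' (0-100) to a level.

    Assumption: the system is already useful, so anything below the median is Emerging.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile >= 100:
        return Performance.SUPERHUMAN
    if percentile >= 99:
        return Performance.VIRTUOSO
    if percentile >= 90:
        return Performance.EXPERT
    if percentile >= 50:
        return Performance.COMPETENT
    return Performance.EMERGING


@dataclass(frozen=True)
class Rating:
    performance: Performance
    general: bool        # True = wide range of tasks, False = one narrow task
    autonomy: Autonomy   # how the system is deployed, rated separately

    def label(self) -> str:
        level = self.performance.name.capitalize()
        if self.general:
            # General Level 5 is what the paper calls Artificial Superintelligence (ASI).
            name = "ASI" if self.performance is Performance.SUPERHUMAN else f"{level} AGI"
        else:
            name = f"{level} Narrow AI"
        return f"{name} (autonomy: {self.autonomy.name.lower()})"


chatbot = Rating(performance_level(40), general=True, autonomy=Autonomy.CONSULTANT)
chess_engine = Rating(performance_level(100), general=False, autonomy=Autonomy.TOOL)
print(chatbot.label())       # Emerging AGI (autonomy: consultant)
print(chess_engine.label())  # Superhuman Narrow AI (autonomy: tool)
```

Keeping autonomy as its own field, rather than folding it into the level, mirrors the point above: the same model can be deployed as a simple tool or as an agent.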

---

3. Where do current models actually sit?

DeepMind and others more or less converge on this picture:

Models like GPT‑4, Claude, Gemini 2.0/3.1 are in the Emerging AGI bucket.

They show early generality: code, text, images, some logic, some planning.

But they are not competent or expert on most tasks, especially when:

- long‑horizon reasoning is needed,
- large context and memory matter,
- or they must act in the real world (tools, APIs, systems).

Some numbers to anchor this:

On abstract reasoning benchmarks like ARC‑AGI, top systems hover around 20–25% accuracy. Very far from expert humans.
On complex software tasks, some reports show models can complete work that would take a human dev ~2 hours in roughly 50% of cases. Impressive — but not yet reliable or general.

On the other hand, in very narrow domains, we already see Expert/Virtuoso levels:

AlphaGeometry 2: solves International Mathematical Olympiad geometry problems at a level DeepMind reports as on par with or above an average gold medalist.
AlphaEvolve: discovers improved algorithms for scientific and mathematical problems; applied to more than 50 open problems, it matched the best known solutions in roughly 75% of cases and improved on them in about 20%.

So:

> We already have superhuman AI in narrow pockets, but not yet general, autonomous intelligence.

For you as a founder, this boils down to one thing:

No, AGI is not here yet.
Yes, we already have more than enough to automate 30–60% of many cognitive jobs.

---

4. What DeepMind doesn’t say loudly (but you should care about)

The framework is clean and well‑designed. But it has blind spots.

4.1. Benchmarks are still weak proxies for the real world

Most evaluations:

are done on static datasets,
test tasks that are far from daily business reality,
don’t measure continuous reliability.

In a real business, you don’t care that a model scores 85% on some academic benchmark if, in production:

it hallucinates on a customer contract,
forgets half the context in a support ticket,
or writes borderline‑legal emails.

Your real benchmark is:

> How much human time do I save at equal or better quality, on a specific process?

No academic framework measures that properly yet.
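You can, however, approximate it yourself, one process at a time. A minimal sketch, assuming you can measure human minutes per item before and after AI (review and corrections included) and score a sample of outputs for quality; the function, its parameters and the example figures are hypothetical.

```python
def monthly_hours_saved(
    items_per_month: int,
    baseline_minutes: float,  # human time per item without AI
    review_minutes: float,    # human time per item still needed with AI (review, fixes)
    ai_quality: float,        # share of sampled AI-assisted outputs judged acceptable (0-1)
    baseline_quality: float,  # same measure for the human-only process (0-1)
) -> float:
    """Human hours saved per month, counted only at equal or better quality."""
    if ai_quality < baseline_quality:
        return 0.0  # quality dropped: the "saved" time does not count
    saved_minutes = items_per_month * (baseline_minutes - review_minutes)
    return max(saved_minutes, 0.0) / 60


# Hypothetical process: 400 proposals a month, 45 min each by hand, 12 min of review with AI.
print(monthly_hours_saved(400, 45, 12, ai_quality=0.93, baseline_quality=0.90))  # 220.0
```

The quality gate is deliberately blunt: a process that gets faster but worse scores zero, which is closer to how a customer experiences it than any benchmark average.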

4.2. “Compensability” is misleading

DeepMind’s grid tends to aggregate capabilities, so strength in one area can compensate for weakness in another (hence “compensability”): you can be amazing in one area, terrible in another, and still look “okay on average”.

Some researchers push a different idea: coherence.

> A true AGI shouldn’t be brilliant at math and terrible at following basic instructions or planning simple tasks.

For your company, that’s exactly the pain point:

a model that’s insane at writing but awful at following requirements is dangerous.
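The difference between the two views shows up even with toy numbers. In this sketch the capability names and scores are made up: an averaged score lets strong writing hide weak instruction-following, while a coherence-style score that takes the weakest capability exposes it. Using the minimum is our own simplification of the coherence idea, not a metric from DeepMind or its critics.

```python
from statistics import mean

# Hypothetical scores (0-100) for one model on four capabilities.
scores = {
    "writing": 95,
    "math": 90,
    "following requirements": 40,
    "planning simple tasks": 55,
}

compensatory = mean(scores.values())  # strengths offset weaknesses
coherence = min(scores.values())      # the weakest capability caps the score
weakest = min(scores, key=scores.get)

print(f"average: {compensatory:.1f}, coherence: {coherence} (weakest: {weakest})")
# average: 70.0, coherence: 40 (weakest: following requirements)
```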

4.3. The real blockers: memory, world models, uncertainty

Even Demis Hassabis (DeepMind’s CEO) admits current models struggle with:

short memory,
fragile understanding of the real world,
poor handling of uncertainty (they hallucinate instead of saying “I don’t know”).

That means if you want useful AI today, you must:

box it into well‑defined processes,
connect it to structured knowledge bases,
and monitor what it does.
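Here is roughly what those three guardrails can look like around a model call, as a minimal sketch. Retrieval is a toy word-overlap match over a hypothetical in-memory knowledge base (a real setup would use proper search or embeddings), `draft_with_model` stands in for whatever LLM call you use, and the 0.5 threshold is an arbitrary assumption. The point is the shape: answer only from retrieved facts, escalate instead of guessing, and log every decision.

```python
import logging
import re
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("support-bot")

# Hypothetical structured knowledge base: one entry per known question.
KNOWLEDGE_BASE = {
    "where is my invoice": "Invoices are in Settings > Billing > Invoices.",
    "how do i reset my password": "Use 'Forgot password' on the login page.",
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str) -> tuple[Optional[str], float]:
    """Return the best-matching KB answer and its word-overlap score (0-1)."""
    q = words(question)
    best_answer, best_score = None, 0.0
    for key, answer in KNOWLEDGE_BASE.items():
        k = words(key)
        score = len(q & k) / len(k)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer, best_score


def handle(
    question: str,
    draft_with_model: Callable[[str, str], str],
    threshold: float = 0.5,
) -> Optional[str]:
    fact, score = retrieve(question)
    if fact is None or score < threshold:
        log.info("escalated to human (score=%.2f): %s", score, question)
        return None  # the "I don't know" path: a human takes over
    reply = draft_with_model(question, fact)  # the model only rephrases a known fact
    log.info("answered (score=%.2f): %s", score, question)
    return reply


def echo(question: str, fact: str) -> str:
    return fact  # stand-in for a real LLM call


print(handle("Where is my invoice?", echo))
print(handle("Can you refund my order?", echo))  # None: escalated
```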

---

5. What this framework changes if you’re actually building

You’re not DeepMind. Your job is not to “reach AGI”. Your job is to:

> Cut costs, grow revenue, and free up brain‑time for what actually moves the needle.

DeepMind’s framework is useful for one key thing: projection.

5.1. The next 0–3 years: the age of “useful Emerging AGI”

From 2025 to ~2028, it’s realistic to expect:

models that are more reliable,
better tooled up (agents, tools, APIs, long‑term memory),
still far from general Expert‑level AGI.

For you, the question is not “will we have AGI?” but:

> Which processes can I already push down to “Emerging/Competent” level with today’s AI?

Typical examples:

Level 1–2 customer support

- 40–70% of tickets can be handled by an AI agent wired into your knowledge base.
- Gains: 30–60% reduction in human support load.

Document preparation (proposals, contracts, reports)

- AI drafts, human reviews.
- Gains: 50–70% less time per document.

Prospecting and qualification

- Scraping, enrichment, personalized outreach, lead scoring.
- Gains: 2–3x more prospects touched at similar quality.

Internal ops (SOPs, documentation, reporting)

- Auto‑generate SOPs from Loom videos or call transcripts.
- Auto‑summarize weekly metrics into readable reports.

All of this is doable today with “Emerging” models + solid engineering.
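In the support example, the share of tickets an agent handles is not the same as the share of human workload it removes. One plausible reason is that the tickets it can resolve alone tend to be the short ones. A back-of-the-envelope sketch, with a hypothetical ticket mix and handling times:

```python
# Hypothetical monthly ticket mix:
# category -> (tickets, human minutes per ticket, share the AI agent resolves alone)
TICKET_MIX = {
    "invoice / order status": (500, 5, 0.9),
    "account and login issues": (300, 10, 0.5),
    "billing disputes and edge cases": (200, 20, 0.0),
}

tickets = sum(n for n, _, _ in TICKET_MIX.values())
ai_tickets = sum(n * share for n, _, share in TICKET_MIX.values())
total_minutes = sum(n * minutes for n, minutes, _ in TICKET_MIX.values())
ai_minutes = sum(n * minutes * share for n, minutes, share in TICKET_MIX.values())

print(f"tickets handled by AI:      {ai_tickets / tickets:.0%}")        # 60%
print(f"human support load removed: {ai_minutes / total_minutes:.0%}")  # 39%
```

Measuring in minutes rather than ticket counts keeps the estimate honest when you pitch the project internally.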

5.2. 3–7 years: toward “Competent AGI” on many tasks

If DeepMind is roughly right and we hit broad Competent‑level AGI before 2030, you’ll see:

agents that can run end‑to‑end projects (e.g. launch a full marketing campaign),
AI that genuinely learns your business over months,
systems that cut 70–90% of time on some cognitive tasks.

The companies that will benefit first will be the ones that already:

have their data structured,
have clear, documented processes,
and a pro‑automation culture.

The rest will still be arguing on X about whether it’s “real AGI” or not.

---

6. How to use DeepMind’s framework pragmatically

Instead of fantasizing about Level 5 Superhuman, use this framework as a process design tool.

Step 1 – Map tasks to human‑level requirements

List your key operations and ask a simple question for each:

Does this task really need expert‑level performance?
Or is competent‑level performance enough?
Or is intern‑level fine if it’s reviewed?

Examples:

Answering “where is my invoice?” tickets → intern‑level.
Writing an SEO blog post → competent, with review.
Negotiating a $500k contract → probably human expert only.

Step 2 – Match with current AI levels

For each task:

if Emerging is enough → near‑term automation is realistic,
if you need Competent → partial automation with human supervision,
if you need Expert/Virtuoso → AI is a copilot, not a pilot.

You stop dreaming about “full automation” and start doing smart automation.
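Steps 1 and 2 boil down to a lookup table. A minimal sketch, assuming today’s general models sit at the Emerging level (as discussed above) and treating “intern‑level, if reviewed” as Emerging; the task list reuses the Step 1 examples, and the `Level` enum, `CURRENT_AI_LEVEL` and `automation_mode` names are our own.

```python
from enum import IntEnum


class Level(IntEnum):
    EMERGING = 1  # "intern-level, if reviewed"
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4


CURRENT_AI_LEVEL = Level.EMERGING  # assumption: where general models sit today

# Step 1: required performance per task (examples from the text).
TASKS = {
    "answer 'where is my invoice?' tickets": Level.EMERGING,
    "write an SEO blog post": Level.COMPETENT,
    "negotiate a $500k contract": Level.EXPERT,
}


def automation_mode(required: Level) -> str:
    """Step 2: match the required level against what AI delivers today."""
    if required <= CURRENT_AI_LEVEL:
        return "automate (near term)"
    if required == Level.COMPETENT:
        return "partial automation with human supervision"
    return "copilot, not pilot"  # Expert / Virtuoso


for task, required in TASKS.items():
    print(f"{task}: {automation_mode(required)}")
```

Re-run the table whenever you revise `CURRENT_AI_LEVEL`: tasks move from “copilot” to “automate” without redesigning the whole process.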

Step 3 – Layer in autonomy

Ask yourself:

Should the AI just propose (drafts, suggestions)?
Or can it act (send emails, create tickets, push code)?

Always start with:

proposal + human validation,
then, once you have quality metrics, gradually enable autonomy (e.g. AI fully handles simple cases, routes complex ones to humans).
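The “proposal first, autonomy later” rule can be written as a small gate. In this sketch the thresholds (at least 200 reviewed proposals and a 95% approval rate) are arbitrary placeholders, `is_simple` is a hypothetical flag your own triage would set, and the approval history is the quality metric collected during the human‑validation phase.

```python
from dataclasses import dataclass


@dataclass
class ReviewStats:
    reviewed: int = 0  # AI proposals checked by a human so far
    approved: int = 0  # proposals accepted without changes

    def record(self, accepted: bool) -> None:
        self.reviewed += 1
        self.approved += int(accepted)

    @property
    def approval_rate(self) -> float:
        return self.approved / self.reviewed if self.reviewed else 0.0


MIN_REVIEWED = 200   # placeholder: enough history to trust the rate
MIN_APPROVAL = 0.95  # placeholder: quality bar before letting the AI act


def route(is_simple: bool, stats: ReviewStats) -> str:
    """Decide whether the AI acts alone, proposes a draft, or hands off."""
    if not is_simple:
        return "human"  # complex cases always go to people
    if stats.reviewed >= MIN_REVIEWED and stats.approval_rate >= MIN_APPROVAL:
        return "ai_acts"  # autonomy earned on simple cases
    return "ai_proposes"  # draft + human validation


stats = ReviewStats()
for i in range(250):
    stats.record(accepted=(i % 25 != 0))  # one rejection in every 25 proposals
print(stats.approval_rate, route(True, stats), route(False, stats))  # 0.96 ai_acts human
```

Because the gate reads live statistics, autonomy can also be withdrawn automatically if the approval rate drops below the bar.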

---

7. Conclusion: AGI is a horizon, not an excuse to wait

DeepMind’s cognitive framework for measuring progress toward AGI is genuinely useful:

it clarifies performance levels,
it highlights the importance of generality and autonomy,
it gives a common language to track progress.

But if you’re a founder, the worst move right now is to sit back and “wait for real AGI”.

While some people fantasize about Level 5 Superhuman, others are already:

automating 30–60% of their operations,
cutting costs,
scaling without hiring 10 extra people.

If AGI shows up around 2030 as DeepMind suggests, it will just amplify a gap that started years earlier.

You can either be in the camp that gets disrupted by AGI, or in the camp that’s already building an AI‑powered operational machine.

At Deepthix, we’re firmly in the second camp.

Want to automate your operations with AI? Book a 15-min call to discuss.