Does AI Actually Help Children Learn? The Research — and How We Built Around It

Why Most AI Learning Games Don't Actually Teach — And How We Built One That Does

Written by: Pierre Lagrange

Published on March 12, 2026

At Mrs Wordsmith, we use AI. We've been open about that. But "using AI" covers an enormous range of practices, and the difference that matters most is whether the AI does the thinking or the child does. The research increasingly points one way: AI embedded without care leads to cognitive offloading, where children complete tasks without ever internalising the knowledge. Performance goes up, but learning doesn't. We've spent considerable time working out how to build AI into a product designed to prevent that, one where the child does the work and the AI doesn't do it for them. Here's what the research says, and how it shaped what we built.

There is a lot of noise about AI in education right now. Most of it is happening without much reference to the actual evidence — and the evidence, when you look closely, is more cautionary than the headlines suggest.

In June 2026, the Wang and Fan meta-analysis — the single most cited paper used to support the claim that ChatGPT improves student learning outcomes — was formally retracted by Humanities and Social Sciences Communications. The retraction notice cited discrepancies that undermine confidence in its conclusions. As learning scientist Carl Hendrick noted, a large proportion of the "AI improves learning" claims circulating in education policy and edtech marketing either cite this paper directly or cite papers that cite it. A companion meta-analysis (Liu, Xu and Xie, Frontiers in Psychology) found positive effects of generative AI on intellectual outcomes — but also detected significant publication bias for exactly those outcomes.

The honest summary: the evidence that AI improves learning exists, but it is weaker and more fragile than many people have claimed.

Now let's review the scientific finding that should be at the centre of every conversation about AI in education… but rarely is.

A randomised controlled trial published in Computers and Education: Artificial Intelligence tested AI assistance in a university programming course. Students with AI support completed tasks more successfully and reported less stress. But they showed no deeper conceptual learning. When tested independently of the AI, the gap was clear: performance and learning had come apart.

This replicates a Harvard tutoring study by Bastani and colleagues. The pattern is becoming robust: AI makes the task easier without making the learner more capable.

Cognitive offloading, meaning the use of an external tool to do the mental work your brain would otherwise do, is not a fringe concern. A 2025 RAND survey of 1,214 US students found that most young people who used AI for homework were themselves worried about its effect on their critical thinking. The Brookings Institution's Center for Universal Education, drawing on interviews in 50 countries and over 400 research studies, concluded that the risks of generative AI in children's education currently outweigh the benefits.

For anyone building educational games and learning tools for children — including us — this is the central design challenge. A product that makes a child feel like they are learning, while quietly doing the cognitive work for them, is not an education product. It is a performance product.

Wong and Qiu published a randomised controlled trial in Educational Psychology Review comparing free ChatGPT use with a "think first" protocol — students had to generate their own ideas independently before being given AI access. The results were clear. Free ChatGPT use boosted initial creativity scores, but when the AI was removed, those gains collapsed. The think-first group produced more modest initial gains but retained them independently.

The mechanism is the generation effect: producing an answer — even a wrong one — before receiving help creates retrieval and elaboration events that strengthen learning. The cognitive struggle that precedes AI use is what makes AI use educationally productive. This is not a case against AI. It is a case for sequencing. Struggle first. Generate something. Then bring in the tool.

A study by Florean et al. in Cognitive Research: Principles and Implications found an important asymmetry: removing a cognitive tool that students have been relying on damages their performance far more than giving them that tool in the first place improved it. Students with lower working memory suffered the most.

The implication is practical and serious. The decision to introduce a support tool is also, implicitly, a decision about what happens when it is withdrawn. If children build their understanding while depending on external assistance, the knowledge they construct depends on that assistance continuing to be available. Remove it (for an assessment, for a future year, for a new context) and that understanding may not hold up. Any scaffolding worth having is designed to fade. Students should practise without it regularly enough that their understanding is genuinely theirs.
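One way to picture scaffolding that is "designed to fade" is as a schedule of support across practice sessions. The sketch below is purely illustrative and is not taken from any of the studies cited: the function names, the linear fade, and the choice to make every third session unassisted are all assumptions made for the example.

```python
def support_level(session_index: int, total_sessions: int) -> float:
    """Hypothetical linear fade: 1.0 is full support, 0.0 is none."""
    if total_sessions < 2:
        return 0.0
    # Keep the index inside the valid range before computing the fade.
    clamped = min(max(session_index, 0), total_sessions - 1)
    return 1.0 - clamped / (total_sessions - 1)


def is_unassisted(session_index: int, every_nth: int = 3) -> bool:
    """Assumed rule: every nth session is practised with no support at all."""
    return (session_index + 1) % every_nth == 0


def plan(total_sessions: int, every_nth: int = 3) -> list[float]:
    """Build a support schedule that fades and includes regular unassisted checks."""
    return [
        0.0 if is_unassisted(i, every_nth) else support_level(i, total_sessions)
        for i in range(total_sessions)
    ]
```

Under these assumptions, a six-session plan starts with full support, ends with none, and drops to zero at the third and sixth sessions, so the learner regularly practises without the tool before it disappears for good.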

Not all the news is cautionary. An NBER working paper by Oreopoulos and colleagues reporting results from a randomised controlled trial of Khan Academy in Indian schools found substantial learning gains. The key design feature was supervision: students used the platform during structured school time with teacher oversight, not independently at home.

The OECD's Digital Education Outlook 2026 draws a similar distinction, separating "fast AI" (cognitive outsourcing that boosts output while hollowing out learning) from pedagogically designed AI that keeps the student doing the thinking. A field experiment in Türkiye shows what the first kind looks like: GPT-4 access improved short-term performance by 48%, but students performed 17% worse once access was removed. The technology, in short, is a complement to teaching, not a substitute for it. The gains come from combining the two.

Knowing what the research says is one thing. Building around it is another. WordLore Legends is our new speaking and listening game for children that applies these findings directly — an AI educational RPG for kids where the cognitive work stays firmly with the child.

We've never hidden the fact that we use AI. We use it to move faster, to explore more ideas, to handle tasks that do not require human creative judgment.

But the products we build for children are held to a different standard: the standard the research sets. We design for durable learning, not performed learning. We sequence practice so that children generate answers before they receive them. We build retrieval into how our games and activities work. We do not confuse a child completing a task with a child having learned something.

WordLore Legends combines the educational benefits of AI with our signature approach to entertainment. It's a Dungeons & Dragons-inspired RPG where Armie, the AI Game Master, sets the scene (a crumbling bridge, a locked door, a creature blocking the path) and then waits. The child has to construct a sentence, typed out or spoken aloud, using a word they selected from the field. Only then does Armie respond and narrate what happens next.

The AI doesn't answer for them. It doesn't model the sentence first. It judges how well the child used the word and moves the story on accordingly. That sequencing — generation before feedback, cognitive effort before reward — is the whole design. It is also what we think makes the difference between a language learning game that entertains and one that genuinely teaches.
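For readers who think in code, the turn structure described above can be sketched roughly as follows. This is not how WordLore Legends is implemented. Every name here (`TurnResult`, `judge_sentence`, `play_turn`) is hypothetical, and the judging rule is a deliberately crude stand-in, since the game's actual assessment of word use is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class TurnResult:
    accepted: bool   # whether the story advances
    narration: str   # what the game master says next


def judge_sentence(sentence: str, target_word: str) -> bool:
    """Placeholder judge (an assumption): the word appears in a sentence of 4+ words."""
    words = [w.strip(".,!?;:\"'").lower() for w in sentence.split()]
    return target_word.lower() in words and len(words) >= 4


def play_turn(scene: str, target_word: str, child_sentence: str) -> TurnResult:
    """One turn: nothing is judged or narrated until the child has produced a sentence."""
    if not child_sentence.strip():
        # No attempt yet: restate the scene and wait. No model answer is offered.
        return TurnResult(False, f"{scene} What do you do?")
    if judge_sentence(child_sentence, target_word):
        return TurnResult(True, "The story moves forward.")
    # The child already chose the word, so naming it here gives nothing away.
    return TurnResult(False, f"Try again, and work '{target_word}' into a full sentence.")
```

The point of the sketch is the ordering: there is no code path in which feedback or a sample sentence reaches the child before the child has generated something of their own.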

WordLore Legends is available now at mrswordsmith.com.

Carl Hendrick, The Learning Dispatch (June 2026) · Wang & Fan retraction notice, Humanities and Social Sciences Communications · Liu, Xu & Xie, Frontiers in Psychology · Bastani et al., Harvard · Wong & Qiu, Educational Psychology Review · Florean et al., Cognitive Research: Principles and Implications · Oreopoulos et al., NBER · OECD Digital Education Outlook 2026 · Brookings Institution Center for Universal Education (2026) · RAND Corporation, American Youth Panel (2025).
