When ‘Evidence-Based’ Starts Meaning Everything… and Nothing

May 17, 2026

Spend enough time in literacy education and you begin to notice that almost everybody claims research is on their side. Publishers do it. Literacy programmes do it. Professional development providers do it. Schools do it. Increasingly, the language of research permeates literacy discourse online, conference presentations, policy documents, and CPD sessions.

Terms like research-based and evidence-based are now close to ubiquitous in the field. At times, these phrases function like quality stamps of approval. Once the label appears, the programme or approach begins to feel automatically credible.

The difficulty is that these terms are not interchangeable. They describe genuinely different relationships with research and evidence and, for teachers trying to navigate literacy instruction, understanding that difference is becoming increasingly important.

Not All Evidence Claims Are Equal

Research-based is the more modest claim. A programme described as research-based may have drawn on existing research findings during its development, but drawing on research is not the same as being shown to work. A literacy programme might reference research on oral language, motivation, or the value of discussion, all legitimate findings, while still having no independent evidence that the programme itself improves reading outcomes. The research may have informed the programme’s construction but it did not evaluate its effectiveness.

In practice, research-based has become one of the most overused and least scrutinised phrases in literacy education. It is often applied so broadly, and with so little consistency, that it can function as a kind of all-purpose defence, a way of claiming a relationship with research without being held accountable to any particular standard of evidence.

This does not mean research-based approaches are inherently poor or should be dismissed. Far from it. Many thoughtful programmes are developed with careful attention to research literature and theoretical understanding. But we need to be honest about what the term actually means. A programme can be thoughtfully research-based and still have no independent evidence that it improves outcomes for children.

Evidence-based is the stronger claim, and the one that deserves the most weight. It means the programme or approach has been evaluated through rigorous research designs, such as randomised controlled trials, longitudinal studies, meta-analyses, and systematic reviews, and has been demonstrated to improve outcomes. Not simply inspired by research, but tested. There is a meaningful difference between a programme built with research in mind and a programme that has been independently shown to work.

It is also important to say clearly that not every aspect of literacy education sits neatly within an evidence-based framework, and that is not necessarily a problem. Teaching is a human profession, not a laboratory procedure. There are aspects of literacy instruction where the evidence is exceptionally strong and specific, such as phonics instruction and certain intervention approaches. In these areas, teachers should feel confident leaning heavily on robust evidence.

But there are also aspects of literacy education that involve creativity, relationships, responsiveness, and professional judgement in ways that are far more difficult to reduce to tightly controlled research studies. The choice of a novel for a particular class, the atmosphere created around storytelling, the opportunities teachers create for discussion, curiosity, humour, and connection, all matter too, even if they cannot always be captured neatly through experimental designs.

So the goal is not to eliminate teacher judgement from literacy instruction. It is to understand where strong evidence exists, where professional interpretation is required, and where the two must work together. Teachers do not need to become researchers, statisticians, or methodological experts. But they do need to think critically about the claims, programmes, and messages that are sold to them. The language of research can sound extremely persuasive, particularly when wrapped in polished presentations, confident delivery, and references to ‘brain science’.

Part of teachers’ professional responsibility is learning to pause and ask: what kind of evidence actually sits underneath this claim? How strong is it? And does the confidence of the messaging match the quality of the evidence being presented?

What This Looks Like in Practice

The distinction between research-based and evidence-based is not abstract. It plays out in real instructional decisions every day.

Take phonics instruction. The evidence base here is extensive and robust. Decades of research, converging across randomised controlled trials, systematic reviews, longitudinal studies, and major national inquiries in the United States, United Kingdom, and Australia, arrive at the same broad conclusion: systematic, explicit phonics instruction is the most effective approach to teaching decoding for the vast majority of children, and particularly for those most at risk of reading difficulty. The National Reading Panel (2000), the Rose Review (2006), and the Australian inquiry led by Rowe (2005) all pointed toward the same conclusion. This is evidence-based in the fullest sense. It is not simply a matter of philosophical orientation or professional preference. The evidence is strong, replicable, and consistent across contexts and research designs.

But this is where the distinction between research-based and evidence-based becomes practically important, and where teachers need to be particularly alert. The conclusion that systematic, explicit phonics instruction works is not the same thing as saying that every programme marketed as a phonics programme is equally effective. Some phonics programmes have been independently evaluated through rigorous research and produce reliable outcomes for children. Others are more accurately described as research-based, meaning they have been developed in close alignment with what we know from cognitive and educational research about how reading develops, even if the programmes themselves have not yet been extensively evaluated through large-scale independent studies. That does not make them ineffective or unworthy of consideration. But it does mean we should be precise about the kinds of claims being made and the level of evidence currently available to support them.

This is where teachers and school leaders need to feel confident querying the research base underpinning particular programmes. What research informed the programme’s development? Has the programme itself been independently evaluated? Were the studies conducted by the programme developers or by independent researchers? Is the evidence drawn from robust instructional studies or from broader research about reading development?

Teachers and school leaders should understand that distinction when they encounter these terms in marketing materials, publisher descriptions, and CPD presentations. Choosing a phonics programme requires critical professional judgement. Asking whether a programme has been independently evaluated, what its scope and sequence looks like relative to the research, and whether its claims are supported by evidence beyond the developer’s own materials is the kind of informed professional decision-making good practice demands.

Even Within Evidence-Based, There Is a Hierarchy

There is one more layer worth understanding. The idea of a hierarchy within the evidence base comes largely from evidence-based medicine, where researchers recognised that some forms of evidence provide much stronger grounds for decision-making than others. Over time, this thinking has increasingly shaped conversations in education too.

At the top sit systematic reviews and meta-analyses. These synthesise findings across many individual research projects, weigh the quality of each, and draw conclusions from the body of evidence as a whole. They carry the most weight precisely because they are not relying on a single study.

Randomised controlled trials sit close to the top. They are the closest educational research can come to establishing that an approach caused an improvement, rather than simply being associated with one.

Below these sit well-designed longitudinal studies, cohort studies, and quasi-experimental designs. Valuable, but carrying somewhat less certainty.

Further down again sit smaller-scale studies, case studies, and expert opinion. Useful for generating hypotheses and informing thinking, but not sufficient on their own to justify adopting a whole-school programme or making significant instructional changes.

And at the very bottom, though they are often presented with great confidence, sit testimonials, anecdotal reports, and the experiences of individual schools or teachers. These are not without value. Professional experience matters. But ‘it worked in our school’ is not the same thing as evidence that something works, and conflating the two is where the field has repeatedly run into difficulty.

Why does this hierarchy matter for teachers? Because when someone presents evidence, the question is not simply whether evidence exists. It is what type of evidence, how much of it, how rigorously it was gathered, and whether it has been independently replicated. As Robert Slavin argued for many years, educational decisions should ideally be guided by the strongest available evidence rather than popularity, intuition, or tradition alone.

The ‘Snake Oil’ Problem in Literacy

This is where the work of Dr Holly Lane becomes particularly relevant. Lane is a reading researcher at the University of Florida Literacy Institute and co-author of UFLI Foundations, a structured phonics programme with a strong evidence base that has gained significant traction internationally, including in Ireland. She has spoken candidly about what she describes as the ‘snake oil’ problem in literacy education.

Snake oil relies on persuasive marketing, compelling testimonials, and grand promises rather than robust proof. Literacy education is not immune to this. Programmes can appear highly scientific through the use of neuroscience terminology, selective citations, and references to ‘brain-based learning’, even where the evidential foundations are weak, incomplete, or contested. In literacy education, references to neuroscience can sometimes create an aura of scientific certainty that exceeds what the instructional evidence can actually support.

The packaging can be extraordinarily convincing, and the language of research, deployed confidently and liberally, is part of that packaging. Knowing what these terms actually mean, and understanding the hierarchy of evidence behind them, is one of the most practical tools teachers have.

Asking Better Questions

The goal is not for teachers to become researchers or statisticians, but to become informed consumers of research claims, people who know enough to push back, probe, and ask the questions that matter.

Before adopting a programme, resource, intervention, or instructional approach, teachers might ask:

Is this programme research-based, meaning it was developed with reference to research, or is it genuinely evidence-based, meaning it has been independently tested and shown to improve outcomes?
What specific evidence supports the claims being made?
How strong is that evidence?
Has the programme been independently evaluated, rather than simply assessed by the people who developed it?
Were the studies conducted with children similar to those in our own context?
What was the programme compared against when it was shown to be effective? No intervention? Another programme? Existing classroom practice?
Are the claims being made proportionate to the quality and quantity of evidence available?
Does the confidence of the marketing match the strength of the research?
Are we selecting this because it is genuinely supported by evidence, or because the language surrounding it sounds persuasive and authoritative?

Final Thoughts

These are not always comfortable questions, especially when a programme is already embedded in a school, when CPD providers recommend it, or when publishers are promoting it heavily.

But they are the right questions because literacy instruction carries extraordinarily high stakes. Reading is not simply another school subject. It is the gateway through which children access almost everything else in education.

Teachers who understand what these terms actually mean, and who feel comfortable asking what kind of evidence is really on the table, are better placed to make decisions that genuinely serve the children in front of them.

Discussion about this post

Daniel Paulson

I agree with the points made in this essay, but I think it needs to be extended. O’Sullivan thinks that teachers need to be better consumers. I believe it makes a case for highly trained teachers who are given autonomy and resources to meet high expectations. This means that teachers must be better trained, and possibly a fundamentally different model of teacher development needs to evolve.

Teachers need to have better problem-solving ability as described by Commons in the Model of Hierarchical Complexity. Frienacht (2017), who studied under Commons, describes it well and lists what people and teachers can do at each stage. The lower stages follow Piaget, but Commons extended Piaget's work into adult problem-solving, creating 19 stages in all. What is most important for teacher development is at the 10th, 11th, and 12th stages. We need teachers who can solve problems at stages 11 (the formal stage) and 12 (the systematic stage). Frienacht states that teachers at stage 10 can teach reading and math, given a guidebook. At stage 11, teachers can differentiate instruction. At stage 12, teachers can integrate an array of methods, research, and materials to create an effective program where all students thrive.

O'Sullivan asks teachers to be "informed consumers of research claims." But that's actually a fairly modest cognitive demand. It's largely Stage 10 work in MHC terms: applying systematic thinking to evaluate claims against criteria. What I am pointing toward with Stages 11 and 12 is something qualitatively different. A Stage 11 teacher isn't just evaluating whether a program meets evidence standards. They're holding the tension, as Jung describes it: keeping two conflicting ideas in mind until one emerges as the clear choice. Here the tension is between the evidence hierarchy and the irreducible complexity of a particular child in a particular classroom in a particular community. They can work with competing frameworks simultaneously without collapsing into either rigid rule-following or "anything goes" relativism.

We need master teachers who, working alongside a facilitator, can mentor other teachers in learning how to read and keep abreast of research in learning, cognitive science, curriculum theory, child development, and an array of pedagogical methods and models. There are master teachers in schools now who, with mentorship and facilitation, could become instructional leaders guiding other teachers. The teachers would meet to discuss classroom problems and, with the help of others, develop more sophisticated solutions. After implementing the strategy, the teacher would reflect on and report the outcome. This cycle of collaborative problem identification, solution design, implementation, reflection, and reporting back is essentially action research embedded in professional practice. Importantly, it generates local evidence that can be read against the research literature, which is exactly the kind of integration O'Sullivan calls for without fully articulating how to achieve it.

Schools generally have top-down management, making it difficult to establish the structural conditions for collaborative problem-solving. The master teacher/facilitator model requires identifying and protecting those people, giving them time, status, and hopefully a reduced classroom load. That runs directly against how most school systems currently deploy their best teachers: either keeping them classroom-bound because they're too valuable there, or pulling them into administration where they lose touch with practice.

O'Sullivan’s framework is essentially about better gatekeeping at the point of program selection — asking sharper questions before adoption. I see a more fundamental problem: even a genuinely evidence-based program, validated through rigorous RCTs, was tested on a population distribution and produces an average effect. The child in front of the teacher is never that average.

Teacher training programs are weak at developing higher-level cognitive functioning, with the excuse that it is too complex for undergraduate students. I have brought this up to the education program leadership, who just ignore it. That response from the university is itself revealing — and somewhat self-defeating. The excuse that higher-order thinking is "too complex for undergraduates" is essentially an argument that the people being trained to develop complex thinking in children cannot themselves be expected to develop it. There's a profound inconsistency there that deserves to be highlighted.

It also reflects a broader problem in how teacher education programs are structured. Most are built around competency checklists and method transmission — here is how you write a lesson plan, here is how you manage a classroom, here is a phonics framework. These are Stage 9 and 10 demands at best: following systematic procedures, applying rules consistently. The program is essentially designed around what's easiest to teach and assess at scale, not around what children really need from teachers.

There's also a kind of institutional self-protection in that response. If teacher education acknowledged that what's currently being produced is insufficient to meet classroom demands — particularly for the most complex learners — it would call the program itself into question. Admitting that undergraduates could and should be developing Stage 11 reasoning would require fundamentally restructuring how programs are designed, assessed, and staffed. That's expensive, difficult, and threatening to existing ways of doing things.

The irony is that the "too complex for undergraduates" claim is empirically questionable. Commons' research suggests that the upper stages aren't age-gated in any simple way —

Frienacht, H. (2017). What is the Model of Hierarchical Complexity? https://metamoderna.org/what-is-the-mhc/

Jen’s Substack
