Teacher question:
I’m confused. I've understood your message that we should stop obsessing about deriving an independent or instructional reading level. But I’ve also read that you feel that there is a place for F&P style running records (many schools still use them!). If a running record is producing a reading level, isn't this going against the idea that we should do away with the instructional level? Aren't there other problems with these tests, too (design problems, large standard errors, etc.)?
Shanahan responds:
You have it right, sort of. I thought my position on this was adroit and artful and you seem to find it to be abstruse and confusing. Hmmm… let’s take it step by step.
First, you are correct that I believe it is a big mistake to try to teach most kids reading at their supposed instructional level. The instructional level is meant to identify a text that will be maximally effective in teaching. You know the routine. The teacher is supposed to test students to identify this level by listening to their oral reading and asking questions. Different schemes have different criteria (90, 93, 95 percent word accuracy, 75 percent comprehension). No matter the criteria, placing kids in texts in this way does not boost their learning. There is, in fact, a growing body of evidence suggesting that it lessens learning.
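For concreteness, here is a minimal sketch of that routine, assuming the 95 percent word accuracy and 75 percent comprehension cutoffs mentioned above (other schemes use 90 or 93 percent); the function name and the sample figures are purely illustrative:

```python
# Illustrative sketch of the traditional placement routine described above.
# The 95% accuracy / 75% comprehension cutoffs are just one of the schemes
# mentioned; others use 90% or 93% word accuracy.

def traditional_placement(words_in_passage, word_errors,
                          questions_asked, questions_correct,
                          accuracy_cutoff=0.95, comprehension_cutoff=0.75):
    """Classify one oral reading the way an instructional-level scheme would."""
    accuracy = (words_in_passage - word_errors) / words_in_passage
    comprehension = questions_correct / questions_asked
    if accuracy >= accuracy_cutoff and comprehension >= comprehension_cutoff:
        return "at or above the 'instructional level'"
    return "below the 'instructional level'"

# Example: 7 errors in a 200-word passage, 6 of 8 questions answered correctly.
print(traditional_placement(200, 7, 8, 6))  # accuracy 96.5%, comprehension 75%
```

The point of the paragraph above, of course, is that however precisely this classification is computed, placing kids in texts on that basis does not improve their learning.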
Second, you are also correct that I claim it can be useful to collect this kind of information, and I admit that sounds at least a bit contradictory. Why measure the instructional level if you aren’t going to use that to choose the books you are going to teach?
My reasoning here has to do with what it means to NOT teach at the instructional level. I’ve come to believe that the instructional level is a device that minimizes both learning and teaching. If kids are placed in texts that they can comprehend reasonably well, there isn’t much about reading that can be learned from those texts. The differences between their performance levels and the text demands are likely so small that the kids should be able to figure out what is unknown without much teacher help.
But what if the books are hard for the kids? That means they may not be able to accomplish high comprehension without some instruction and other teacher support. I think it can be helpful to a teacher to have some idea as to how much difficulty a text may pose for their students. The greater the gap between texts that can already be read reasonably well by the kids and the texts to be used for teaching, the greater the scaffolding and support that will be needed.
Administering oral reading fluency tests and informal reading inventories can provide a rough estimate of that gap. The further behind a student is in their reading performance, the more assistance they will need. This information might encourage a teacher to seat certain kids closer at hand. Or it may be used to help decide which kids should have the opportunity to try reading the text aloud before taking it on for comprehension’s sake. The teacher might want to offer these students more word instruction before or after reading, or may direct more questions to them. Perhaps teachers will go out of their way to select instructional texts that would be an especially good match to the background knowledge of the kids likely to be far behind.
Let’s be careful here. I’m not claiming that those tests will specifically identify missing or underdeveloped skills, except in the most general sense. The tests will give me a gross estimation of how fluently students can read this book and whether there will be a small or large overall gap between an easy text and the instructional text.
Some teachers may forego such testing. This would not be a tragedy. With a little care, they could launch into teaching the grade level texts and monitor closely how well the kids do with them. It doesn’t take long to figure out which kids need the most assistance.
Another possibility is to “test” their students using the texts they intend to teach. That was the original idea of the informal inventory. The teacher was supposed to determine how well kids could read the instructional text (not some other text that would predict performance with those teaching texts).
In any event, this information, whether drawn from a published informal reading inventory or from a DIY version, can only offer a rough gauge of the lag that may exist. As you noted to me in your letter, the standard error of measurement of a typical IRI can be pretty big.
In the best cases, the standard error of such a test may be as small as half a grade level. Because a 95 percent confidence interval spans roughly two standard errors in each direction, a test that places a student at the second-grade level really only tells us that the actual instructional level lies somewhere between Grade 1 and Grade 3. Not a very helpful indicator.
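To make the arithmetic behind that claim explicit, here is a rough sketch assuming a standard error of measurement (SEM) of half a grade level and the conventional two-standard-error band for 95 percent confidence:

```latex
% 95% confidence band, assuming SEM = 0.5 grade levels (illustrative)
\[
\text{observed level} \pm 1.96 \times \mathrm{SEM}
  \;\approx\; 2.0 \pm 1.0
  \;=\; \text{Grade 1 to Grade 3}
\]
```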
Such test info may help you get started. However, keep your eyes open. Believe what you see day to day, rather than the initial test. If it says youngsters will struggle and they don’t or vice versa, adjust your plan.
I hope that clarifies what at first blush may sound inconsistent or contradictory.
However, like you, I’m not a big fan of running records. Research has not been especially kind to them.
I’d recommend against evaluating oral reading errors – the useless efforts to determine their orthographic-phonetic, syntactic, and semantic accuracy. There will be serious reliability problems with such analyses – and there is no validated instructional protocol based on differentiated responses to these error types.
In terms of the reliability of running records, research findings have varied. Studies estimate that, at best, you would need to have students read four passages per level to get a reliable estimate of reading performance – some studies indicate it would take a whopping eight to ten passages per level (D’Agostino et al., 2021; Fawson et al., 2006). Which teachers would have the time to do that? Even if they did, there is no clear value in having done so.
Matt Burns and his colleagues have found similar problems with typical informal reading inventories, too (Burns, et al., 2015). I don’t disagree with that prescient evaluation, but I don’t see it as a death sentence.
There is reason to believe that lengthening these tests can make them much more reliable. For instance, if a teacher wants to check kids out on their reading textbook, it would be wise to have the students read for three minutes (Valencia et al., 2010) and calculate reading levels from that, rather than from the brief passages in most published IRIs.
Even better, I think, would be to have students read three different passages for a minute each. How many words students can read correctly per minute is a useful statistic that, when averaged across passages, does a reasonably good job of predicting performance.
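Here is a small sketch of how such an average might be computed; the passage counts and error counts are invented for illustration, and the comparison to the demands of the grade-level text is left to the teacher’s judgment:

```python
# Sketch of the three-passage words-correct-per-minute (WCPM) check.
# Passage counts and error counts below are invented for illustration.

def wcpm(words_attempted, errors, seconds=60):
    """Words read correctly per minute for one timed reading."""
    return (words_attempted - errors) * 60 / seconds

# Three one-minute readings from three different passages.
readings = [(118, 6), (104, 9), (126, 5)]   # (words attempted, errors)
scores = [wcpm(words, errs) for words, errs in readings]
average_wcpm = sum(scores) / len(scores)

print(f"per-passage WCPM: {[round(s) for s in scores]}")
print(f"average WCPM: {average_wcpm:.0f}")
# It is the average across passages, not any single reading, that gives the
# rough estimate of performance described above.
```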
Another reason IRIs may struggle with reliability has to do with the inclusion of comprehension questions. Comprehension can be affected by how well a text matches students’ knowledge. Two otherwise equivalent passages on different topics may yield very different reading level estimates. Interest in a topic – which correlates with knowledge, of course – can have a similar effect. There simply aren’t enough passages and questions in this kind of testing to provide a sufficient estimate of comprehension ability.
Nevertheless, I trust oral reading tests a lot more when the students know they will be questioned or expected to retell (my preference) what they have read. I don’t necessarily invest a lot of confidence in any scores that may result from their responses, but I do trust the oral reading performance more when students are reading for comprehension rather than trying to show me how fast they can read.
So… indeed, it is useful pedagogical information to have estimates of the distance between texts students can read well and those used for teaching.
The problem with leveled reading instruction isn’t the books (all books have some average level of difficulty – so let’s not throw away the books that have level estimates), nor is it that kids don’t have levels (let’s face it, there are books one finds easy and books that are a slog). No, the problem is that we are taking these levels – both of students and of texts – too seriously and are using them to reduce opportunity to learn. Instead of staying focused on teaching kids how to read grade level books, we are limiting learning through wasteful testing regimes and misguided policing efforts aimed at protecting kids from challenging books.
It seems to me that informal oral reading tests can provide teachers with valuable insight into how hard a text may be for their students. Not to identify the specifics of what may need to be taught, nor to hold kids back at some mythical optimal level, but to help the teacher monitor all students and guide them to success.
References
Burns, M. K., Pulles, S. M., Maki, K., Kanive, R., Hodgson, J., Helman, L., McComas, J., & Preast, J. L. (2015). Accuracy of student performance while reading leveled books rated at their instructional level by a reading inventory. Journal of School Psychology, 53(6), 437-445. https://doi.org/10.1016/j.jsp.2015.09.003
D’Agostino, J. V., Rodgers, E., Winkler, C., Johnson, T., & Berenbon, R. (2021). The generalizability of running record accuracy and self-correction scores. Reading Psychology, 42(2), 111-130. https://doi.org/10.1080/02702711.2021.1880177
Fawson, P. C., Ludlow, B. C., Reutzel, D. R., Sudweeks, R., & Smith, J. A. (2006). Examining the reliability of running records: Attaining generalizable results. Journal of Educational Research, 100(2), 113–126. https://doi.org/10.3200/JOER.100.2.113-126
Shanahan, T. (2025). Leveled reading, leveled lives. Cambridge, MA: Harvard Education Press.
Valencia, S. W., Smith, A. T., Reece, A. M., Li, M., & Wixson, K. K. (2010). Oral reading fluency assessment: Issues of construct, criterion, and consequential validity. Reading Research Quarterly, 45(3), 270-291. https://doi.org/10.1598/RRQ.45.3.1