Considering Running Records, and No, I Don’t Beat My Wife Anymore

11 October, 2025

informal reading inventories

Teacher question:

I’m confused. I've understood your message that we should stop obsessing about deriving an independent or instructional reading level. But I’ve also read that you feel that there is a place for F&P style running records (many schools still use them!). If a running record is producing a reading level, isn't this going against the idea that we should do away with the instructional level? Aren't there other problems with these tests, too (design problems, large standard errors, etc.)?

Shanahan responds:

You have it right, sort of. I thought my position on this was adroit and artful and you seem to find it to be abstruse and confusing. Hmmm… let’s take it step by step.

First, you are correct that I believe it is a big mistake to try to teach most kids reading at their supposed instructional level. The instructional level is meant to identify a text that will be maximally effective in teaching. You know the routine. The teacher is supposed to test students to identify this level by listening to their oral reading and asking questions. Different schemes have different criteria (90, 93, 95 percent word accuracy, 75 percent comprehension). No matter the criteria, placing kids in texts in this way does not boost their learning. There is, in fact, a growing body of evidence suggesting that it lessens learning.
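To see how much the choice of cutoff matters, here is a minimal sketch. It assumes a hypothetical 200-word passage read with 12 errors; the function name and the numbers are illustrative only and are not drawn from any particular published scheme. The same reading clears two of the three accuracy criteria mentioned above and misses the third.

```python
def word_accuracy(total_words, errors):
    """Percent of words read correctly (hypothetical helper)."""
    return 100 * (total_words - errors) / total_words

# Hypothetical performance: 200 words attempted, 12 errors
accuracy = word_accuracy(200, 12)

# The three word-accuracy cutoffs mentioned in the text
cutoffs = (90, 93, 95)
meets = {cutoff: accuracy >= cutoff for cutoff in cutoffs}

print(accuracy)  # 94.0
print(meets)     # {90: True, 93: True, 95: False}
```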

Second, you are also correct that I claim it can be useful to collect this kind of information, and I admit that sounds at least a bit contradictory. Why measure the instructional level if you aren’t going to use that to choose the books you are going to teach?

My reasoning here has to do with what it means to NOT teach at the instructional level. I’ve come to believe that the instructional level is a device that results in both minimized learning and teaching. If kids are placed in texts that they can comprehend reasonably well, there isn’t much about reading that can be learned from those texts. The differences between their performance levels and the text demands are likely so small that the kids should be able to figure out what is unknown without much teacher help.

But what if the books are hard for the kids? That means they may not be able to accomplish high comprehension without some instruction and other teacher support. I think it can be helpful to a teacher to have some idea as to how much difficulty a text may pose for their students. The greater the gap between texts that can already be read reasonably well by the kids and the texts to be used for teaching, the greater the scaffolding and support that will be needed.  

Administering oral reading fluency tests and informal reading inventories can provide a rough estimate of that gap. The further behind a student is in their reading performance, the more assistance that will be provided. This information might encourage a teacher to place certain kids closer at hand. Or it may be used to help decide which kids should have the opportunity to try reading the text aloud before taking it on for comprehension’s sake. The teacher might want to offer more word instruction before or after to these students or may direct more questions to them. Perhaps, teachers will go out of their way to select instructional texts that would be an especially good match to the background knowledge of the kids likely to be far behind.

Let’s be careful here. I’m not claiming that those tests will specifically identify missing or underdeveloped skills, except in the most general sense. The tests will give me a gross estimation of how fluently students can read this book and whether there will be a small or large overall gap between an easy text and the instructional text.

Some teachers may forgo such testing. This would not be a tragedy. With a little care, they could launch into teaching the grade level texts and monitor closely how well the kids do with them. It doesn’t take long to figure out which kids need the most assistance.

Another possibility is to “test” their students using the texts they intend to teach. That was the original idea of the informal inventory. The teacher was supposed to determine how well kids could read the instructional text (not some other text that would predict performance with those teaching texts).

In any event, this information, whether drawn from a published informal reading inventory or from a DIY version, can only offer a rough gauge of the lag that may exist. As you noted to me in your letter, the standard error of measurement of a typical IRI can be pretty big.

In the best cases, such a test may have a standard error of about half a grade level. For instance, if a test says a student’s level is second grade, that means that we can be about 95 percent certain that the actual instructional level is somewhere between Grade 1 and Grade 3. Not a very helpful indicator.
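The arithmetic behind that range can be sketched as follows. This is a rough illustration that assumes the half-grade figure is a standard error of measurement (SEM) of 0.5 grade levels and that measurement error is roughly normal, so a 95 percent band spans about 1.96 SEMs on either side. The function name and those assumptions are illustrative, not taken from any test manual.

```python
def confidence_band(observed_level, sem, z=1.96):
    """Return the (low, high) band around an observed score.

    z=1.96 corresponds to roughly 95 percent coverage under a
    normal-error assumption (an assumption of this sketch).
    """
    margin = z * sem
    return observed_level - margin, observed_level + margin

# Assumed values: observed level of Grade 2, SEM of half a grade
low, high = confidence_band(2.0, 0.5)
print(f"{low:.1f} to {high:.1f}")  # 1.0 to 3.0
```

A larger SEM widens the band proportionally, which is why a brief test gives only a rough gauge.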

Such test info may help you get started. However, keep your eyes open. Believe what you see day to day, rather than the initial test. If it says youngsters will struggle and they don’t or vice versa, adjust your plan.

I hope that clarifies what at first blush may sound inconsistent or contradictory.

However, like you, I’m not a big fan of running records. Research has not been especially kind to them.

I’d recommend against evaluating oral reading errors – the useless efforts to determine their orthographic-phonetic, syntactic, semantic accuracy. There will be serious reliability problems with such analyses – and there is no validated instructional protocol based on differentiated responses to these.

In terms of the reliability of running records, research has varied. Studies estimate at best you would need to have students read four passages per level to get a reliable estimate of reading performance – some studies indicated that it would take a whopping eight to ten passages per level (D’Agostino, et al., 2021; Fawson, et al., 2006). Which teachers would have the time to do that? Even if they did, there is no clear value to having done so.

Matt Burns and his colleagues have found similar problems with typical informal reading inventories, too (Burns, et al., 2015). I don’t disagree with that prescient evaluation, but I don’t see it as a death sentence.

There is reason to believe that lengthening these tests can make them much more reliable. For instance, if a teacher wants to check kids out on their reading textbook, it would be wise to have the students read for 3 minutes (Valencia, et al., 2010), calculating reading levels from that, rather than from the brief passages in most published IRIs.

Even better, I think, would be to have students read 3 different passages for a minute each. How many words students can read correctly per minute is a useful statistic that, when averaged, can do a reasonably good job of predicting performance.
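As a rough illustration of that averaging, here is a minimal sketch. The passage counts are made up, and the helper name is hypothetical; it simply computes words correct per minute for each one-minute reading and then takes the mean of the three.

```python
from statistics import mean

def words_correct_per_minute(words_attempted, errors, seconds=60):
    """Words read correctly, scaled to a one-minute rate."""
    correct = words_attempted - errors
    return correct * 60 / seconds

# Hypothetical results: (words attempted, errors) on three
# different passages, each read for one minute
passages = [(112, 6), (98, 4), (105, 8)]

scores = [words_correct_per_minute(w, e) for w, e in passages]
average = mean(scores)

print(scores)   # [106.0, 94.0, 97.0]
print(average)  # 99.0
```

Averaging across passages dampens the effect of any one passage that happens to be unusually easy or hard for a given student.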

Another reason IRIs may struggle with reliability has to do with the inclusion of comprehension questions. Comprehension can be affected by how well that text reflects students’ knowledge. Two equivalent passages on different topics may provide very different reading level estimates. Interest in a topic – which correlates with knowledge, of course – can have a similar effect. There simply aren’t enough passages and questions in this kind of testing to provide a sufficient estimate of comprehension ability.

Nevertheless, I trust oral reading tests a lot more when the students know they will be questioned or expected to retell (my preference) what they have read. I don’t necessarily invest a lot of confidence in any scores that may result from their responses, but I do trust the oral reading performance more when students are reading for comprehension rather than trying to show me how fast they can read.

So… indeed, it is useful pedagogical information to have estimates of the distance between texts students can read well and those used for teaching.

The problem of leveled reading instruction isn’t the books (all books have some average level of difficulty – so, let’s not throw away the books that have level estimates), nor is it that kids don’t have levels (let’s face it, there are books one finds to be easy and those that are a slog). No, the problem is that we are taking these levels – both of students and texts – too seriously and are using them to reduce opportunity to learn. Instead of staying focused on teaching kids how to read grade level books, we are limiting learning through wasteful testing regimes and misguided policing efforts aimed at protecting kids from challenging books.

It seems to me that informal oral reading tests can provide valuable insights to teachers as to how hard a text may be for students. Not to identify the specifics of what may need to be taught, nor to hold kids back to some mythical optimal level, but to help the teacher to monitor and guide to success all students.

References

Burns, M. K., Pulles, S. M., Maki, K., Kanive, R., Hodgson, J., Helman, L., McComas, J., & Preast, J. L. (2015). Accuracy of student performance while reading leveled books rated at their instructional level by a reading inventory. Journal of School Psychology, 53(6), 437-445. https://doi.org/10.1016/j.jsp.2015.09.003

D’Agostino, J. V., Rodgers, E., Winkler, C., Johnson, T., & Berenbon, R. (2021). The generalizability of running record accuracy and self-correction scores. Reading Psychology, 42(2), 111-130. https://doi.org/10.1080/02702711.2021.1880177

Fawson, P. C., Ludlow, B. C., Reutzel, D. R., Sudweeks, R., & Smith, J. A. (2006). Examining the reliability of Running Records: Attaining generalizable results. Journal of Educational Research, 100(2), 113–126. https://doi.org/10.3200/JOER.100.2.113-126

Shanahan, T. (2025). Leveled reading, leveled lives. Cambridge, MA: Harvard Education Press.

Valencia, S. W., Smith, A. T., Reece, A. M., Li, M., & Wixson, K. K. (2010). Oral reading fluency assessment: issues of construct, criterion, and consequential validity. Reading Research Quarterly, 45(3), 270-291. https://doi.org/10.1598/RRQ.45.3.1

Comments

Dee Nov 08, 2025 12:52 AM

I agree with everything you’ve said here, but I’m afraid that I must join in with the crowd in bringing up the title.
I recently directed some colleagues to this blog, and this was the first thing they asked about (not being familiar with the idea of this phrase in relation to a loaded question).
It makes sense, it is clever, but it’s definitely not a good look.

Gaynor Oct 11, 2025 08:57 AM

As an avowed antagonist of Dame Marie Clay, I find little to nothing of value in any of her ideas. Hence I am not surprised that her running records are yet another of her harebrained concepts to bite the dust. She, as a progressive, was an iconoclast and hell-bent, it seems, on destroying anything that hinted of traditional education with its rationality and effectiveness.
Before her dominance we had in NZ a perfectly good method of assessing reading ages and estimating the difficulty of texts. The latter was based on the average frequency level of reading materials. Each noun was classified on a nine-point scale according to a spelling list taken from children's written material, and the mean rating was calculated to estimate difficulty levels. Before Clay dominated, American basal reading materials were used in NZ, particularly the Ginn (Tom and Betty), McKee, and Scott Foresman (Dick and Jane) basal readers. The noun frequency readability rating was also equivalent to age levels of these American basal readers. Hence a link between spelling level, readability level, and the equivalent age level. All these were further linked to reading comprehension scores produced in national class tests called the Progressive Achievement Tests. Like all Clay's inventions, all this synchronization has been messed up. For me she has wreaked havoc in one area after another of children's reading learning, not the least her infernal leveled readers and running records. I have never used either.

Adam Becotte Oct 17, 2025 01:22 PM

Thank you for your thoughtful critique of leveled reading instruction. It deeply resonates with me, as I have seen students placed at a certain “level” and remain there indefinitely. This was not because they are incapable of more, but rather it is where they were told they belong. This culture of fixed leveling has contributed to a broader issue across subjects: students who remain unchallenged, academically complacent, and increasingly confident that “just getting by” is good enough.

To shift this mindset, we need actionable steps that raise expectations while supporting teachers in providing appropriate challenges. That does not mean overwhelming students, but offering meaningful, grade-level work that connects to their background knowledge and pushes them to grow. When students experience success with challenging texts, they gain confidence, purpose, and a clearer sense of what they are actually capable of achieving. Rather than remaining inside a box and being told what they can achieve.

Ann Lewis Oct 17, 2025 05:34 PM

I have some concerns about the phrase “and No, I Don’t Beat My Wife Anymore.” While I understand it may be intended as irony or rhetorical critique, the language references domestic violence, which could be distressing or alienating for readers. It might overshadow your main message about Running Records or assessment practices.

AF Oct 17, 2025 05:45 PM

Noting student reading errors helps me understand what students know. I can often see patterns in errors such as suffix confusion or omission, replacement of function words with other function words, grapheme/ word sequencing problems, tracking issues, etc. I think it would be helpful to define what you mean by "running records", as people have different ideas of what these can be. If you're talking about calculating the number of MSV errors and making a judgement about that, I agree that this is a waste of time. However, if someone is considering recording reading errors and analyzing them to inform instruction, I think this is a valuable process - I've found it to be. Thoughts?

AF Oct 17, 2025 05:46 PM

Thank you Ann. I was planning on adding my thoughts on this as well. I strongly feel that this title must be changed.

Timothy Shanahan Oct 17, 2025 05:49 PM

AF-
What the research says about that approach is that it can be pretty misleading unless you have enough errors to analyze to make the analysis reliable. About 50 errors seems to be necessary, which isn't very practical for classroom use. That is why miscue analysis requires challenging passages of about 500 words in length -- that way they can guarantee enough mistakes to allow for consideration of the source of the errors.

tim

Dr. Bill Conrad Oct 17, 2025 06:13 PM

The epigram “Genius is 10% inspiration and 90% perspiration” might be modified within this reading context to “Comprehension is 10% inspiration and 90% perspiration!”

Molly Oct 17, 2025 06:26 PM

I don’t understand why you thought today’s title was okay. Maybe the hackers got in there again?

TR Oct 17, 2025 06:44 PM

I almost didn't read this because of the phrase “and No, I Don’t Beat My Wife Anymore.” As a survivor of domestic violence and its enduring trauma, I am appalled that this would ever be considered to be an appropriate phrase to use in the world of education. Teachers are people too, with lives and experiences outside of their training and jobs. While reading is a right and oh so very important, personal safety trumps all.

Timothy Shanahan Oct 17, 2025 06:50 PM

TR--
I feel bad that you (or anyone) were ever a victim of domestic violence -- my mother was as well, so I'm definitely not a fan. However, the term I used has been used in the fields of philosophy and logic for over a century to refer to loaded or fallacious questions.

tim

Terri Ruyter Oct 17, 2025 07:00 PM

You can also note patterns of difficulty when having a student read and discuss a passage with you. Maybe it is less of a running record and more of a reading conference -- using an informal running record to get started. I recently assessed a student who teachers all felt could read, yet his reading scores were significantly below grade level. When he read aloud to me, he had great phrasing and fluency. He could not tell me a thing about the text. The trick was, he had no idea who the characters were. It was a first person narrator and the pronouns and the antecedents were really tripping him up. We used that bit of information, to teach him about pronouns and how to track who is speaking and who the characters are. So the miscue analysis can be at the word level, but you can also use the data to understand syntactic/semantic miscues that will be hidden unless you dig for them.

Dr. Bill Conrad Oct 17, 2025 07:12 PM

It looks like the running record may be running out of gas!

AC Oct 17, 2025 07:19 PM

Dr. Shanahan, I respect you, your knowledge, and your contribution to my growth as an educator. I call you my virtual mentor. I must let you know that this title is triggering. I hope you might be able to reflect on how a title like this connects to the lived experiences of some of your readers and how your words might be causing harm, as it is egregiously traumatizing to many. It can awaken trauma that many have fought hard to overcome and some are presently battling with. It would be prudent to be more sensitive.
Thank you

Timothy Shanahan Oct 17, 2025 07:35 PM

AC-
That notion of "triggering" is a troubling one. Everyone has horrible memories of something -- a death in the family, an accident, etc. If you mention "school", someone thinks of a school shooting. If you say the word, "crash," another person is reduced to tears because of the memory of a tragic auto accident. I once wrote a piece in which I referred to alcoholism, and received numerous complaints from individuals who knew alcoholics.
It is essential that human beings living in a civilization learn to distinguish words from things -- and recognize that we each need to control our own tendencies to react rather than trying to control other people's use of the language.

tim

Dana Siegel Oct 17, 2025 08:09 PM

Hi Tim,
I am surprised that you would defend your choice of a title for this article. Couldn't you simply title this article "Considering Running Records"? No need to add the negativity of life outside the classroom. I believe you would better honor your Mom by sharing her strengths and not publicizing her challenging circumstances- not that she was at fault in any way. As my parents would say, "Just because someone else does something, you don't have to do so as well. Use your brain and heart before following someone."

Lauren Oct 17, 2025 09:21 PM

I think that some type of leveling system is important for emergent readers, early readers, and even the beginning side of transitional readers. We have Lexile numbers, AR numbers, F&P letters, and the NZ system which was mentioned by a previous contributor. It would be nice to have a standard way of measuring levels, although I don't think it will ever be a perfect science.

Timothy Shanahan Oct 17, 2025 09:33 PM

Lauren-
Indeed, beginning reading is a somewhat different animal. We need to be careful at those levels to avoid too much difficulty. That's why decodability and controlled vocabulary are so important with beginning texts. Any leveling system will be imperfect, of course, but making sure that the beginning books provide sufficient support for the development of the foundations of decoding is essential.

tim

Dr. Bill Corad Oct 17, 2025 09:36 PM

As an educational professional, I must recommend that you change the title of this blog. It would be a shame that a blog filled with great ideas and wisdom be tainted by a title that treats domestic violence with a wink and a nod.

Please change the title.

Sandy Backlund Oct 18, 2025 12:46 AM

And not one mention of Marie Clay to clarify this teacher's obvious confusion about the reasons for a running record? It can, yes, determine instructional level based on miscues, but more importantly, a trained practitioner can analyze a number of running records to create an instructional plan that uses evidence of what the struggling reader does ignore, misuse or overly rely upon that may be hindering his ability to become a fluent reader. It is the most elegant diagnostic teaching tool I have used to successfully give scores of children the tools they need to read whatever they choose. A detailed analysis of the child's running records hands the teacher a clear picture of what to teach and when to teach it and to then get out of the way. It's more than a placement tool. There. I feel better now.

Mac Oct 18, 2025 12:48 AM

What is the intention behind the title? I can’t imagine it’s meant to be clickbait, but it serves no worthy purpose.

Anonymous Oct 18, 2025 02:59 AM

It comes from "The Power of Logic" by C. Stephen Layman. "Do you still beat your wife?" is an example of a loaded question that you can not answer in a positive way for yourself no matter if you answer yes, or no. You indict yourself either way. It really is a very classic example of this concept of the loaded question, that is used quite frequently in academia. So, if you're familiar with the history of this phrase, you instantly know that the person is talking about a question or situation that is unwinnable. If you are not familiar, I could see where it would seem puzzling and maybe offensive. I give Dr. Shanahan a pass on this. I might pass on using the phrase myself... as it is not widely understood, and might hurt feelings...

xiaofang Oct 18, 2025 07:20 AM

Shanahan suggests that a useful alternative is to "test" students using the actual texts intended for instruction. In a classroom with diverse learners, what would be a practical and time-efficient protocol for a teacher to do this for an entire class, and how could they systematically use the data to form flexible support groups?

Dr. Bill Conrad Oct 18, 2025 12:34 PM

Hello Tim,

You are correct that the statement, “No, I don’t beat my wife anymore” is frequently used as an example of a response to a loaded question. Damned if you do! Damned if you don’t.

However, it is rarely used in popular communication any more due to its insensitivity to the seriousness of domestic violence.

May I suggest one of the following alternatives?

“Have you stopped cheating on tests?”
“Have you stopped lying to your friends?”
“Do you still steal office supplies?”

Wil D Oct 18, 2025 01:00 PM

Really inappropriate title! Please change it. It distracts from your usual thoughtful posts!

Timothy Shanahan Oct 18, 2025 02:32 PM

Sandy--
You state that running records can determine a child's instructional level. The research shows this not to be the case. These questionable instructional plans that you describe could be the reason why in the long run Reading Recovery turns out to do more harm than good to kids. (I'm not sure how mentioning Marie Clay would contribute to explaining or excusing either of these basic problems).

tim

Leslie Oct 18, 2025 03:18 PM

I am curious. As a 5th grade teacher of reading, to 79 students, with approximately 45 minutes to teach each class, how should I go about teaching reading? The expectation is still Lucy Calkins Units of Study Reader's Workshop, with students being pulled out of the classroom during the "independent work time." How should I manage the classes and expose them to 5th grade reading materials? Thoughts?

Timothy Shanahan Oct 18, 2025 08:14 PM

Xiaofang--

I wouldn't use such testing to form groups at all. Finding out how well students would be able to read passages from the textbook that you are using would allow you to know which kids are likely to need a lot of help with taking on that text.

tim

Timothy Shanahan Oct 18, 2025 08:18 PM

Leslie--
Given only 45 minutes a day for reading and writing instruction, I would spend approximately 55 minutes each week teaching words (decoding of multi-syllable words, spelling, vocabulary, morphology), 55 minutes working on fluency development (accuracy, automaticity, prosody), 55 minutes on reading comprehension (language, knowledge, strategies), and 55 minutes on writing (transcription, composition).
tim

Timothy Shanahan Oct 21, 2025 01:47 PM

This comment from Mat Brigham was posted for him by Timothy Shanahan:

Dear Tim,

Thank you for clarifying the difference between using reading levels to gate access to texts and using rough information about text difficulty to plan scaffolding. The goal of helping teachers anticipate support for grade-level reading feels like a worthy one.

Reading your post alongside the perspective of Matt Burns has helped me think more carefully about the balance between pedagogical intent and measurement quality. Burns argues that informal reading inventories and running records are too unstable to guide instruction. He points out that their passages do not reliably increase in difficulty, that miscues are coded inconsistently, and that the results are better replaced by more objective data such as oral reading fluency and direct comprehension measures, with decoding inventories when accuracy falls below about ninety-three per cent.

A related perspective comes from Devin Kearns, a researcher in reading development at the University of Connecticut. He recognises the same technical problems (Why Running Records Are Not Suitable for Progress Monitoring – https://m.youtube.com/watch?v=-fymJDViFVM): the lack of interval scaling, inconsistent passage difficulty, teacher selection of “easier” passages, and variability in how the Independent, Instructional and Frustration categories are applied. Even so, Kearns sees a narrow role for these tools. Because they are so common in schools, they can still offer limited diagnostic clues and act as confirmatory data when they align with stronger progress-monitoring results.

It is also striking how running records remain common in the United States yet play little or no role in England, where teachers rely on standardised reading and phonics assessments instead. In Australia they are still used in many schools, but debate about their subjectivity and weak theoretical base is growing. Even in New Zealand, where Marie Clay first developed the approach, some educators now question its continuing dominance. This suggests that running records may owe their endurance more to cultural habit than to clear evidence of necessity.

This means that teachers and schools should tread very cautiously when using running records. They should never be relied on as the sole basis for teaching decisions, and it is important to remain aware of their complications and the ease with which they can be misused, especially through the levelling systems that accompany them. The concept of an “instructional level” still lingers in many schools, and that makes it even easier for these assessments to reinforce outdated practices rather than inform effective instruction. For that reason, I think it is right that we approach running records with great caution, and continue to examine whether they genuinely help or merely constrain effective teaching.

Kind regards,

Mat Brigham

Kristen Dart Oct 21, 2025 06:50 PM

Do you agree that some children will struggle with grade level text to the extent that to scaffold to their level would require too much from the teacher? By that I mean taking time, planning, and resources well beyond what is reasonable to ensure they understand the text. We have students with scaffolds each week that include vocabulary, building background knowledge with Guided Language Acquisition Design, questions using Questioning the Author, repeated reading with Fluency Oriented Reading Instruction, and text cohesion prompting and support. Even so these students who are progress monitored 2-4 grade levels behind struggle to understand the text, in fact some of them cannot even track the text. What do we do when the grade level text is so far beyond their reach?

Timothy Shanahan Oct 22, 2025 04:06 PM

Kristen--

Indeed, an instructional situation may be so complicated that a teacher may not be able to scaffold sufficiently or well at least for some students. This can be the result of a student being so far behind that it would require too much individual time. It can also be the product of too little instructional time, a teacher's struggles with classroom management and control, etc. While there is no distance between child and book that, theoretically, cannot be profitably scaffolded, there are practical distances that would be beyond reasonable expectations.

I do wonder from the list you provided how targeted your scaffolding is? It sounds like you do all these things, but I can't tell how targeted they are on real problems the kids are having with the text. These sound very much like a standard package of guided reading scaffolds that everyone gets with every text no matter what. That isn't likely to ensure that most kids succeed. (Also, it isn't clear whether these kids all have basic decoding skills -- can they decode as well as an average end of year first-grader or beginning of year second-grader. If not, using especially challenging texts would be contra-indicated).

tim


Timothy Shanahan is one of the world’s premier literacy educators. He studies the teaching of reading and writing across all ages and abilities. He was inducted to the Reading Hall of Fame in 2007, and is a former first-grade teacher.
