Teacher question:
I’m confused. I've understood your message that we should stop obsessing about deriving an independent or instructional reading level. But I’ve also read that you feel that there is a place for F&P style running records (many schools still use them!). If a running record is producing a reading level, isn't this going against the idea that we should do away with the instructional level? Aren't there other problems with these tests, too (design problems, large standard errors, etc.)?
Shanahan responds:
You have it right, sort of. I thought my position on this was adroit and artful and you seem to find it to be abstruse and confusing. Hmmm… let’s take it step by step.
First, you are correct that I believe it is a big mistake to try to teach most kids reading at their supposed instructional level. The instructional level is meant to identify a text that will be maximally effective in teaching. You know the routine. The teacher is supposed to test students to identify this level by listening to their oral reading and asking questions. Different schemes have different criteria (90, 93, 95 percent word accuracy, 75 percent comprehension). No matter the criteria, placing kids in texts in this way does not boost their learning. There is, in fact, a growing body of evidence suggesting that it lessens learning.
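For concreteness, here is a minimal sketch of that routine, assuming the 95 percent word accuracy and 75 percent comprehension cutoffs mentioned above (other schemes use 90 or 93 percent); the function name and the sample figures are purely illustrative:

```python
# Illustrative sketch of the traditional placement routine described above.
# The 95% accuracy / 75% comprehension cutoffs are just one of the schemes
# mentioned; others use 90% or 93% word accuracy.

def traditional_placement(words_in_passage, word_errors,
                          questions_asked, questions_correct,
                          accuracy_cutoff=0.95, comprehension_cutoff=0.75):
    """Classify one oral reading the way an instructional-level scheme would."""
    accuracy = (words_in_passage - word_errors) / words_in_passage
    comprehension = questions_correct / questions_asked
    if accuracy >= accuracy_cutoff and comprehension >= comprehension_cutoff:
        return "at or above the 'instructional level'"
    return "below the 'instructional level'"

# Example: 7 errors in a 200-word passage, 6 of 8 questions answered correctly.
print(traditional_placement(200, 7, 8, 6))  # accuracy 96.5%, comprehension 75%
```

The point of the paragraph above, of course, is that however precisely this classification is computed, placing kids in texts on that basis does not improve their learning.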
Second, you are also correct that I claim it can be useful to collect this kind of information, and I admit that sounds at least a bit contradictory. Why measure the instructional level if you aren’t going to use that to choose the books you are going to teach?
My reasoning here has to do with what it means to NOT teach at the instructional level. I’ve come to believe that the instructional level is a device that minimizes both learning and teaching. If kids are placed in texts that they can comprehend reasonably well, there isn’t much about reading that can be learned from those texts. The differences between their performance levels and the text demands are likely so small that the kids should be able to figure out what is unknown without much teacher help.
But what if the books are hard for the kids? That means they may not be able to accomplish high comprehension without some instruction and other teacher support. I think it can be helpful to a teacher to have some idea as to how much difficulty a text may pose for their students. The greater the gap between texts that can already be read reasonably well by the kids and the texts to be used for teaching, the greater the scaffolding and support that will be needed.
Administering oral reading fluency tests and informal reading inventories can provide a rough estimate of that gap. The further behind a student is in their reading performance, the more assistance they will need. This information might encourage a teacher to seat certain kids closer at hand. Or it may be used to help decide which kids should have the opportunity to try reading the text aloud before taking it on for comprehension’s sake. The teacher might want to offer these students more word instruction before or after reading, or may direct more questions to them. Perhaps teachers will go out of their way to select instructional texts that would be an especially good match to the background knowledge of the kids likely to be far behind.
Let’s be careful here. I’m not claiming that those tests will specifically identify missing or underdeveloped skills, except in the most general sense. The tests will give me a gross estimation of how fluently students can read this book and whether there will be a small or large overall gap between an easy text and the instructional text.
Some teachers may forego such testing. This would not be a tragedy. With a little care, they could launch into teaching the grade level texts and monitor closely how well the kids do with them. It doesn’t take long to figure out which kids need the most assistance.
Another possibility is to “test” their students using the texts they intend to teach. That was the original idea of the informal inventory. The teacher was supposed to determine how well kids could read the instructional text (not some other text that would predict performance with those teaching texts).
In any event, this information, whether drawn from a published informal reading inventory or from a DIY version, can only offer a rough gauge of the lag that may exist. As you noted to me in your letter, the standard error of measurement of a typical IRI can be pretty big.
In the best cases, the standard error of such a test may be as small as half a grade level. Because a 95 percent confidence interval spans roughly two standard errors in each direction, a test that places a student at the second-grade level really only tells us that the actual instructional level lies somewhere between Grade 1 and Grade 3. Not a very helpful indicator.
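To make the arithmetic behind that claim explicit, here is a rough sketch assuming a standard error of measurement (SEM) of half a grade level and the conventional two-standard-error band for 95 percent confidence:

```latex
% 95% confidence band, assuming SEM = 0.5 grade levels (illustrative)
\[
\text{observed level} \pm 1.96 \times \mathrm{SEM}
  \;\approx\; 2.0 \pm 1.0
  \;=\; \text{Grade 1 to Grade 3}
\]
```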
Such test info may help you get started. However, keep your eyes open. Believe what you see day to day, rather than the initial test. If it says youngsters will struggle and they don’t or vice versa, adjust your plan.
I hope that clarifies what at first blush may sound inconsistent or contradictory.
However, like you, I’m not a big fan of running records. Research has not been especially kind to them.
I’d recommend against evaluating oral reading errors – the useless efforts to determine their orthographic-phonetic, syntactic, and semantic accuracy. There will be serious reliability problems with such analyses – and there is no validated instructional protocol based on differentiated responses to these error types.
In terms of the reliability of running records, research findings have varied. Studies estimate that, at best, you would need to have students read four passages per level to get a reliable estimate of reading performance – some studies indicate it would take a whopping eight to ten passages per level (D’Agostino et al., 2021; Fawson et al., 2006). Which teachers would have the time to do that? Even if they did, there is no clear value in having done so.
Matt Burns and his colleagues have found similar problems with typical informal reading inventories, too (Burns, et al., 2015). I don’t disagree with that prescient evaluation, but I don’t see it as a death sentence.
There is reason to believe that lengthening these tests can make them much more reliable. For instance, if a teacher wants to check kids out on their reading textbook, it would be wise to have the students read for three minutes (Valencia et al., 2010) and calculate reading levels from that, rather than from the brief passages in most published IRIs.
Even better, I think, would be to have students read three different passages for a minute each. How many words students can read correctly per minute is a useful statistic that, when averaged across passages, does a reasonably good job of predicting performance.
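Here is a small sketch of how such an average might be computed; the passage counts and error counts are invented for illustration, and the comparison to the demands of the grade-level text is left to the teacher’s judgment:

```python
# Sketch of the three-passage words-correct-per-minute (WCPM) check.
# Passage counts and error counts below are invented for illustration.

def wcpm(words_attempted, errors, seconds=60):
    """Words read correctly per minute for one timed reading."""
    return (words_attempted - errors) * 60 / seconds

# Three one-minute readings from three different passages.
readings = [(118, 6), (104, 9), (126, 5)]   # (words attempted, errors)
scores = [wcpm(words, errs) for words, errs in readings]
average_wcpm = sum(scores) / len(scores)

print(f"per-passage WCPM: {[round(s) for s in scores]}")
print(f"average WCPM: {average_wcpm:.0f}")
# It is the average across passages, not any single reading, that gives the
# rough estimate of performance described above.
```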
Another reason IRIs may struggle with reliability has to do with the inclusion of comprehension questions. Comprehension can be affected by how well a text matches students’ knowledge. Two otherwise equivalent passages on different topics may yield very different reading level estimates. Interest in a topic – which correlates with knowledge, of course – can have a similar effect. There simply aren’t enough passages and questions in this kind of testing to provide a sufficient estimate of comprehension ability.
Nevertheless, I trust oral reading tests a lot more when the students know they will be questioned or expected to retell (my preference) what they have read. I don’t necessarily invest a lot of confidence in any scores that may result from their responses, but I do trust the oral reading performance more when students are reading for comprehension rather than trying to show me how fast they can read.
So… indeed, it is useful pedagogical information to have estimates of the distance between texts students can read well and those used for teaching.
The problem with leveled reading instruction isn’t the books (all books have some average level of difficulty – so let’s not throw away the books that have level estimates), nor is it that kids don’t have levels (let’s face it, there are books one finds easy and books that are a slog). No, the problem is that we are taking these levels – both of students and of texts – too seriously and are using them to reduce opportunity to learn. Instead of staying focused on teaching kids how to read grade level books, we are limiting learning through wasteful testing regimes and misguided policing efforts aimed at protecting kids from challenging books.
It seems to me that informal oral reading tests can provide teachers with valuable insight into how hard a text may be for their students. Not to identify the specifics of what may need to be taught, nor to hold kids back at some mythical optimal level, but to help the teacher monitor all students and guide them to success.
References
Burns, M. K., Pulles, S. M., Maki, K., Kanive, R., Hodgson, J., Helman, L., McComas, J., & Preast, J. L. (2015). Accuracy of student performance while reading leveled books rated at their instructional level by a reading inventory. Journal of School Psychology, 53(6), 437-445. https://doi.org/10.1016/j.jsp.2015.09.003
D’Agostino, J. V., Rodgers, E., Winkler, C., Johnson, T., & Berenbon, R. (2021). The generalizability of running record accuracy and self-correction scores. Reading Psychology, 42(2), 111-130. https://doi.org/10.1080/02702711.2021.1880177
Fawson, P. C., Ludlow, B. C., Reutzel, D. R., Sudweeks, R., & Smith, J. A. (2006). Examining the reliability of running records: Attaining generalizable results. Journal of Educational Research, 100(2), 113–126. https://doi.org/10.3200/JOER.100.2.113-126
Shanahan, T. (2025). Leveled reading, leveled lives. Cambridge, MA: Harvard Education Press.
Valencia, S. W., Smith, A. T., Reece, A. M., Li, M., & Wixson, K. K. (2010). Oral reading fluency assessment: Issues of construct, criterion, and consequential validity. Reading Research Quarterly, 45(3), 270-291. https://doi.org/10.1598/RRQ.45.3.1