Showing posts with label Readability. Show all posts

Friday, December 27, 2013

How Publishers Can Screw Up the Common Core

Lexiles and other readability measures are criticized these days about as much as Congress. But, unlike Congress, they don’t deserve it.

Everyone knows The Grapes of Wrath is harder to read than its readability score suggests. But for every book with a hinky readability score, many others are placed just right.

These formulas certainly are not perfect, but they are easy to use and they make more accurate guesses than we can without them.

So what’s the problem?

Readability measures do a great job of predicting reading comprehension, but they provide lousy writing guidance.

Let’s say that you have a text that comes out harder than you’d hoped. You wanted it for fourth grade, but the Lexiles say it’s better for grade 5.

Easy to fix, right? Just divide a few sentences in two to reduce average sentence length, and swap out a few of the harder words for easier synonyms, and voila, the Lexiles will be just what you’d hoped for.

But research shows this kind of mechanical “adjusting” doesn’t actually change the difficulty of the text (though it does mess up the accuracy of the readability rating). This kind of “fix” won’t make the text easier for your fourth-graders, but the grade level you assign the book will look just right. Would you rather feel good or look good?
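To see why a formula can be gamed this way, here is a minimal Python sketch. It uses the published Flesch-Kincaid grade-level formula (one of the measures in the CCSS chart) as a stand-in, since the Lexile formula itself is proprietary. The vowel-group syllable counter is a rough heuristic, and the sample sentences are invented for illustration. Chopping one long sentence into three short ones drops the estimate by several grade levels, even though the ideas a reader has to follow are unchanged (arguably the chopped version is harder, since the word "because" that signaled the causal link is gone).

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of vowels, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Hypothetical passage written as one long sentence...
original = ("The farmers loaded everything they owned onto the truck because "
            "the bank had taken their land and they hoped to find work in California.")
# ...and the same content mechanically chopped into short sentences.
chopped = ("The farmers loaded everything they owned onto the truck. "
           "The bank had taken their land. They hoped to find work in California.")

print(f"original: grade {flesch_kincaid_grade(original):.1f}")
print(f"chopped:  grade {flesch_kincaid_grade(chopped):.1f}")
```

Only the average-sentence-length term moved much; nothing in the code knows whether the text got any clearer, which is exactly the problem with writing to the formula.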

With all of the new emphasis on readability levels in Common Core, I fear that test and textbook publishers are going to make sure that their measurements are terrific, even if their texts are not.

When a text turns out to be harder or easier than intended, it should either be assigned to another grade level or be genuinely revised. Real revisions do more than make mechanical adjustments; they engage the author in trying to improve the text’s clarity.

Such fixes aren’t likely to happen much with reading textbooks, because they tend to be anthologies of texts already published elsewhere. E.B. White and Roald Dahl won’t be approving revisions of their stuff anytime soon, nor will many of the living and breathing authors whose books are anthologized.

But instructional materials and assessment passages that are written—not selected—specifically to teach or test literacy skills are another thing altogether. Don’t be surprised if many of those kinds of materials turn out to be harder or easier than you thought they’d be.

There is no sure way to protect against texts that have been fitted to readability formulas. Sometimes mechanical revisions are pretty choppy, and you might catch that. But generally you can’t tell whether a text has been manipulated to come out right. The publishers themselves may not know, since such texts are often written to spec by independent contractors.

Readability formulas are a valuable tool in selecting texts, but they only index text difficulty; they don’t actually measure it (that is, they do not reveal why a text may be hard to understand). Qualitative review of texts and continuous monitoring of how well students do with texts in the classroom are important tools for keeping the publishing companies honest on this one. Buyer beware.

Monday, September 10, 2012

CCSS Allows More than Lexiles

When I was working on my doctorate, I had to conduct a historical study for one of my classes. I went to the Library of Congress and calculated readabilities for books that had been used to teach reading in the U.S. (or in the colonies that became the U.S.). I started with the Protestant Tutor and the New England Primer, the first books used for reading instruction here. From there I examined Webster’s Blue-Backed Speller and its related volumes and the early editions of McGuffey’s Readers.

Though the authors of those books left no record of how they were created, it is evident that they had sound intuitions about what makes text challenging. Even in the relatively brief, single-volume Tutor and Primer, the materials got progressively more difficult from beginning to end. These earliest books ramped up in difficulty very quickly (you read the alphabet on one page and simple syllables on the next, followed by a relatively easy reading selection, but then the challenge level would jump markedly).

By the time we get to the three-volume Webster, the readability levels increase more slowly from book to book, with the speller (the first volume) being by far the easiest and the final book (packed with political speeches and the like) being all but unreadable (kind of like political speeches today).

By the 1920s, psychologists had begun searching for measurement tools that would allow them to describe the readability or comprehensibility of texts. In other words, they wanted to turn these intelligent intuitions about text difficulty into tools that anyone could use. That work has proceeded by fits and starts over the past century, and has resulted in the development of a plethora of readability measurements.

Readability research has usually focused on reading comprehension as the outcome. Researchers have readers do something with a set of texts (e.g., answer questions, complete maze or cloze tasks) and then try to predict those performance levels from easy-to-count characteristics of the texts (words and sentences). The idea is to use easily measured text features to place the texts on an easy-to-hard scale that agrees with how readers actually did with them.
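The basic recipe can be sketched in a few lines of Python. This is a minimal illustration, not any published formula: the two features (average words per sentence and the share of words missing from a hypothetical familiar-word list), the comprehension scores, and the function names are all invented for the example. Ordinary least squares picks the weights that make the counted features agree as closely as possible with how readers did, and R-squared reports how much of the variation in those scores the features account for.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_weights(features, scores):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    X = [[1.0, *row] for row in features]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, scores)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def r_squared(weights, features, scores):
    """Share of the variance in scores accounted for by the fitted features."""
    mean = sum(scores) / len(scores)
    ss_res = sum((y - predict(weights, row)) ** 2 for row, y in zip(features, scores))
    ss_tot = sum((y - mean) ** 2 for y in scores)
    return 1 - ss_res / ss_tot


# Hypothetical passages: (average words per sentence, share of unfamiliar words)
features = [(8, 0.04), (10, 0.06), (12, 0.05), (15, 0.10),
            (18, 0.12), (20, 0.09), (24, 0.15), (28, 0.18)]
# Hypothetical mean percent correct on comprehension questions for each passage
scores = [88, 84, 83, 72, 70, 71, 58, 50]

weights = fit_weights(features, scores)
print("weights:", [round(w, 2) for w in weights])
print("R^2:", round(r_squared(weights, features, scores), 2))
print("predicted score for a new text:", round(predict(weights, (14, 0.08)), 1))
```

With toy numbers this tidy, R-squared comes out much higher than the roughly 50% that readability formulas usually explain with real texts and real readers.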

Educators stretched this idea of readability into one of learnability. Instead of trying to predict how well readers would understand a text, educators wanted to use readability to predict how well students would learn from it. Thus the idea of an “instructional level”: if students were taught with books that appropriately matched their reading levels, they would learn more; if they were placed in materials that were relatively easier or harder, they would learn less. This theory has not held up very well when empirically tested. Students seem to be able to learn from a pretty wide range of text difficulties, depending on the amount of teacher support.

The Common Core State Standards (CCSS) did not buy into the instructional level idea. Instead of accepting the claim that students need to be taught at “their levels,” the CCSS recognizes that students will never reach the needed levels by the end of high school unless harder texts are used for teaching: harder not only relative to students’ instructional levels, but also in terms of which texts are assigned to which grade levels. Thus, for Grades 2-12, CCSS assigned higher Lexile levels to each grade than in the past (the so-called stretch bands).

The Lexile Framework is one of the more recent schemes for measuring readability. Initially, it was the only readability measure accepted by the Common Core. That is no longer the case: CCSS now provides guidance for matching texts to grade levels using several formulas. This change does not take us back to using easier texts for each grade level. Nor does it back away from encouraging teachers to work with students at levels higher than their so-called instructional levels. It does mean that schools will find it easier to identify appropriate texts using any of six different approaches, many of which schools already use widely.

Of course, there are many other schemes that could have been included by CCSS (there are at least a couple of hundred readability formulas). Why aren’t they included? Will they be going forward?

From looking at what was included, it appears to me that CCSS omitted two kinds of measures. First, they omitted schemes that are not used often (few publishers still use Dale-Chall or the Fry Graph to specify text difficulty, so there would be little benefit in connecting them to the CCSS plan). Second, they omitted widely used measures that were not derived from empirical study (Reading Recovery levels, Fountas & Pinnell levels, etc.). Such levels are not necessarily wrong: remember, educators have intuitively identified text challenge levels for hundreds of years.

These schemes are especially interesting for the earliest reading levels (CCSS provides no guidance for K and 1). For the time being, it makes sense to continue to use such approaches for sorting out the difficulty of beginning reading texts, but then to switch to approaches that have been tested empirically in grades 2 through 12. [There is very interesting research underway on beginning reading texts involving Freddie Hiebert and the Lexile people. Perhaps in the not-too-distant future we will have stronger sources of information on beginning texts].    

Here is the new chart for identifying text difficulties for different grade levels:

| Common Core Band | ATOS | Degrees of Reading Power | Flesch-Kincaid | The Lexile Framework | Reading Maturity | SourceRater |
|---|---|---|---|---|---|---|
| 2nd-3rd | 2.75-5.14 | 42-54 | 1.98-5.34 | 420-820 | 3.53-6.13 | 0.05-2.48 |
| 4th-5th | 4.97-7.03 | 52-60 | 4.51-7.73 | 740-1010 | 5.42-7.92 | 0.84-5.75 |
| 6th-8th | 7.00-9.98 | 57-67 | 6.51-10.34 | 925-1185 | 7.04-9.57 | 4.11-10.66 |
| 9th-10th | 9.67-12.01 | 62-72 | 8.32-12.12 | 1050-1335 | 8.41-10.81 | 9.02-13.93 |
| 11th-CCR | 11.20-14.10 | 67-74 | 10.34-14.2 | 1185-1385 | 9.57-12.00 | 12.30-14.50 |

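Because neighboring bands overlap, a single measure can legitimately fit more than one grade band. Here is a minimal Python sketch of a lookup against the Lexile column of the chart above; the dictionary and function names are illustrative, and treating the endpoints as inclusive is an assumption, since the chart does not say how boundary values are handled.

```python
# Lexile ranges for each CCSS grade band, copied from the chart above
LEXILE_BANDS = {
    "2nd-3rd": (420, 820),
    "4th-5th": (740, 1010),
    "6th-8th": (925, 1185),
    "9th-10th": (1050, 1335),
    "11th-CCR": (1185, 1385),
}


def bands_for(lexile: int) -> list[str]:
    """Return every grade band whose (inclusive) Lexile range contains the measure."""
    return [band for band, (low, high) in LEXILE_BANDS.items() if low <= lexile <= high]


for measure in (500, 800, 1000, 1200, 1400):
    print(measure, bands_for(measure) or "outside the 2nd-CCR bands")
```

A text measured at 800L, for example, sits at the top of the 2nd-3rd band and near the bottom of the 4th-5th band, so the number alone does not settle where it belongs.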
For more information:

Monday, September 28, 2009

Putting Students into Books for Instruction

This weekend, there was a flurry of discussion on the National Reading Conference listserv about how to place students in books for reading instruction. This idea goes back to Emmett Betts in 1946. Despite its long history, there hasn’t been a great deal of research into the issue, so there are lots of opinions and insights. I tend to lurk on these listservs rather than participate, but this one really intrigued me because it explored a lot of important ideas. Here are a few.

Which ways of indicating book difficulty work best?
This question came up because the inquirer wondered if it mattered whether she used Lexiles, Reading Recovery, or Fountas and Pinnell levels. The various responses suggested a whiff of bias against Lexiles (or, actually, against traditional measures of readability including Lexiles).

So are all the measures of book difficulty the same? Well, they are and they’re not. It is certainly true that historically most measures of readability (including Lexiles) come down to two measurements: word difficulty and sentence difficulty. These factors are weighted and combined to predict some criterion. Although Lexiles include the same components as traditional readability formulas, they predict different criteria. Lexiles are lined up with an extensive database of test performance, while most previous formulas predict the levels of subjectively sequenced passages. Also, Lexiles have been normed more recently. One person pointed out that Lexiles and other traditional measures of readability tend to come out about the same (correlations of .77), which I think is correct, but because Lexiles use recent student reading performance as the criterion, I usually go with the Lexiles when the estimates differ much.
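A correlation of .77 is strong, yet it still leaves room for sizable disagreements about individual books. This small Python sketch uses made-up grade-level estimates from two hypothetical formulas (A and B) for the same eight books. It computes Pearson's r and flags the books where the two estimates differ by more than a year, which are the cases where the choice of measure actually matters. The book labels, numbers, and the one-year threshold are all illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# book: (formula A estimate, formula B estimate), all values invented for illustration
estimates = {
    "book 1": (2.1, 2.6),
    "book 2": (3.4, 3.0),
    "book 3": (4.0, 5.6),
    "book 4": (4.8, 4.5),
    "book 5": (5.5, 6.1),
    "book 6": (6.2, 4.9),
    "book 7": (7.0, 7.4),
    "book 8": (8.3, 7.9),
}
THRESHOLD = 1.0  # grade levels; hypothetical cutoff for "much difference"

a = [pair[0] for pair in estimates.values()]
b = [pair[1] for pair in estimates.values()]
print(f"r = {pearson_r(a, b):.2f}")
for book, (ea, eb) in estimates.items():
    if abs(ea - eb) > THRESHOLD:
        print(f"{book}: formulas disagree by {abs(ea - eb):.1f} grade levels")
```

The overall correlation stays high even though two of the eight books would land in different grade bands depending on which formula you trust.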

Over the years, researchers have challenged readability because it is such a gross index of difficulty (obviously there is more to difficulty than sentences and words), but theoretically sound descriptions of text difficulty (such as those of Walter Kintsch and Arthur Graesser) haven’t led to appreciably better text difficulty estimates. Readability usually explains about 50% of the variation in text difficulty, and these more thorough and cumbersome measures don’t do much better.

One does see a lot of Fountas and Pinnell and Reading Recovery levels these days. Readability estimates are usually accurate only within about a year, and that is not precise enough to help a first-grade teacher match her kids with books. So these schemes claim to make finer distinctions in text difficulty early on, but their accuracy is open to question (I know of only one study of this, and it was moderately positive), and there is no evidence that using such fine distinctions actually matters for student learning (there is some evidence of this with more traditional measures of readability).

If anything, I think these new schemes tend to put kids into more levels than necessary. They probably correlate reasonably well with readability estimates, and their finer-grained results probably are useful for early first grade, but I’d be hard pressed to say they are better than Lexiles or other readability formulas even at these levels (and they probably lead to overgrouping).

Why does readability work so poorly for this?
I’m not sure that it really does work poorly, despite the bias evident in the discussion. If you buy the notion that reading comprehension is a product of the interaction between the reader and the text (as most reading scholars do), why would you expect text measures to account for much more than half the variance in comprehension? In the early days of readability formula design, lots of text measures were used, but those fell away as it became apparent that they were redundant and that 2-3 measures would be sufficient. The rest of the variation reflects differences in children’s interests, their knowledge of topics, and the like (and limits in our ability to measure student reading levels).

Is the right level the one that students will comprehend best at?
One of the listserv participants wrote that the only point to all of this leveling was to get students into texts that they could understand. I think that is a mistake. That may often be the reason for using readability, but it isn’t necessarily what teachers need. What a teacher wants to know is “at what level will a child make optimum learning gains in my class?” If the child will learn better from something hard to comprehend, then, of course, we’d rather have them in that book.

The studies on this are interesting in that they suggest that sometimes you want students practicing with challenging text that may seem too hard (as during oral reading fluency practice) and other times you want them practicing with materials that are somewhat easier (as when you are teaching reading comprehension). That means we don’t necessarily want kids reading books at only one level: we should choose texts differently for a guided reading group that will discuss a story, for a paired reading activity in which kids are doing repeated reading, and for an independent reading recommendation of something a child might enjoy reading at home.

But isn’t this just a waste of time if it is this complicated?
I don’t think it is a waste of time. The research certainly supports the idea that students do better with some adjustment and book matching than they do when they work whole class on the same level with everybody else.

However, the limitations in testing kids and testing texts should give one pause. It is important to see such data as a starting point only. By all means, test kids and use measures like Lexiles to make the best matches that you can. But don’t end up with too many groups (which means some kids will intentionally be placed in harder or easier materials than you might prefer), move kids if a placement turns out to be easier or harder on a daily basis than the data predicted, and find ways to give kids experiences with varied levels of text (from easy to challenging). Even when a student is well placed, there will still be selections that turn out to be too hard or too easy, and the amount of scaffolding and support will need to be adjusted accordingly. That means that teachers need to pay attention to how kids are doing and respond to these needs to make sure each student makes progress (i.e., improves in what we are trying to teach).
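The trade-off in the advice about group size can be made concrete. Here is a minimal Python sketch with invented student Lexile measures and a deliberately simple rule that stands in for whatever placement procedure a teacher actually uses: sort the students, cut them into a fixed number of near-equal groups, and pitch each group's text at the group's median measure. The student names, the numbers, and the median rule are hypothetical. Fewer groups means a larger gap between some students' measures and the text they get; that gap is the intentional mismatch the paragraph describes, to be handled with more or less scaffolding.

```python
from statistics import median


def form_groups(measures, n_groups):
    """Sort students by measure and split them into n_groups contiguous, near-equal groups."""
    ordered = sorted(measures.items(), key=lambda item: item[1])
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)  # spread any remainder over the first groups
        groups.append(ordered[start:end])
        start = end
    return groups


def placement_gaps(groups):
    """For each student, group text level (the median) minus the student's own measure."""
    gaps = {}
    for group in groups:
        level = median(m for _, m in group)
        for name, m in group:
            gaps[name] = level - m  # positive: the text is harder than the student's measure
    return gaps


# Hypothetical Lexile measures for one classroom
students = {"Ana": 350, "Ben": 410, "Cai": 430, "Dee": 480, "Eli": 520, "Fay": 540,
            "Gus": 600, "Hal": 610, "Ivy": 650, "Jo": 700, "Kim": 720, "Lee": 800}

for k in (2, 3, 6):
    gaps = placement_gaps(form_groups(students, k))
    print(f"{k} groups: largest mismatch {max(abs(g) for g in gaps.values()):.0f}L")
```

More groups shrink the mismatch, but each extra group costs the teacher instructional time, which is why the advice is to accept some mismatch rather than chase a perfect fit.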

If you want to know more about this kind of thing, I have added a book to my recommended list (at the right here). It is a book by Heidi Mesmer on how to match texts with kids. Good luck.