Rubber Rulers and State Accountability Testing in Illinois

  • Testing Assessment
  • 09 August, 2008

Much has been made in recent years of the political class’s embrace of the idea of test-based accountability for the schools. Such schemes are enshrined in state laws and NCLB. On the plus side, such efforts have helped move educators to focus on outcomes more than we traditionally have. No small change, this. Historically, when a student failed to learn it was treated as a personal problem, something beyond the responsibility of teachers or schools. That was fine, I guess, when “Our Miss Brooks” was in the classroom and teachers were paid a pittance. Not much public treasure was at risk, and frankly, low achievement wasn’t a real threat to kids’ futures (with so many reasonably well-paying jobs available at all skill levels). As the importance and value of doing well have changed, so have the demands for accountability.

  Sadly, politicos have been badly misled on the accuracy of tests, and technically achievement testing has just gotten really complicated—well beyond the scope of what most legislative education aides can handle. And so, here in Illinois, we have a new test scandal brewing (requiring the rescoring of about 1 million tests).

  Two years ago Illinois adopted a new state test. This test would be more colorful and attractive and would have some formatting features that would make it more appealing to the kids who had to take it. What about the connection of the new test with the test it was to replace? Not to worry, the state board of education and Pearson publishing’s testing service were on the game: they were going to equate the new test with the old statistically so the line of growth or decline would be unbroken, and the public would know if schools were improving, languishing, or slipping down.

  A funny thing happened, however: test scores jumped immediately. Kids in Illinois all of a sudden were doing better than ever before. Was it the new tests? I publicly opined that it likely was; large drops or gains in achievement scores are unlikely, especially without any big changes in public policy or practice. The state board of education, the testing companies, and even the local districts chimed in saying how “unfair” it was that anyone would disparage the success of our school kids. They claimed there was no reason to attribute the scores’ sudden upward trend to the coincidental change in tests, and frankly they were not happy about killjoys like me who would dare question their new success (it was often pointed out that teachers were working very hard, the Bobby Bonds defense: I couldn’t have done anything wrong since I was working hard).

  Now after two years of that kind of thing, Illinois started using a new form of this test. The new form was statistically equated with the old form, so it could not possibly have any different results. Except that it did. Apparently, the scores came back this summer, much lower than they had been during the past two years. So much lower, in fact, that the educators recognized that it could not possibly be due to a real failure of the schools, but it must be a testing problem. Magically, the new equating was found to be screwed up (a wrong formula apparently). Except, Illinois officials have not yet released any details about how the equating was being done. Equating can get messed up by computing the stats incorrectly, but it can also be influenced by how, when, and from whom the data are collected.
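To make the point about data collection concrete, here is a minimal sketch of one textbook approach, linear (mean-sigma) equating, which maps a score on a new form onto the old form's scale by matching means and standard deviations. Illinois has not said which method it used, so this is an illustration only: the function name `linear_equate` and every score below are hypothetical, invented for the example.

```python
from statistics import mean, pstdev

def linear_equate(new_form_scores, old_form_scores):
    """Return a function mapping a new-form raw score onto the old-form scale.

    Mean-sigma linear equating: match the mean and standard deviation of the
    two score distributions. It assumes both groups have the same ability.
    """
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_old, sd_old = mean(old_form_scores), pstdev(old_form_scores)
    slope = sd_old / sd_new
    return lambda x: mu_old + slope * (x - mu_new)

# Hypothetical data: the new form is 5 points easier than the old one.
old_form = [40, 50, 60, 70, 80]

# Case 1: the new form is tried out on an equivalent group of students.
new_form_equivalent_group = [45, 55, 65, 75, 85]
fair = linear_equate(new_form_equivalent_group, old_form)

# Case 2: the new form is tried out on a stronger group (assumed 10 points
# more able), so the formula is right but the sample is wrong.
new_form_stronger_group = [55, 65, 75, 85, 95]
skewed = linear_equate(new_form_stronger_group, old_form)

fair_result = fair(65)      # same student, same raw score of 65...
skewed_result = skewed(65)  # ...lands 10 points lower on the old scale
print(fair_result, skewed_result)
```

In this made-up case the arithmetic is identical both times; only the tryout sample differs, and that alone moves the same raw score from 60 to 50 on the old scale. That is the sense in which how, when, and from whom the data are collected matters as much as the formula.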

  It’s interesting that when scores rise the educational community is adamant that it must be due to their successes, but when they fall, as they apparently did this year in Illinois, it must be a testing problem.

  Illinois erred in a number of ways, but so have many states in this regard.

  The use of a single form of a single measure administered to large numbers of children in order to make important public policy decisions is foolish. It turns out there are many forms of the test Illinois is using. It is foolish that they didn’t use multiple forms simultaneously (like they would have if it had been a research study), as this can help to do away with their “rubber ruler” problem. Sadly, conflicting purposes for testing programs have us locked into a situation where we’re more likely to make mistakes than to get it right.

  I’m a fan of testing (yes, I’ve worked on NAEP, ACT, and a number of commercial tests), and am a strong proponent of educational accountability. It makes no sense, however, to try to do this kind of thing with single tests. It isn’t even wise to test every child. Public accountability efforts need to focus their attention on taking a solid overall look at performance on multiple measures without trying to get too detailed about the information on individual kids. Illinois got tripped up when they changed from testing schools to testing kids (teachers didn’t think kids would try hard enough if they weren’t at risk themselves, so our legislature went from sampling the state to testing every kid; of course, if you want individually comparable data it only makes sense to test kids on the same measure).

  Barack Obama has called for a new federal accountability plan that will make testing worthwhile to teachers by providing individual diagnostic information. That kind of plan sounds good, but ultimately it will require a lot more individual testing, with single measures (as opposed to multiple alternative measures). Instead of getting a clearer or more efficient picture for accountability purposes, and one less likely to be flawed by the rubber ruler problem, it can’t help but be muddled as in Illinois. This positive-sounding effort will be more expensive and will result in a less clear picture in the long run.

  Accountability testing aimed at determining how well public institutions are performing would be better constructed along the lines of the National Assessment (which uses several forms of a test simultaneously with samples of students representing the states and the nation). NAEP has to do some fancy statistical equating, too, but this is more likely to be correct when several overlapping forms of the test are used each year. By not trying to be all things to all people, they manage to do a good job of letting the public and policymakers know how our kids are performing.

