We are having an interesting conversation in our district. We currently give AIMSweb as a screening probe three times a year. One of the school psychologists pointed out that for the last several years the first graders seem to do better in the fall than in the spring on nonsense word fluency. When we look at measures of comprehension and fluency using other measures we do not see a decline. Is there any research out there that might help us understand what we are seeing and whether or not this is a serious issue?
What you describe is a common experience with AIMSweb and other progress monitoring tests. And, the more often you re-test, the more often you’ll see the problem. (Thank goodness you are only trying to test the kids three times a year.)
I could find no studies on the nonsense word portion of AIMSweb. But every test has a standard error of measurement (SEM).
The standard error gives an estimate of how much test scores will vary if the test is given repeatedly. Tests aren’t perfect, so if someone were to take the same test two days in a row, the score would not be likely to be the same.
But how much could someone learn (or forget) in one day? That is exactly the point.
SEM tells you how much change the test score is likely to undergo even if there were no significant opportunity for learning or forgetting. It is not a real change in reading ability, but variance due to the imprecision of the measurement.
Schools tend to pay a lot of attention to the standard error with their state test scores (the so-called "wings" around your school or district average scores). If your school gets 500 in reading on the state test, but the standard error is + or – 5, then we can't be sure that you did any better than the schools that got 495s, 496s, 497s, 498s, and 499s. Your score was higher, but because the difference falls within the standard error, we can't tell whether your students actually outperformed those schools.
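The comparison described above comes down to a simple rule: treat two scores as different only if their gap exceeds the error band. A minimal sketch (the 500-point score and ±5 band are illustrative numbers from the example, not real state-test statistics):

```python
# Hypothetical illustration: deciding whether one school's mean score is
# measurably higher than another's, given the test's error band.

def measurably_different(score_a: float, score_b: float, band: float) -> bool:
    """Return True only if the gap between two scores exceeds the error band."""
    return abs(score_a - score_b) > band

# A school scoring 500 when the band is + or - 5 points:
print(measurably_different(500, 495, band=5))  # 5-point gap stays inside the band
print(measurably_different(500, 494, band=5))  # a 6-point gap falls outside it
```

With a ±5 band, every school from 495 through 499 is statistically indistinguishable from the school that scored 500, which is why the "wings" matter when ranking schools.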
When you calculate the SEM for a school or district score, it will tend to be small because of the large numbers of students whose scores are being averaged. However, when you are looking at an individual’s score, such as when you are trying to find out how much improvement there has been since the last time you tested, SEMs can get a lot bigger.
Unfortunately, schools pay less attention to SEMs with screening or progress monitoring tests than they do with accountability tests.
Nevertheless, AIMSweb has a standard error of measurement. So do all the other screeners out there.
That means when you give such tests repeatedly over short periods of time (say less than every 15 weeks), you’ll end up with unreliability affecting some percentage of the students’ scores.
I’d love to blame AIMSweb for being particularly bad as a predictor test. That would sure make it easy to address your problem: “Lady, you bought the wrong test. Buy the XYZ Reading Screener and everything is going to be fine. You’ll see.”
In fact, studies suggest—at least with oral reading fluency—that if anything AIMSweb has particularly small standard errors of measurement (Ardoin & Christ, 2009).
But even with that, you'll still find changes in scores that make no sense. Say John scored 49 when you tested him early in the school year. I couldn't find a published SEM for the AIMSweb nonsense word test, but suppose that to be 95% certain one score is really higher than another, the two scores would need to differ by more than 10 points. If on retesting you find that his score is 45, it looks like a decline, but what it really means is that John's score isn't measurably different than before.
Teachers usually like knowing that; what looked to be a decline is just test noise.
They usually aren't quite as happy with the idea that if John goes from 49 to 58 on that test, the change is too small to conclude that any real progress was made. Changes that fall within the standard error of measurement are not actually changes at all.
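Both of John's retests can be judged by the same rule: flag a change as real only when it exceeds the error band. A minimal sketch, using the hypothetical ±10-point band from the discussion above (not a published AIMSweb statistic):

```python
# Hypothetical illustration: classifying a score change between two
# testing occasions relative to the measurement-error band.

def interpret_change(old_score: int, new_score: int, band: int = 10) -> str:
    """Describe a score change relative to the error band."""
    change = new_score - old_score
    if abs(change) <= band:
        return "within error band: no measurable change"
    return "real gain" if change > 0 else "real decline"

# John's scores from the examples above:
print(interpret_change(49, 45))  # a 4-point drop: just test noise
print(interpret_change(49, 58))  # a 9-point rise: still inside the band
print(interpret_change(49, 62))  # a 13-point rise clears the band
```

Under this rule, both the apparent decline (49 to 45) and the apparent gain (49 to 58) land inside the band, so neither should be read as a real change in reading skill.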
Since I can’t recommend shifting to some other comparable measure (e.g., DIBELS, PALS, CBM) that would necessarily be any more precise, I think what you are doing—comparing the results with those derived from other measures—is the best antidote.
If you see a decline in AIMSweb scores, but no comparable decline in the other tests that you are giving, I'd conclude that there was probably not a real decline. I would then monitor that student more closely during instruction just to be sure.
On the other hand, if the score decline is confirmed by your other tests, then I would try to address the problem through instruction—giving the youngster greater help with the skill in question.
Contact your test publisher and ask for the test’s standard errors of measurement. Those statistics will help you to better interpret these test scores. In fact, without that kind of information I'm not sure how you are making sense of these data.
The problem here: You are expecting too high a degree of accuracy from your testing regime. Give the tests. Use the tests. But don’t trust these changes, up or down, to always be accurate—at least no more accurate than the standard errors suggest that they should be.
Ardoin, S.P., & Christ, T.J. (2009). Curriculum-based measurement of oral reading: Standard errors associated with progress monitoring outcomes from DIBELS, AIMSweb, and an experimental passages set. School Psychology Review, 38, 266-283.
Copyright © 2017 Shanahan on Literacy. All rights reserved.