I get this kind of question frequently from teachers who work with struggling readers, so I decided to respond publicly. What I say about these two tests would be true of others as well.
I am a middle school reading teacher and have an issue that I'm hoping you could help me solve. My students' placements are increasingly bound to their standardized test results. I administer two types of standardized tests to assess the different areas of student reading ability: the Woodcock Reading Mastery Tests and the Terra Nova Test of Reading Comprehension. Often, my students' WRMT subtest scores are within the average range, while their Terra Nova results fall at the lower end of the average range or below. How can I clearly explain these discrepant results to my administrators? When they see average scores on one test, they believe these students are no longer candidates for remedial reading services.
Teachers are often puzzled by these kinds of testing discrepancies, but they can happen for a lot of reasons.
Reading tests tend to be correlated with each other, but this kind of general agreement between two measures doesn’t mean that they would categorize student performance identically. Performing at the 35th percentile might earn a below-average designation on one test, but an average one on the other. It's probably better to stay away from those designations and to use NCE scores or some other metric that is comparable across the tests.
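To see why NCE (Normal Curve Equivalent) scores travel across tests better than labels do, here is a minimal sketch of the standard NCE conversion. The formula NCE = 50 + 21.06z is the conventional one (21.06 stretches the normal curve so that NCE 1, 50, and 99 line up with the 1st, 50th, and 99th percentiles); the function name is my own, not part of any test publisher's software.

```python
from statistics import NormalDist

def percentile_to_nce(percentile: float) -> float:
    """Convert a percentile rank (1-99) to a Normal Curve Equivalent score."""
    z = NormalDist().inv_cdf(percentile / 100)  # z-score matching that percentile
    return 50 + 21.06 * z  # standard NCE scaling

# A student at the 35th percentile gets the same NCE no matter which test
# produced the percentile, so the two tests can be compared directly:
print(round(percentile_to_nce(35), 1))  # 41.9
```

Because NCE is an equal-interval scale, a 5-point gain means the same thing at the bottom of the distribution as it does in the middle, which is not true of percentile ranks.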
An important issue in test comparison is the norming samples the tests use, and that is certainly the case with these two. Terra Nova has a very large and diverse nationally representative norming sample (about 200,000 kids), while the WRMT is based on a much smaller group that may be skewed a bit toward struggling students (only 2,600 kids). When you say that someone is average or below average, you are comparing that student's performance with the norming group's. Because of its extensiveness, I would trust the Terra Nova norms more than the WRMT ones; Terra Nova would likely give me a more accurate picture of where my students stand relative to the national population. The WRMT is useful because it provides greater information about how well the kids are doing in particular skill areas, and it would help me to track growth in those skills.
Another thing to think about is reliability. Find out the standard error of measurement for the tests you are giving and calculate 95% confidence intervals for the scores. Scores should be stated in terms of the range of performance each score represents. Often you will find that the confidence intervals of the two tests are so wide that they overlap, which means that though the score differences look big, they are not reliably different. Let’s say that the standard error of one of the tests is 5 points (you need to look up the actual standard error in the manual), and that your student received a standard score of 100 on that test. The 95% confidence interval for this score would be 90-110 (in other words, if the student took this test over and over, about 95% of his scores would fall within that range). Now say that the standard error of the other test is 8 and that the student’s score on that test was 120. That looks pretty discrepant, but the confidence interval for that one is 104-136. Because 90-110 (the confidence interval for the first test) overlaps with 104-136 (the confidence interval of the second test), these scores look very different and yet they may not really differ at all.
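The overlap check above is simple enough to sketch out. This uses the hypothetical numbers from the example (standard errors of 5 and 8, scores of 100 and 120, with the conventional rounding of 1.96 standard errors to 2); substitute the real standard errors of measurement from each test manual.

```python
def confidence_interval(score: float, sem: float) -> tuple:
    """95% confidence interval: score plus or minus ~2 standard errors
    of measurement (1.96 rounded, as in the example above)."""
    margin = 2.0 * sem
    return (score - margin, score + margin)

def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True when the two score ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

ci_one = confidence_interval(100, sem=5)  # (90.0, 110.0)
ci_two = confidence_interval(120, sem=8)  # (104.0, 136.0)
print(intervals_overlap(ci_one, ci_two))  # True: the difference may not be real
```

When the intervals overlap, as they do here, you can tell your administrators that the two tests are not actually disagreeing about the student: the apparent gap is within the measurement error of the instruments.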
You mention the big differences in the tasks included in the two tests. These can definitely make a difference in performance. Since the WRMT is given so often to lower-performing students, it wouldn’t require especially demanding tasks to spread out performance, while the Terra Nova, given to a broader audience, needs a mix of easier and harder tasks (such as longer and more complex reading passages) to spread students out. These harder tasks push your kids lower in the group and may be so hard that it would be difficult to see short-term gains or improvements with such a test. The WRMT is often used to monitor gains, so it tends to be more sensitive to growth.
You didn’t mention which edition of the tests you were administering. But these tests are revised from time to time, and the revisions matter. The WRMT has a 2012 edition, and studies of previous versions reveal big differences in performance from one edition to the next (despite the fact that the same test items were being used). The revised versions changed their norming samples, and that altered test performance quite a bit (5-9 points). I think you would find the Terra Nova to have more stable scores, and yet comparing them across editions might reveal similar score inflation.
My advice is that when you want to show where students stand in the overall norm group, use only the Terra Nova data. Then use the WRMT to show where the students’ relative strengths and weaknesses are and to monitor growth in those skills. That means your message might be something like: “Tommy continues to perform at or near the 15th percentile when he is compared with his age mates across the country. Nevertheless, he has improved during the past three months in vocabulary and comprehension, though not enough to improve his overall position in the distribution.” In other words, his reading is improving, and yet he remains behind 85% of his peers in these skills.