Two terms I’ve often heard people use incorrectly, and which are noticeably absent from any national conversation about standardized testing, are validity and reliability. No, they are not the same, and they are not interchangeable. Here are my (simplistic) definitions of the two terms:
Validity is the extent to which a test or other assessment measures what it purports to measure. If a teacher gives an assessment to gauge the students’ abilities to draw inferences, but the test consists of main idea questions, it’s not a valid measure. A test of inferential thinking must contain inference questions.
Reliability is the consistency of a test or assessment. An assessment given to a common population, covering common content, and within a common time frame should yield similar results. If I give the same exam on the same day to all of my 10th graders of similar demographics, I would expect the results to be fairly similar.
“Why is this important?” you may ask. Well, it’s critical. Giving a reliable and valid assessment is fundamental when assessing students. We have to trust and depend on the assessment given. If I want to find out what my students can do or what they know, I have to use assessments that will provide the data I need in a reliable manner. I try to ensure that every assignment that goes into the grade book has both reliability and validity.
However, this really isn’t the point of my post. What I really wanted to say is that the Washington State test (the HSPE, the High School Proficiency Exam) is neither valid nor reliable. Yet, we base so much on this test: graduation, remedial classes, teacher effectiveness, program offerings, and more. It may help us see weaknesses across the state or in a single school system, but its use for individuals is inappropriate.
How can the HSPE be a valid exam with only one or two questions on a specific skill? For example, students are asked only one or two questions about geometric sense. How many questions need to be asked to accurately gauge a student’s skill in this category? I would probably suggest 6-10, but that’s just me. Of course, this would mean a very, very long state test.
I’ve seen a student get 7 out of 8 questions correct on one of my exams on author’s purpose, which would tell me the student understands the skill. Let’s pretend my eight questions are the bank of questions on the state test for that one skill. If the student missed the only question asked in that category on the state exam, the state assumes the student does not know the skill. If I gave the same student the other 7 questions, he could get them all correct. We have ended up with an inadequate and, in my opinion, invalid assessment.
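For readers who like to see the sampling problem in numbers, here is a minimal simulation sketch. The figures are hypothetical, not taken from the HSPE: it assumes a student who truly answers this kind of question correctly 7 times out of 8, and compares how often that student is wrongly labeled as not knowing the skill when tested with a single question versus eight questions (with a made-up passing threshold of 6 of 8).

```python
import random

random.seed(1)  # fixed seed so the simulation is reproducible

P_CORRECT = 7 / 8   # hypothetical true mastery: answers 7 of 8 such questions correctly
TRIALS = 100_000    # number of simulated test sittings

def passes(n_questions, n_needed):
    """Simulate one sitting: n_questions on the skill, pass if at least n_needed correct."""
    correct = sum(random.random() < P_CORRECT for _ in range(n_questions))
    return correct >= n_needed

# One question on the skill: the student "fails" the skill whenever that single item is missed.
fail_one = sum(not passes(1, 1) for _ in range(TRIALS)) / TRIALS

# Eight questions, requiring 6 of 8 correct (an assumed classroom-style threshold).
fail_eight = sum(not passes(8, 6) for _ in range(TRIALS)) / TRIALS

print(f"Mislabeled as 'doesn't know it' with 1 question:      {fail_one:.1%}")
print(f"Mislabeled with 8 questions (need 6 correct):         {fail_eight:.1%}")
```

Under these assumed numbers, a single question mislabels the student roughly one time in eight, while the eight-question version cuts that error rate roughly in half; raising the question count further shrinks it more. The point is not the exact figures but the direction: one item per skill makes the verdict largely a matter of luck.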
A couple of years ago we had an excellent example of the test’s lack of reliability. The state’s average dropped significantly across the board on the reading portion. It wasn’t only one or two schools or regions; it was the entire state’s average that dropped. With that much variance in the state scores from one year to the next, we have to assume either that all of the state’s students struggled more than their peers in the years before and after, or that the problem rested with the test. Which is more likely to be the problem: tens of thousands of students, or a single assessment?
Even if we grant that the test was reliable, then the difficulty must have increased, which threw off the results; yet we still base AYP (adequate yearly progress) on the test, and schools fell into trouble, or deeper into trouble, with the state because of the test’s results. Students are placed into remedial classes because of the test, and programs may be cut or created because of the testing results.
The way the state uses the HSPE now is to look at the 10th graders’ scores from one year to the next; this means the results of a different group of students taking a different test are used to judge schools, systems, teachers, and students.
If we want to use the state tests in any capacity whatsoever, the only proper way would be to follow a cohort over time. How did the same students do year after year? What were the students’ scores in 6th grade, 7th grade, 8th grade, and so on? We would be able to see the students’ progress over time in a school or in a district. The same students would be assessed together, and we would be able to see how our schools are doing in moving students along a progression of skills.
My examples may be a bit rough (but it’s Saturday and Spring Break), but my message remains: the state test isn’t a valid or reliable measure for individual students, and it is being misused.
If only the decision-makers ran the state like many of us run our classrooms…with validity and reliability.