Thursday, May 24, 2012

Cracking the Code: How Testers' Language Means Nothing

As a teacher of Ancient World History, one area I find interesting about the period of study is language. Thousands of years separate civilizations, and written language offers a window onto the way things were for people who have long since disappeared. When a language is "lost" to time or cannot be translated, a great deal of misunderstanding exists. Often some catastrophic event or mysterious demise brings on such a void. Sometimes it is geographic distance that separates cultures and prevents mutual understanding. Only about 60 miles separate my school from the decision makers in our state capital of Richmond, but it might as well be a million. The gap between us is wide indeed. I think they might even be on another planet.

My students have taken this year's SOL test. I tried to prepare them as best I could for this test that I have never seen. I can't prepare them for receiving their scores and not knowing what they missed. Somewhere in the language of the test and the scoring there exists a disconnect, which results in a process devoid of much value. This test requires a Rosetta Stone to decipher what exactly is measured and how. Far worse, without having seen the test or any of the questions, it is impossible to judge its merits fairly, point out flaws, or seek clarification. The secrets of the test are even more mysterious than the language of the ancients.

Why do we place such a degree of legitimacy on the tests when it is clear they inherently lack legitimacy? How can anyone be allowed to make a test like this and get away with not being more transparent to those who are judged by it? Is the quagmire of documents, forms, and numbers designed purposefully to deceive or misdirect? One is left to speculate.

We have explored these issues in several previous posts on the TU. See Bottom, Truth, Fact, $#!%Flux among others. There are so many things wrong with the tests themselves and the way they are used that it is difficult for those not directly involved in today's schools to comprehend. Painfully evident is the reality that testing is leading us to a place that a growing number of common-sense people and countless educators know is bad. Randy Truitt, a representative in the Indiana state legislature, voiced some of this in a recent letter to his colleagues.

Imagine the opportunity to sit with a leader of a society like the Maya or Easter Island and simply ask... "What happened?" If I had the same opportunity with the folks at Pearson and the state DOE, I'd do my best to dig deep. Among other things, I would ask: what exactly are you trying to accomplish?

I'd begin with a printout of "raw" scores. What makes them raw is how you feel when you try to figure out what these scores mean once they are scaled (I usually say chapped, not raw). This year is no exception. From the VDOE website: "the raw score adopted by the Board to represent pass/proficient on the standard setting form is assigned a scaled score of 400, while the raw score adopted for pass/advanced is assigned a scaled score of 500." That makes perfect sense, except when you look elsewhere on the site.

So never mind the 53/60 cut score above, since my students who missed seven questions (53/60) received only a 499. I would bet that very few students and even fewer parents have any idea where the 400 and 500 delineations come from. Aliens, perhaps? Apparently that will remain a mystery.

The vagueness there is surpassed still by what the teacher must say when a kid asks, "What did I miss?" All I can offer is the kind of imprecision usually reserved for the translation or interpretation of an ancient text. "OK Johnny... it is obvious: you missed four in Human Origins and Early Civilizations and four in Classical Civilizations. The Classical Civs questions had something to do with the achievements of a person, architecture, the role of a key person in a religion, and a figure's accomplishments. Not sure what ruler, where they were from, or what you didn't know. But what is important for you to remember is that although there were more questions in the HOEC category (thus, in theory, each was worth less), you are again mistaken, because in fact you only got a scaled score of 31 versus a 32. You got a 394, so you failed. Just do better. Make sense? No? Good."

After consultation with our legal department (each other) and careful inspection of the Test Security Agreement we all sign, we elected not to include an actual copy or portion of the grade report. The rationale: we need paychecks, and we both have families to support. How sad is it that teachers are scared to question the validity of a test by referencing the actual test or results from it?

If we had included a copy of this student's actual score report you would have seen:

(1) Reporting categories contain vague language like "identify characteristics of civilizations" to describe questions the student answered incorrectly.
(2) Category A had 11 questions, of which the student missed 4. Category B had 10 questions, of which the student missed 4. The student's scaled score for category A was 31, for B 32, with no explanation of why questions in category A are given greater weight.
(3) The scores, grade reports, and feedback are clearly not useful for improving student or teacher performance, since they offer no specifics as to where weaknesses exist.
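The arithmetic on that score report makes the puzzle concrete. As a sketch only (this is NOT Pearson's or VDOE's actual scaling method, which is undisclosed), here is the naive math a parent might try, and why it contradicts the reported subscores:

```python
# Illustrative arithmetic only -- not the actual (undisclosed) scaling method.
# It shows why the reported subscores are hard to reconcile with raw counts.

def percent_correct(total_questions, missed):
    """Naive raw performance: fraction of questions answered correctly."""
    return (total_questions - missed) / total_questions

# From the score report: Category A had 11 questions (4 missed),
# Category B had 10 questions (4 missed).
pct_a = percent_correct(11, 4)   # 7/11, roughly 64% correct
pct_b = percent_correct(10, 4)   # 6/10, exactly 60% correct

# By raw percentage the student did *better* in category A...
assert pct_a > pct_b

# ...yet the report assigns A the *lower* scaled subscore (31 vs 32),
# and no published formula explains the discrepancy.
reported_subscores = {"A": 31, "B": 32}
assert reported_subscores["A"] < reported_subscores["B"]
```

Under any straightforward percentage-based reading, the subscores come out in the opposite order from what the report shows, which is exactly the kind of opacity the score report leaves a teacher unable to explain.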

Imagine that conversation with a student who fails, and trying to help them. We are asked to "re-mediate," which I would imagine means we target areas where the student has weaknesses. That is a much tougher task without knowing exactly where they are weak. I can understand not wanting us to teach to the test. How about teaching to the kid?

My students and I are judged by a test that in no way serves as a tool to improve my teaching. How on Earth are we to try to do better next year? Those who devise such an approach remain as distant as any of the cultures my students are required to learn. What's more, they manage to encrypt any relevant information in such a way as to make it utterly meaningless.

The numbers and stats derived from massive student testing across the state serve little more purpose than to send the message that policy-makers and testing corporations like Pearson want to send. When scores are too high, standards are raised. When scores are too low, standards are lowered. Neither the Department of Education nor Pearson is able to state, in clear language, an objective explanation of how scores are calculated or why certain cut-score choices are anything other than arbitrary.

The twenty-first century process for holding American students, teachers, and schools accountable should not prove more difficult to translate than Ancient Hieroglyphics.

No Pearson..."Thank You"
