
Thursday, May 24, 2012

Cracking the Code: How Testers' Language Means Nothing

As a teacher of Ancient World History, I find language one of the most interesting parts of the period I study. Thousands of years separate us from these civilizations, and written language offers a window into the way things were for people who have long since disappeared. When a language is "lost" to time or cannot be translated, a great deal of misunderstanding results. Often some catastrophic event or mysterious demise brings on such a void. Sometimes it is geographic distance that separates cultures and prevents mutual understanding. Only about 60 miles separate my school from the decision makers in our state capital of Richmond, but it might as well be a million. The gap between us is wide indeed. I think they might even be on another planet.

My students have taken this year's SOL test. I tried to prepare them as best I could for a test I have never seen. I can't prepare them for receiving their scores and never learning what they missed. Somewhere in the language of the test and the scoring there is a disconnect that leaves the process devoid of much value. This test requires a Rosetta Stone to decipher what exactly is measured and how. Far worse, without having seen the test or any of its questions, it is impossible to judge its merits fairly, point out flaws, or seek clarification. The secrets of the test are even more mysterious than the language of the ancients.

Why do we place such legitimacy on tests that so clearly lack it? How can anyone be allowed to make a test like this without being more transparent to those who are judged by it? Is the quagmire of documents, forms, and numbers designed purposefully to deceive or misdirect? One is left to speculate.

We have explored these issues in several previous posts on the TU. See Bottom, Truth, Fact, $#!%Flux, among others. So many things are wrong with the tests themselves and the way they are used that they are difficult to comprehend for those not directly involved in today's schools. Painfully evident is the reality that testing is leading us to a place that a growing number of common-sense people and countless educators know is bad. Randy Truitt, a representative in the Indiana state legislature, voiced some of this in a recent letter to his colleagues.

Imagine the opportunity to sit with a leader of a society like the Maya or Easter Island and simply ask... "What happened?" If I had the same opportunity with the folks at Pearson and the state DOE, I'd do my best to dig deep. Among other things, I'd ask: what exactly are you trying to accomplish?

I'd begin with a printout of "raw" scores. What makes them raw is how you feel when you try to figure out what these scores mean once they are scaled (I usually say chapped, not raw). This year is no exception. From the VDOE website: "the raw score adopted by the Board to represent pass/proficient on the standard setting form is assigned a scaled score of 400, while the raw score adopted for pass/advanced is assigned a scaled score of 500." That makes perfect sense, except when you look elsewhere on the site.


So never mind the 53/60 cut score above, since my students who missed 7 questions (53/60) received only a 499. I would bet that very few students, and even fewer parents, have any idea where the 400 and 500 delineations come from. Aliens, perhaps? Apparently that will remain a mystery.
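
If I had to guess at the machinery, it would be something like the piecewise-linear interpolation sketched below in Python. To be clear, this is my speculation, not anything VDOE or Pearson publishes: the 53/60 pass/advanced cut comes from the table above, while the pass/proficient cut of 33 is a number I made up just to make the sketch run.

    # Speculative raw-to-scaled conversion: linear interpolation anchored at
    # the board-adopted cut scores. The cut values here are assumptions.
    def scale_score(raw, total=60, cut_proficient=33, cut_advanced=53):
        if raw <= cut_proficient:
            return round(400 * raw / cut_proficient)          # 0..cut maps to 0..400
        if raw <= cut_advanced:
            return round(400 + 100 * (raw - cut_proficient)
                         / (cut_advanced - cut_proficient))   # the 400..500 band
        return round(500 + 100 * (raw - cut_advanced)
                     / (total - cut_advanced))                # the 500..600 band

    print(scale_score(53))   # 500 under these assumptions

Under any mapping this simple, a 53/60 lands exactly on 500, not the 499 my students received. Whatever Pearson actually does is something else entirely, and they are not saying what.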

The vagueness there is surpassed only by what the teacher can say when a kid asks, "What did I miss?" All I can offer is the kind of imprecision usually reserved for translating or interpreting an ancient text. "OK, Johnny... it is obvious: you missed four in Human Origins and Early Civilizations and four in Classical Civilizations. The Classical Civs questions had something to do with the achievements of a person, architecture, the role of a key person in a religion, and a figure's accomplishments. Not sure what ruler, where they were from, or what you didn't know. But what is important for you to remember is that although there were more questions in the HOEC category (so in theory each one had less value), you again are mistaken, because in fact you got a scaled score of 31 there versus a 32. You got a 394, so you failed. Just do better. Make sense? No? Good."

After consultation with our legal department (each other) and careful inspection of the Test Security Agreement we all sign, we elected not to include an actual copy or portion of the grade report. The rationale: we need paychecks, and we both have families to support. How sad is it that teachers are scared to question the validity of a test by referencing the actual test or results from it?


If we had included a copy of this student's actual score report you would have seen:

(1) Reporting categories contain vague language like "identify characteristics of civilizations" to describe questions that the student answered incorrectly.
(2) Category A had 11 questions, of which the student missed 4. Category B had 10 questions, of which the student missed 4. The student's scaled score for category A was 31, for B 32, with no explanation of why questions in category A are given greater weight.
(3) The scores, grade reports, and feedback are clearly too unspecific about where weaknesses exist to be useful for improving student or teacher performance.

Imagine having that conversation with a student who fails, and then trying to help them. We are asked to "re-mediate," which I would imagine means we target areas where the student has weaknesses. That is a much tougher task without knowing exactly where they are weak. I can understand not wanting us to teach to the test. How about teaching to the kid?

My students and I are judged by a test that in no way serves as a tool to improve my teaching. How on Earth are we to do better next year? Those who devise such an approach remain as distant as any of the cultures my students are required to learn. What's more, they manage to encrypt any relevant information in such a way as to make it utterly meaningless.

The numbers and stats derived from massive student testing across the state serve little purpose beyond sending the message that policy-makers and testing corporations like Pearson want to send. When scores are too high, standards are raised. When scores are too low, standards are lowered. Neither the Department of Education nor Pearson is able to state, in clear language, an objective explanation of how scores are calculated and why the cut score choices are anything other than arbitrary.

The twenty-first century process for holding American students, teachers, and schools accountable should not prove more difficult to translate than Ancient Hieroglyphics.




No Pearson..."Thank You"

Friday, October 28, 2011

The Fallacy of Average Class Size

The average person has one ovary and one testicle! 

If you think that's ridiculous, then you'll understand the folly of using average class size data in educational decisions. Statements that are mathematically correct can still be blatantly wrong.

Averages are attractive. In uncertain situations they provide a concrete anchor for understanding our world.

Data from the 2000 United States census indicate that the average household size in the United States is 2.59. Does that describe your family?

I didn't think so; 0.59 of a person doesn't exist.

You would call me a fool if I believed everyone has one ovary and one testicle, or even if I spent my days looking for the extra .59 person that should be living in the house next door. But somehow, when the average looks like something that supports our agenda, it becomes a valid measure of reality.  Kind of like average class size data.

Modern educators are familiar with the "power of zero" discussion. It goes like this: if a student has scores of 100, 100, 100, 100, and 0, they have an average of 80; a 'B-' for most, a 'C' for some. According to the argument, an 80 does not reflect the achievement of the student; the zero has an undue effect on the "average." A move to standards-based grading indicates a desire to measure a child's true achievement in a way an average can't. So averages aren't good indicators of student achievement, but apparently it's OK to use them as an indicator of how well a system is staffed.
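
For the record, here is that argument in a few lines of Python. The median as a contrast is my own addition, but it makes the point:

    # The "power of zero": one zero drags four perfect scores down to a B-.
    scores = [100, 100, 100, 100, 0]
    mean = sum(scores) / len(scores)            # 80.0
    median = sorted(scores)[len(scores) // 2]   # 100
    print(mean, median)   # the mean says B-; the median says mastery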

Let's assume that a school has ten teachers. Four of the teachers have a low class size of, say, twelve students. Perhaps they teach students who need more support, or they have a class that just met the minimum number for a section. The remaining six teachers each have classes of twenty-nine students. That school has an average class size of 22.2.

What if we looked at a different set of statistics? At this school with an average class size of 22.2, seventy-eight percent of the students are in classes with twenty-eight other students. Sixty percent of the teachers have classes of twenty-nine students. The average class size of 22.2 doesn’t look quite as successful.

What if this small model school were a high school? Each teacher has six classes. We would find six teachers at this school with a load of one hundred seventy-four students and four teachers with a load of seventy-two students. The average teacher would have one hundred thirty-three students, but in reality, sixty percent of the staff is teaching one hundred seventy-four students. Seventy-eight percent of the students at this school are 1 out of 174 to all of their teachers.
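
Every number in those paragraphs falls out of a few lines of arithmetic; here it is in Python, just to show the work:

    # Ten teachers: four with classes of 12, six with classes of 29.
    sizes = [12] * 4 + [29] * 6
    total_students = sum(sizes)             # 48 + 174 = 222
    print(total_students / len(sizes))      # 22.2, the headline "average class size"
    print(29 * 6 / total_students)          # 0.784: 78% of students sit in classes of 29

    # As a high school, with six sections per teacher:
    loads = [s * 6 for s in sizes]          # per-teacher student loads
    print(sum(loads) / len(loads))          # 133.2, the "average" load
    print(sorted(set(loads)))               # [72, 174]: no teacher actually carries 133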

Honestly, we know better. Averages mean very little when divorced from their source, yet we continue to let them drive and/or support our positions. No amount of compiled information can substitute for looking closely at its individual parts, and an uncritical acceptance of data is a recipe for poor decision-making.

Recently, I presented the following problem to my students:

Three truck drivers went to a hotel. The clerk told them that a room for three would cost $30. Each driver gave the clerk $10, and they went to their room. After checking registrations, the manager realized he had overcharged the drivers. The cost of the room should have been $25, so the manager gave the clerk $5 and told him to return the difference to the three drivers. On the way to the room, the clerk decided that since the drivers did not know they had been overcharged, he would return $1 to each of them and keep $2 for himself. Now each driver had paid $9 for the room, and the clerk kept $2. Nine times three is 27, plus the 2 kept by the clerk totals 29. Where did the extra dollar go?
Just because the data are accurate and the numbers add up doesn't mean they reflect reality.  Sometimes we need to get out of the statistics and into people to find the answers.
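
For anyone still hunting that dollar: the riddle's bookkeeping is rigged. The $27 the drivers ultimately paid already includes the clerk's $2, so adding it again counts the same money twice, and the original $30 is no longer relevant. A few lines of Python (my own framing, not part of the riddle) show the honest ledger:

    paid = 3 * 9             # $27: what the drivers are out after their refunds
    hotel = 25               # what the hotel actually kept
    clerk = 2                # what the clerk pocketed
    assert paid == hotel + clerk   # 27 = 25 + 2: every dollar accounted for
    # The riddle adds the clerk's $2 to the $27 (which already contains it)
    # and then compares the meaningless 29 to the obsolete $30.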

Have your experiences ever been misrepresented by "the average"?

Thursday, August 18, 2011

Playing The Education Game

Last year I had 132 students. I was shocked when I had to fail 128 of them after they took their final examination. Only four of my students were good enough according to the standards that I set for my class, so I had no other choice than to fail all the rest. I hope they learn a lesson and do better this year.

Some of them are very bright; they just didn't master all of the material of the course. Some of them struggle at home, and I know they don't have the best support. Most of them would surprise you. You'd never guess they were failures by talking to them. They are articulate and hardworking. I bet they could even succeed in college. Too bad they can't meet the standards of my class.

Does this frustrate you? I find it frustrating. If this scenario were true, there are only two possible interpretations. 1) I am a terrible teacher and need to be removed from the classroom; or 2) The standards and assessments are unreasonable and need to be adjusted. It is that simple. I am either expecting too much or I'm not adequately preparing my students to meet appropriate standards.

The state of Virginia recently released Adequate Yearly Progress data for each of its 132 divisions. Only four divisions met AYP. Across the state last week, cities and counties watched their local news to hear about more failure from our public school systems. Politicians and educrats continue to make a mockery of the institution of public education. The only rational reaction to a figure like this (128/132) is to abolish the horrible failure that is public education, or to get real and admit that our metrics for measuring student, teacher, and school effectiveness are inadequate.

Responding to the media, Albemarle County Public Schools spokesperson Maury Brown said, "we don't think that the worth of a single child or teacher or school system should be measured by a standardized test." Assistant Superintendent Billy Haun said, "we know as a division where we are. I can’t help how the state has chosen to look at success.” As a division, the county achieved 91% pass rates in Reading and Math. Yet for 2010-2011, Albemarle County has failed.

We can't have it both ways: either the numbers are meaningless or they're not. As long as administrators hold pass rates up to their teachers and judge teacher effectiveness at the school level, it's hard to argue that our divisions shouldn't face consequences from the state and federal government when pass rates fall short. Individual educators and divisions alike could benefit greatly if testing data informed decision-making, but instead data has become the point of education.

Looking back in frustration and ahead with hope, the second part of Billy Haun's quote might be the most important part of the story. Can we help how the state (and even the federal government) has chosen to look at success? I don't know the answer to that question, but I believe we need to try. Otherwise we're just spinning our tops and playing games with the students who depend on us. If these metrics are accurate, it's time to stop playing it safe, abolish public education, and start all over again. If they're not, then let's stop pretending and start acknowledging the quality work produced by principals, teachers, and students every day.

We may not believe that the worth of a single child or teacher or school system should be measured by a standardized test, but how do we uphold that belief with action?

*quotes taken from the Charlottesville Daily Progress, 8/11/2011

Tuesday, March 1, 2011

Why Data-Informed Trumps Data-Driven

Plot, characters, and setting: the three primary qualities of a story. But imagine a story with only a setting. Well, you no longer have a story; we call that a painting. Data-driven decision-making runs the risk of turning one element of effective education into the primary element and, in the process, turning beautiful stories of educational success into static pictures of moments in time.

Data is the setting in our story of education. Within this setting, the characters (students, teachers, parents, etc.) create the plot in their daily interactions. To call for an end to “data-driven decision-making” is sure to raise a few eyebrows. Data is the constant, the solid ground; data takes the guesswork out of what we do. However, focusing on the data to the exclusion of all other elements of our story does not advance the cause of effective education for our students.

Friday, February 25, 2011

Data-Driven Decision-Making Kills Crickets!

“Diana Virgo, a math teacher at the Loudoun Academy of Science in Virginia, gives students a more real-world experience with functions. She brings in a bunch of chirping crickets into the classroom and poses a question:” So begins a story related in the book “Made to Stick” by Chip and Dan Heath. They applaud the teacher for providing a concrete lesson to understand the notion of a mathematical “function.”

I learned a different lesson altogether from this story. After gathering all the data relating chirp rates to temperature, the students plug the information into a software package and---AHA! The hotter it is, the faster crickets chirp, and even better, IT'S PREDICTABLE! Now students have a concrete example of what a function is and what it does. Next comes the point where the story grabs me. The Heath brothers mention (in parentheses no less, even calling it a side note, as if this isn't the main point) that "Virgo also warns her students that human judgment is always indispensable." For example, if you plug the temperature 1000 degrees into the function, you will discover that crickets chirp really fast when it is that hot.
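
The Heaths never give the actual function Virgo's class derived, so as a stand-in here is Dolbear's law, an old rule of thumb relating temperature to chirp rate (roughly, degrees Fahrenheit equals 50 plus a quarter of the chirps per minute beyond 40), rearranged in Python to predict chirps:

    # Dolbear's law, T ~= 50 + (chirps_per_minute - 40) / 4, solved for chirps.
    # A stand-in for the class's fitted function, not the one from the book.
    def chirps_per_minute(temp_f):
        return 4 * temp_f - 160

    print(chirps_per_minute(70))     # 120: believable on a summer evening
    print(chirps_per_minute(1000))   # 3840: the model cheerfully predicts chirping
                                     # from a thoroughly cooked cricket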

The moral of the story is this: Data-driven decision-making kills crickets!

Unfortunately, it can also kill good instruction. Recently while attending a district-wide work-session on Professional Learning Communities, a nationally recognized consultant suggested reasons why teachers at a small middle school without colleagues in the same subject should collaborate with teachers from other schools in the same subject. He suggested that when these inter-school teams see that one teacher has better data in a given area, the others could learn what that teacher is doing to get such good results.

I'm not against this type of collaboration, but could it be possible that a teacher from one school whose student testing results (data) are not so good is still better than a teacher in a different school with excellent data? For example, might the data at school A look better than at school B because students are getting better support at home? Perhaps school B spends more time making sure students are fed and clothed before concentrating on the job of instruction. What if school A has stronger leadership, and teacher performance reflects teacher morale, support, or professional development?

Teachers must collaborate and share stories about instruction that works, but if student test data is the only metric used to evaluate effectiveness, we are essentially determining that crickets chirp very fast at 1000 degrees. There is a better choice than "data-driven." Next week I'll share my thoughts on this alternative, and together we can strive to "save the crickets."

Follow-up Post: Why Data-Informed Trumps Data-Driven