Performance Anxiety

Every winter the local press reports on the results of the Maryland State Performance Assessment Program (MSPAP) tests taken the previous spring by 3rd, 5th, and 8th graders, and for each of the past three winters, I've attended a coffee with the principal of my daughter's elementary school to discuss the school's results.

Unlike most standardized academic tests, the MSPAP tallies scores for an entire school rather than for individual students. But just like other tests, the MSPAP takes something enormously complex, in this case school performance, and boils it down to a single number, which is then used to position the subjects (the schools) on a linear scale from good to bad.

We're somewhat more sophisticated about standardized testing than we were a century ago when it was invented by Alfred Binet. (Prior to Binet, intelligence was inferred by measuring skull volume.) Most of us suspect that MSPAP scores aren't pure measures of school goodness, that other factors might influence them to some degree. But these suspicions usually aren't strong enough to prevent us from using the scores as a simple rank. Parents and real estate agents check the scores to decide which neighborhoods are desirable to live in. The principals and teachers at high-scoring schools are quick to congratulate themselves (at least until their scores mysteriously decline), while those at low-scoring schools must struggle to overcome the stigma of their inferior rank.

Is the stigma justified? Maryland acknowledges that non-academic factors play a role in the scores. The published scores are broken out by race and gender, and they include the number of students at each school receiving free or reduced-price meals (socioeconomic status), the number for whom English is a second language (culture), and the number of students who transferred to the school in the months prior to the test (transience). The scores correlate pretty strongly with many of these numbers. I suspect they'd correlate even more strongly with metrics not reported, such as the education level of the parents. But these correlations aren't widely regarded by school system officials as evidence that we're measuring something other than school performance.

More difficult to measure is the possibility of systematic bias in the scoring itself. Perhaps to its credit, MSPAP isn't a machine-scored, fill-in-the-oval test. The tests require students to answer questions by writing complete sentences or paragraphs, or by drawing a diagram or plotting a graph, sometimes while working in groups. Hundreds of teachers then meet over the summer to score the responses. Although in principle this could result in a more realistic assessment of academic skills, it also opens the door to subjectivity, both in the judgments of the scorers and in the drafting of the scoring guidelines.

The reason this is troubling is that standardized testing has a long and ignominious history of validating racial and ethnic prejudices and rationalizing the superiority of elite groups. Such testing was used to set the first U.S. immigration quotas and is responsible for the myth that the "average American" adult has a mental age of 12. "Idiot," "imbecile," "moron," and "feeble-minded," now merely epithets, were at one time psychometric classifications.
The people who devise the tests invariably believe in their objectivity; they don't intend to create biased tests. But with sufficient hindsight, so many of these tests appear so obviously arbitrary and self-serving that any modern effort has to be regarded with the highest skepticism, particularly when elements of subjectivity are built right into the scoring.

My ideas about this have been heavily influenced by Stephen Jay Gould's The Mismeasure of Man, an excellent and accessible discussion of the history of intelligence measurement. For sheer inspiration, watch Edward James Olmos as real-life calculus teacher Jaime Escalante in Stand and Deliver (the only movie I know in which calculus plays a central role, though see this list of movies that feature math).

Update

After writing this, I learned that the Abell Foundation has funded a study of MSPAP, the conclusions of which are described as "caustic and negative" by the Washington Post. Oddly, the researchers believe MSPAP leans too far toward assessing performance rather than knowledge, and that it should include more traditional multiple-choice questions. Most of the researchers have strong conservative political ties, so what I think we have here is an argument about educational philosophy. Which, in my opinion, as you might guess, completely misses the point. But I can't say more, because no one is being allowed to read the Abell report. It's being kept secret. Bound by a non-disclosure agreement that gave them access to real test questions, the researchers can't publish the report because it quotes the questions.
© 2000