By Charles Barone, Policy Director
Today, at an event hosted by the Alliance for Excellent Education, a group of advocates and academics including Linda Darling-Hammond of Stanford and Gene Wilhoit, formerly of the Council of Chief State School Officers, announced a “new approach on accountability and testing” (aka 51st state) that would, among other things, scale back annual statewide summative tests and increase the use of local assessments for purposes of accountability.
Weirdly enough, over at the Center for American Progress, a new study issued today finds that it is districts, much more so than states, that account for what many consider the “over-testing” taking place now in America’s public schools. That seems to suggest the 51st state proposal is singling out the wrong level of government if its goal is to decrease the amount of testing.
We should encourage local innovations in how we assess students. If that’s what Darling-Hammond, Wilhoit et al. are trying to do, I’m all for it. But attempts to substitute myriad local tests for valid and reliable statewide assessments for purposes of accountability are just as bad an idea on a policy level now as they have been every other time they’ve been proposed.
If we revert to a patchwork of standards and assessments that vary according to political pressure, or societal and community biases, or simply the lack of local capacity to create valid and reliable tests, we will no longer be able to make apples-to-apples comparisons about school performance. In turn, the schools in which poor and minority students are enrolled are likely to look better than they actually are. Badly needed investments in teaching and learning, and in formulating and implementing fundamental reforms in chronically failing schools, will then be at even greater risk than they are now.
You don’t have to take my word on whether we can make valid comparisons across schools with a bunch of new local assessments.
About a decade ago, an esteemed panel convened by the National Academy of Sciences (an organization that, to put it mildly, has never been accused of overzealousness when it comes to student testing) was tasked with answering that very question:
“Can scores on one test be made interpretable in terms of scores on other tests? Can we have more uniform data about student performance from our healthy hodgepodge of state and local programs?”
And the result was:
“After deliberation that lasted nine months, involving intensive review of the technical literature and consideration of every possible methodological nuance, the committee’s answer was a blunt ‘no.’”*
Absent some huge shift in the technical and methodological literature, one would have to conclude this is no less true now than it was a decade ago. End of debate? Don’t bet on it. There’s much more to come.
—————————————————-
*Michael Feuer, “Moderating the Debate,” Harvard Education Press, 2006.