Friday, April 13, 2012

What Hath Been, What Will Be

There's nothing new under the sun.

AI scoring is always lurking at the edges of assessment talk, because scoring depends on human labor and is therefore expensive. Wouldn't it be great to automate scoring, eliminate human workers, and save a ton of money?

Erik Robelen at Curriculum Matters brings up the unsurprising results of a study indicating that AI scoring may be as valid and reliable as traditional hand-scoring.

In traditional hand-scoring, a person reads an essay (or other written response), evaluates it against a rubric and other criteria (anchor papers, range-finders), and assigns a score. In AI scoring, the essay is automatically graded by a software program that uses some kind of formula (or combination of formulae) to assign the score.
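For the curious, here is one way to put a number on "as reliable as hand-scoring": compare the scores a person and a program gave to the same essays. The sketch below uses quadratic weighted kappa, a standard agreement statistic for ordinal scores. I am not claiming it is the measure the study used. The function name and the sample scores are made up for illustration, and the 0-4 scale is just an assumed rubric range.

```python
from collections import Counter


def quadratic_weighted_kappa(scores_a, scores_b, min_score, max_score):
    """Agreement between two raters on an ordinal scale.

    1.0 means perfect agreement, 0.0 means no better than chance,
    and negative values mean worse than chance.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    span = max_score - min_score
    if span <= 0:
        raise ValueError("max_score must be greater than min_score")

    n = len(scores_a)
    observed = Counter(zip(scores_a, scores_b))  # how often each score pair occurred
    hist_a = Counter(scores_a)                   # rater A's score distribution
    hist_b = Counter(scores_b)                   # rater B's score distribution

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Bigger disagreements are penalized quadratically.
            weight = ((i - j) / span) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0:
        # Both raters gave every essay the same single score.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for seven essays on an assumed 0-4 rubric.
human_scores = [4, 3, 2, 1, 0, 3, 2]
machine_scores = [4, 3, 2, 2, 0, 3, 1]
kappa = quadratic_weighted_kappa(human_scores, machine_scores, 0, 4)
print(f"agreement (quadratic weighted kappa): {kappa:.3f}")
```

The same comparison works for two human scorers, which is the point: a score is only as trustworthy as the agreement behind it, whoever (or whatever) assigns it.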

Before this year, I'd always pitched my tent in a clearing in the traditional scoring camp. But that assumes the best of all possible worlds, in the immortal words of Voltaire: one in which the rubric is fair and sound and based on observable, measurable traits; the anchors and range-finders are solid; and the hand-scorers are diligent. After all, don't we all have an innate aversion to the impersonal coldness of receiving a grade from a non-sentient program, a program that is not even capable of doing the thing it is grading us on?

And yet, an experience with one of my daughters' teachers last semester made me rethink AI, at least for classroom use, at least for teachers who lack education, training, and experience in assessment. (Which so many teachers do. Assessment, though it is a big part of education, just is not adequately addressed in teacher credential programs.)

My daughters' teacher applied (or failed to apply, as she marked a project down for a trait that didn't even appear on the rubric) a rubric that broke two of the biggest rules in evaluating student performance. One problem was that the descriptions of performance at different score point levels were exactly the same; the other was that the language was completely subjective: the rubric described nothing that could be observed or measured.

Here's a bad rubric:
Score point 4--Awesome, excellent work, student does a great job.
Score point 3--Still really great, not quite excellent.
Score point 2--Um, kind of bad, actually.
Score point 1--Weren't you listening to anything I said all semester?
Score point 0--Now you're just trying to fail. Mission accomplished, pal.

When I asked the teacher about the rubric, she defended it by saying she'd been using it for 15 years. I tried to explain that a history of use without data was no guarantee that the rubric was sound, but I'm not sure she ever understood how unfair--how invalid and unreliable--her grading system had been for 15 years. No one else ever complained. No one else knew enough to complain.

In such a case, bring on the AI. For the good of the student.

More here if you want to go on.
