Showing posts with label test scores. Show all posts

Saturday, September 15, 2012

7%

There was some point in junior high school when I stopped understanding math. I kept attending class, did the homework, and got good grades. In high school, I took algebra and geometry and then scored well enough on the PSAT, SAT, and ACT to have been recruited by colleges thousands of miles away from my home in the Appalachia of the West. What the.

Except for geometry (which I loved, I know not why), I did not understand a bit of anything having to do with numbers (as previously discussed here). How did I continue to do okay at something I did not at all understand?

Maybe my happiest moment in college was when I found out I never had to take another math class again ever, not ever, never. Years passed. I graduated. More years passed. I had a bunch of bad jobs, from hostess at Denny's to secretary at an auto repair shop (previously discussed here) to scheduler at a home health agency (where my biggest responsibility was bringing my supervisor a cup of coffee from the stand on the corner and then sitting in her office and listening to her talk about how the divorce was going). I never needed to know more math than what I was pretty competent with, i.e., adding, subtracting, multiplying, dividing, and finding percentages. Whew.

Then after three years of working in the criminal justice system, I decided that crime-fighting was not for me, and decided to return to my One True Love: English. Which would mean grad school, which would mean taking the GRE. I wasn't worried; I'd always kind of liked taking standardized tests, probably because it gave me a chance to do my favorite thing in the world: sit in a corner and read with no one talking to me. That the reading material wasn't always of the finest didn't trouble me. Like gutter winos who drank Night Train, I'd read whatever was available. (Still do. Yesterday while waiting for my daughter at the orthodontist's, I read Alaska Magazine, FORTUNE, and some other rich people magazine.)

So the first time I took the GRE, I did as well as one might expect on the verbal reasoning and analytical writing, and about as poorly as anyone could possibly do on the quantitative reasoning. I was no longer able to pass as someone with a minimally adequate understanding of math. I did so poorly that when I took the GRE a second time, I bubbled randomly for the quantitative reasoning and improved my score by 7%. I don't mean to mislead anyone; it was no significant improvement. If there had been a cut score for far below proficient, that is where my score would comfortably have settled like a little toad in a pond.

I was thinking about this because my daughters' CA STAR test scores came in the mail yesterday. And because I read this, about a grown man who submits to taking the SAT.

As a side note, I'd like to say that one might think this math handicap extends to data analysis, but it don't. I love data. Love it. I love the patterns--sometimes there is even a narrative.


I was reviewing longitudinal test result data for a high school and saw some patterns that might tell a story: a strong majority of incoming freshmen scored in the advanced category, but there was a steep downward trajectory, with about half as many grade 11 students scoring so well. I have more investigating to do before I can draw any meaningful conclusions--is this typical of all high school students in the district, state, and country, or is it just this school? Could the difference partly be explained by a large influx of lower-performing students at grade 11? What other factors might influence these results?

These might be numbers, but there is a narrative, there are characters, there's a plot with conflict, action, and, one hopes, resolution.




Wednesday, May 16, 2012

When Everything Goes South

Bad news in Florida.
Preliminary results released Monday indicate that just 27 percent of fourth-graders earned a passing score of 4.0 or better (out of 6) on the writing test. A year ago, 81 percent scored 4.0 or better. . . .
Passing scores plummeted from 81 percent to 27 percent for fourth-graders and showed similar drops in eighth and 10th grades, according to statewide results released by the Department of Education.

(Aforementioned preliminary results here.)


I did what I always do when I come across an item of note, whether it be a dime on the sidewalk or a pineapple in the headlines: I called a friend.


We compared our reactions, which were predictably (and comically, as we spoke simultaneously and in the same lexicon--not only are we friends of many years' standing, but we worked together for years, and are in the habit of oft discussing our work, a habit all the more agreeable as we share so many opinions) identical:
1. What up with the scoring?
and/or
2. What up with the test construction?
and/or
3. What up with the cohort?
and/or
4. What up with the cut scores?


Read more here: http://www.miamiherald.com/2012/05/14/2799146/fcat-writing-scores-plummet.html#storylink=cpy

Them's being the usual suspects. Occam's Razor.


As it happens, there was a recent change in the cut scores. The state DOE is (reportedly--I didn't talk to anyone myself) taking the line that "the results of prior years were artificially high and these are the real ones." Although the state did turn around and decide to lower the bar so more students would pass.


Take a look at the exemplar writing sets. These show examples of student essays at each score point level.


And then there may be other factors beyond the cut score. Never dismiss the possibility that someone, somewhere, made a mistake. It happens.


According to Stuart R. Kahl, Ph.D., in a paper for Measured Progress,
A test score estimates something--a student's mathematical proficiency, perhaps. It is an estimate because it is based on a small sampling of the universe of items that could have been included on the test. Further, a test score is affected by factors other than the student’s mathematical proficiency, such as: how well or motivated the student feels, whether there were distractions or interruptions during the testing session, and whether the student made good or bad guesses, to name a few. These factors, which can all be sources of measurement error, explain the difference between a student’s calculated score on a particular test and that student’s hypothetical “true” score. That true score, forever unknown, would reflect the student’s real level of proficiency.
Estimate, inference--these are the words we must use when we talk about our suppositions of what a student knows or is able to do when those suppositions are based on the results of a test.


Florida may sit herself down to rest on the lowered bar. Or there may be an investigation. The investigation will crawl through the maze of scoring operations.


If nothing turns up, the investigation could lumber over to focus on the construction of the test. Again, if the test construction is immaculate, there may be something going on with the cohort.


Or maybe it really was just the cut scores. ("Just." As if that's nothing.)


In other news: the latest on the pineapple here, along with newly reported mistakes on the New York State math tests. There's no glee in reporting that. We all of us get tarred by that brush, even if we have nothing to do with that particular test.


That's what's going on today. Who knows what tomorrow will bring.





Wednesday, October 7, 2009

Poppycock, Folderal, Nonsense

. . . in the immortal words of Todd Farley.

About a week ago, someone sent me a link to an Op-Ed piece in the New York Times by Todd Farley, author of Making the Grades: My Misadventures in the Standardized Testing Industry.

Farley's experiences aren't unique. Like Farley, I am a writer who sort of fell into the test publishing industry by accident. Like Farley, I stayed in the industry long after I expected to have moved on to what I thought would be my real career of writing novels or screenplays or something, anything.

Both of us started our careers in hand-scoring, so hand-scoring is what I will talk about, specifically the hand-scoring of open-ended test questions. Multiple-choice questions are simple to score, because there is only one correct answer. All multiple-choice test questions are machine-scored. The answer sheets or test booklets are scanned, the answer choices verified by machine, and the scores are then computer-generated. Sometimes there are mistakes in the programming that must be corrected, for example, the correct answer to a given question was actually C but was identified somewhere along the line as A. Sometimes there are mistakes with a student's name or identification number that lead to a mistaken score. Sometimes--and this happened with my daughter's third-grade California STAR testbook--the testbook or answer sheet has juice spilled all over it, and so a false score may be generated. Where humans are involved, there will be some error somewhere, it is unavoidable, let us simply endeavor to put checks in place to catch the errors and processes to correct them.

The scoring of open-ended questions is a horse of a whole nother color. By its nature, there must be some subjectivity. In support of standardization is an array of tools: a scoring guide or rubric, sample student responses at each score point, anchor papers, and rangefinders. A rubric lists the characteristics of a response at each score point, a sample response gives an example of what kind of response is expected, an anchor is a student response that embodies a score point, and rangefinders show what may be expected at the high, middle, and low ends of the spectrum within a score point.

It sounds like a complicated process, and it is. And it's not without its ridiculous moments. And I have to say that though I found much about hand-scoring interesting, the work itself was tedious and the routine unbearable. But it's not the Orwellian circus of nonsense Farley describes. Or maybe it is at the company where Farley worked; it wasn't at CTB McGraw-Hill when I worked in hand-scoring there.

I am only about a quarter into the book, so maybe there will be some sort of Aristotelian discovery on Farley's part. At this point, he sounds like one of the disgruntled hand-scorers, and there were some of those, people who just never got it, never were able to internalize the scoring criteria and constraints, the ones whose scores had to be checked and re-checked so often that eventually they were let go. He says that he failed to qualify as a scorer for a writing test, which does make one wonder whether this type of work simply was not a good fit for him. Not that I can vouch for what happened at Pearson, as I've not worked there.

I will also say that--although I do not at all see myself as a flag-waver for the test publishing industry, and although I have my own strong feelings about the misuse and abuse of tests, how they are used, what they symbolize, and how the data are manipulated--sitting in the mocking judgment seat is generally easy to do. I have plenty of ridiculous stories of my own. We humans are ridiculous, it's in our nature, and thank God that we are; it makes the world so much more entertaining.

And this book is just that--entertainment, a joke that is masked as an indictment of the industry. For myself, I'd be a lot more interested in a thoughtful exploration of the subject, one that takes into account the need for measurement in teaching, and the demand for standardization (because that seems to be the only way to ensure any kind of fairness or equity), and how we could possibly balance these kinds of standardized measurements with classroom performance and evaluations from teachers.

CORRECTION: I mean "Folderol." Geez. And to think I won first place in the 8th grade spelling bee. What did I tell you? Human error.

Monday, March 16, 2009

Incentive or Punishment?

I would rather think of it as incentive. But I like to look on the bright side.

You know that in 2006, Florida came out with a policy of linking teacher pay to student test scores. On the surface it might sound reasonable, but you have to roll back the carpet on this one to see the bugs. What about great teachers in underfunded schools? Great teachers in schools with high populations of second-language learners? Great teachers in schools where families are walking the razor edge of survival, where parents are working two jobs and can't help with homework, where some parents are MIA, having fallen prey to addiction, violence, or some other poison? See, no matter how great these teachers are, they need to grocery shop, too, and how many will be able to resist the call of being able to buy bread and cheese, and so they will migrate to the neighborhoods with higher-priced real estate. Once again, the kids who have the most need will end up with the worst teachers.

Now President Obama is talking about linking teacher pay to performance. Which makes a lot of people nervous. I am all for it, as long as the performance in question is that of teachers, not solely that of the students, and as long as that performance is measured in a fair way. We all of us in the (non-gummint) world are paid according to our performance. The man who shines shoes at the casino where I take my weekly UNLV class (everything in Las Vegas happens in a casino, it is just part of the local charm, I guess) makes more money because he does a good job (and is just plain pleasant to talk with); a server at a restaurant makes more money in tips because she is competent at her work. Writers can command a higher rate when they have established a solid reputation, and it's the same for consultants. There's no reason teachers should be exempt from expectations.

What I very much appreciate is the possibility of excellent teachers receiving higher pay. I've always thought if teachers could make a decent living, there'd be more excellent teachers. Last week, someone asked me why I didn't teach anymore (I taught a few community college classes in beginning and remedial composition, which I loved doing), and I said I couldn't afford the pay cut. I have to support my children, you see. And I would put money down--this being Vegas and all, I am picking up the local lingo--that I am not the only one.