
Wednesday, July 31, 2013

The Big Idea, Cross-Referenced to Basic Rules of Item Writing

What endeavor doesn't benefit from planning and preparation? What endeavor succeeds without preparation?
Ah, fatal words! Too late in moving here, too late in arriving there, too late in coming to this decision, too late in starting with enterprises, too late in preparing.
These first guidelines of the CCSSO/TILSA Quality Control Checklist for Item Development and Test Form Construction should be considered in early stages of planning, long before item writing assignments are made:
1A. Each item should assess content standard(s) as specified in the test blueprint or assessment frameworks. 
2A. Items must measure appropriate thinking skills as specified in the test blueprint or assessment frameworks. 
3A. Items should be written at appropriate cognitive levels and reading levels according to the item specifications guidelines.

A test blueprint identifies the skills and/or knowledge to be assessed, provides the item-to-skill distribution, and specifies item formats.

Let's think about creating a blueprint to assess writing at grade 6. We'll base the blueprint on the Common Core State Standards.

In the CCSS, English conventions are addressed in the language standards, and what we might call writing strategies and application are addressed in the writing standards. The language standards could be assessed with a variety of formats: standalone or passage-dependent multiple choice items, standalone or passage-dependent technology-enhanced items, or as one component of an extended-constructed-response item.

Here is a writing standard:
1. Write arguments to support claims in an analysis of substantive topics or texts, using valid reasoning and relevant and sufficient evidence.
a. Introduce precise claim(s), distinguish the claim(s) from alternate or opposing claims, and create an organization that establishes clear relationships among claim(s), counterclaims, reasons, and evidence.
b. Develop claim(s) and counterclaims fairly, supplying evidence for each while pointing out the strengths and limitations of both in a manner that anticipates the audience’s knowledge level and concerns.
c. Use words, phrases, and clauses to link the major sections of the text, create cohesion, and clarify the relationships between claim(s) and reasons, between reasons and evidence, and between claim(s) and counterclaims.
d. Establish and maintain a formal style and objective tone while attending to the norms and conventions of the discipline in which they are writing.
e. Provide a concluding statement or section that follows from and supports the argument presented.


Generally the above standard would be assessed with an extended-constructed-response item, because multiple-choice items and short constructed-response items don't allow students sufficient opportunity to demonstrate the ability to "write arguments to support claims...." However, the subskills may be (and frequently are) assessed with multiple-choice items; this is more common at the district or classroom level than at the state level. You might see a question that addresses W.1.a by asking the student to choose the best opposing claim for a given argument. Such multiple-choice items may help teachers isolate specific areas in which a student needs instruction and support.

Here are language standards:
1. Demonstrate command of the conventions of standard English grammar and usage when writing or speaking. 
a. Ensure that pronouns are in the proper case (subjective, objective, possessive). 
b. Use intensive pronouns (e.g., myself, ourselves). 
c. Recognize and correct inappropriate shifts in pronoun number and person.* 
d. Recognize and correct vague pronouns (i.e., ones with unclear or ambiguous antecedents).* 
e. Recognize variations from standard English in their own and others’ writing and speaking, and identify and use strategies to improve expression in conventional language.* 
2. Demonstrate command of the conventions of standard English capitalization, punctuation, and spelling when writing. 
a. Use punctuation (commas, parentheses, dashes) to set off nonrestrictive/parenthetical elements.* 
b. Spell correctly.

All of the above language skills may be assessed with multiple-choice questions. These could be standalone, or could offer a stimulus: an editing passage with embedded errors. More on language items as previously discussed here.

For our imaginary grade 6 writing test, we might decide to use multiple measures in order to obtain as much information as possible in as many different ways as we can. Our blueprint would specify a combination of item formats: x number of multiple-choice and technology-enhanced items, along with one extended constructed response to a writing prompt, scored with a holistic rubric that addresses organization, style and voice, and conventions. The blueprint would identify the standards and subskills to be assessed, along with the number of items and item formats for each standard or subskill.
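To make this concrete, here is a minimal sketch of a blueprint as a simple data structure, the kind of thing that might sit alongside the narrative blueprint document. The standard codes, item counts, formats, and DOK levels below are hypothetical placeholders, not a recommendation.

# A minimal, hypothetical grade 6 writing blueprint; counts, formats,
# and DOK levels are placeholders only.
blueprint = [
    {"standard": "L.6.1", "skill": "grammar and usage", "format": "multiple choice", "items": 6, "dok": 1},
    {"standard": "L.6.2", "skill": "capitalization, punctuation, spelling", "format": "multiple choice", "items": 6, "dok": 1},
    {"standard": "L.6.1-2", "skill": "editing a passage with embedded errors", "format": "technology enhanced", "items": 4, "dok": 2},
    {"standard": "W.6.1", "skill": "argument writing", "format": "extended constructed response", "items": 1, "dok": 3},
]

total = sum(row["items"] for row in blueprint)
print(f"{total} items across {len(blueprint)} blueprint rows")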

In our blueprint, we may also use Bloom's Taxonomy or Norman Webb's Depth of Knowledge Guide to determine the cognitive level for each item. Although the cognitive levels of some skills are relatively simple to determine, based on what is required from students, some skills may be addressed at multiple levels of cognitive complexity.

We may instead indicate the cognitive levels, item difficulty, content or domain limits, and reading levels in the item specifications, as suggested in the CCSSO/TILSA checklist.

In a typical statewide high-stakes assessment program, the decisions that inform the development of a test blueprint and item specifications are made by committees, which is as it should be, and committees should include classroom teachers. Committees often include other stakeholders, e.g., business leaders who may be asked to identify skills and knowledge necessary in the workplace.

Once all of that preparation is complete, item development begins.

Now let's say we've received an assignment to write those multiple-choice language items and that ECR writing prompt. We've read all of the project documentation and support materials; we have the item specifications in front of us. 

It is a truth universally acknowledged that a test item should target one and only one skill or bit of content knowledge. Each item should have one big idea; every part of the item should support that focus. 

If we were going to write a multiple-choice item for L.2.b, our big idea would be how to spell grade-level-appropriate words. We might write an item that looks like this:

Which word is spelled correctly?
A absense
B boundery
C civilizashion
D dissolve*

This item clearly targets one skill: correctly spell grade-level-appropriate words. The stem tells the student exactly what to do. The item is phrased simply and concisely. The content is neutral; there are no highly charged words. All of the answer choices are grade 6 words (according to EDL Core Vocabularies); all are words likely to be known to grade 6 students and are words that are significant to academic content areas. There are no tricky, esoteric, or rare words. The answer choices appear in a logical order (here we use alpha order). All of the distractors address common spelling mistakes: using s instead of c, using e instead of a, and writing phonetically. None of the words are homonyms and so none are context-dependent; each of these words has one correct spelling.
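Some of these criteria require professional judgment, but a few are mechanical and could be checked automatically before an item ever reaches a human reviewer. Here is a minimal sketch, assuming a toy dictionary representation of a multiple-choice item; the field names are hypothetical, not any particular authoring system's format.

# Hypothetical item structure; real authoring systems differ.
def mechanical_checks(item):
    """Return a list of mechanical problems found in a multiple-choice item."""
    problems = []
    texts = [choice["text"] for choice in item["choices"]]
    keys = [choice for choice in item["choices"] if choice.get("key")]

    if len(keys) != 1:
        problems.append("There must be exactly one keyed answer choice.")
    if len(set(texts)) != len(texts):
        problems.append("Answer choices must not repeat.")
    # Alpha order is one possible "logical order"; other item types
    # order their choices differently.
    if texts != sorted(texts, key=str.lower):
        problems.append("Answer choices are not in alphabetical order.")
    return problems

spelling_item = {
    "stem": "Which word is spelled correctly?",
    "choices": [
        {"text": "absense"},
        {"text": "boundery"},
        {"text": "civilizashion"},
        {"text": "dissolve", "key": True},
    ],
}
print(mechanical_checks(spelling_item))  # [] -- no mechanical problems found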

Here is a poor item addressing the same skill:

Which word is written correctly?
A musheenz
B rabby
C anker
D pistol

This item has multiple flaws. First, the big idea is not specified in the stem; the student doesn't know what s/he is expected to do until s/he reads the answer choices. The answer choices are not grade-level-appropriate; “machine” is a grade 2 word, while “anchor” is grade 3. The word “rabbi” may not be familiar to grade 6 students. Answer choice A (“musheenz”) is plural, while the other answer choices are singular. Answer choice A also offers mistakes that are unlikely to be made by students at the targeted grade level. The answer choices do not appear in any logical order. Finally, the correct response is a type of weapon.

As bad as this item is, though, we could make it even worse by

  • increasing the reading load by burying the spelling words in sentences and offering four sentences as the answer choices;
  • obscuring the targeted skill by adding in other types of conventions errors, such as mistakes in capitalization and punctuation;
  • using homonyms, or words that are spelled differently depending on the context;
  • using above-grade-level vocabulary.
Item writing is both an art and a science. There's so much to consider, even in writing the simplest spelling item.






Saturday, July 27, 2013

How to Get the Best from Item Writers

Many years ago, I was a development manager at a Great Big Huge Test Publishing Company. I've already told the story of how I began as a temp employee in hand-scoring, as so many recruits to the test publishing industry do. Armed with my book-learnin' and a new but hardly marketable M.A. in English, emphasis in creative writing, I was thrilled to get a job that paid slightly more than $10 an hour, a job that had to do with words and writing. Yay me, illustrating the joy of low expectations.

When the Great Big Huge Test Publishing Company was awarded what was then considered a big statewide assessment contract (back in the days when we tested at grades 3, 6, 8, and 10, or grades thereabouts), I was plucked from hand-scoring, handed the title of associate editor and deposited in a cubicle in a cavernous upstairs honeycomb which cubicle I shared with another associate editor who'd also come from hand-scoring. Within 5 years, I'd gone from the windowless cubicle of associate editor to content editor to supervisor to program manager to the window office of development manager. You can probably guess at my success as a manager, given I had no training and little experience in management. Oh, if only I had read Bringing Out the Best in People: How to Apply the Astonishing Power of Positive Reinforcement.

Which I have since read, and which principles I endeavor to apply when I'm called to supervise others, and to the effectiveness of which I can attest. We do what we know; when we know better, we do better.

It may be tempting, when we consider the plummeting quality of what we see on test materials, to blame the writers. But as this video from ETS reminds us, the writer is only one of many contributors.

Getting the best from item writers has to start long before an editor sends out that Are you available? email. The foundation of a project must be sound; there must be a blueprint and prototypes; there must be a clear vision of what the product is intended to look like, how it is intended to perform, what skills/knowledge it is intended to measure, and how it should measure those skills/knowledge.

These decisions should not be left to the item writer; few item writers are equipped to make such decisions. In the past, item writers worked in-house, or were mostly former employees of test publishing companies, and so were at least minimally conversant with principles governing the design and construction of assessments. Sometimes item writers were corralled to help assemble the tests and were given Xeroxed sheets containing lists of item numbers and associated data. That is no longer the case. I don't know of any test publishing company who maintains a staff of in-house item writers. Today test publishing companies commonly hire item writers who have never worked for test publishing companies and who have little experience writing items for high-stakes assessment (they may have written for curriculum and textbooks, if they have any experience at all). They may not have any classroom experience; they may not even have kids, and so the world of education--the real world of education and of what kids really are able to know and do at a given grade level--is a mystery to them. Or they develop their own ideas about what K-12 students know and can do, ideas that are as inaccurate as they are ambitious and inflated. (This is through no fault of their own, but the remedy is simple: volunteer in the classroom. Go to a school and offer to spend an hour a week in a classroom.)

Even if item writers were equipped, they shouldn't make decisions which should rightfully be made at a much higher level, by folks with greater knowledge, experience, and authority. Such decisions take time. There must be time to consider, reflect, think about it in the shower and in the car, time to return to one's colleagues and say Well, what if and how will it work if. The what-ifs must be given time to rise to the surface.

Rushing inevitably creates chaos. Whatever writers produce under slippery circumstances--when the expectations are not specified--will fail to meet those unspecified expectations.

Assuming, however, that the big decisions have been made, and that the writers have been provided with everything they need to do a good (or excellent) job, what else can companies do to get the best from writers?

1. Take care of all housekeeping details upfront. Provide the writer with written information about the scope of work, schedule, deadlines, pay rates, and points of contact. Preferably all in one email message. Send the contract and the W-9. Tell the writer whom to invoice and how. Remove possible sources of worry. Worry is destructive to creativity and productivity. 
2. Provide training. The training should be as brief as possible, and should be conducted at the commencement of the project. A training that is offered a month before writing begins is useless, because writers will have forgotten the information they learned. Materials for the training should be emailed in advance. The writers should be told whom to call if they have questions.
3. There should be a dedicated content lead available to respond to writers' questions and to provide timely guidance throughout the course of the project.
4. Give writers the chance to do it right. The content lead's ducks must be lined up and ready to waddle. There must be a clear style to follow, preferences to comply with, and so on. The directions and feedback should be clear. To be effective, feedback must be immediate. Feedback must have the purpose of informing work in progress. Consider how disheartening it is to submit 50 items and then be told that there is now a new requirement, please revise those items accordingly and resubmit.
5. Allow the writers to work as they work best. More and more companies are requiring writers to input items directly into an online authoring system. While some of these are better than others, all add time and effort on the part of the writer, thus siphoning off energy better spent on item development. For each project, writers must learn how to use a new system; they might finish the project before they become proficient. Then it's off to a new system. I often decline opportunities to work in authoring systems, because I find the levels of clickage annoying--seconds add up to minutes add up to hours over the course of a year, hours I would much rather have spent reading or looking out the window or talking to my daughters or whatever else.
6. Let the writers do the work they do best. Writers write. Now that companies are operating on principles of leanness akin to corporate anorexia, they expect writers to take on the work that used to be the province of content editors and desktop publishers. With no increase in pay and no increase in time allotted to do the work.
7. Give writers enough space to write. Some assignments are so rigid and exacting, with so many criteria of so many types, that they become impossible.
8. Allow the writers to contribute their unique knowledge, experience, and skills. Writers work for all the educational assessment, test preparation, and curriculum publishers. They have access to a depth and breadth of knowledge about what's happening in educational publishing that is denied to the folks whose only job in educational publishing has been to work at the one company at which they are currently employed. Being open to the possibility that the writers know something and giving the writers freedom beyond the stricture This is how we do it will only serve the company and ultimately, the kids.
9. Be a human and let the writer be a human. We are none of us robots. We all have strengths and weaknesses. We have the skills we shine at and the skills we don't. This is normal and the nature of being human; it's not a flaw unique to a particular writer if she has trouble juggling multiple spreadsheets (not to name any names, me). No one in this world is capable of doing everything perfectly; no one is guilty of never making a mistake. The industry used to understand that; the protocol for test publishing included many rounds of editorial review prior to submitting materials to proofreading, and then to QA. 

If these principles were applied, quality would improve.

That's all I got for today. I'm off to Valencia, to the Cal Arts campus, to visit my daughter, Twin A ("A" being the initial written on the knitted cap the nurses placed on her head after her birth) who is a creative writer in the California State Summer School for the Arts program. 

What I'm reading: I finished As I Lay Dying. I love Faulkner. It always takes me at least half the book to marshal my resources to focus on his writing, I find it so challenging, but once I'm in, I'm there. I have a novel by André Brink next, I think.




Thursday, January 17, 2013

Whale in a Bathtub

Does anyone remember Helen Palmer's A Fish Out of Water, one of the classics from my childhood? The plot is simple, but compelling: A boy brings home a fish, overfeeds it, and the fish gets bigger and bigger and bigger until it is immense. I won't spoil the ending for you.

The need for growth management is one of the main points of what I've heard and read so far in the Coursera business class. Every owner of a successful business should carefully consider growth management. I'm thinking about small businesses now, like mine, but we've previously discussed problems associated with a failure to plan for and manage growth in larger companies. Quality is what seems to go out the window first when growth exceeds capacity.

This is not exclusive to educational publishing, or even publishing in general: it's a universal principle, as previously applied to dentistry.

Now I'd like to apply it to orthodontics. What up with the teeth? I have teen-agers, one of whom has recently been freed from braces, the other of whom is still wired up. They are twins, as you know, so why is one wired and the other free? The orthodontist's failure to manage the growth of his practice.

Like Dr. Bad Dentist, Dr. Too Successful Orthodontist came highly recommended. By three different people, one being my current dentist and another being an orthodontist in another state who went to school with Dr. Too Successful. During our consultation, he was friendly, explained everything clearly, answered all our questions, and seemed highly attentive.

Once we had signed on for a course of treatment for both my daughters, our experience turned southward. Dr. Too Successful's practice is booming. This makes for a negative client experience, characterized by:

  • a loud, crowded waiting room
  • waits of up to and exceeding an hour for scheduled appointments
  • difficulty in making appointments that will accommodate one's own schedule
  • impossibility of rescheduling appointments
  • technicians who are hurried and under stress, which doesn't really bring out the best in anybody
  • necessity of seeing Dr. Too Successful's junior partner orthodontist
  • a breakdown in communication between the client, technician, and doctor, resulting in a six-month extension of treatment
The latter being what happened with us. That is, we were concerned about how my daughter's teeth were slanting, I mentioned it to the technician, she said that the junior partner doctor was aware of it, and I assumed he knew what he was doing. Three months later, when treatment was due to conclude, the junior partner doctor said that whoops, some girls look really pretty with teeth that slant outwards, it makes their lips look full, some people really like that. . .but if we wanted, we could have 4 teeth extracted and start all over. That was six months ago, my daughter is still in braces, and poor thing, she may be wearing them until she is thirty.

Regardless, the main point is that Dr. Too Successful has completely lost control of his business. Soon after my conversation with the junior partner doctor, I called Dr. Too Successful to let him know that I'd experienced an erosion of trust. He said he appreciated my call and we talked about what happened. I explained that I didn't fault his office for making a mistake, but what bothered me was that the mistake was preventable, being as it was a direct result of either the technician not telling the junior partner doctor about my concerns, or of the junior partner doctor simply not paying attention.

The call concluded with Dr. Too Successful assuring me that he was dedicated to regaining my trust. And so he had seemed to be, for the next few appointments, but then his attention was captured by other clients and more pressing demands, and our experience as clients is once again suffering.

Would I ever recommend Dr. Too Successful to anyone? No, on the contrary, I would tell everyone to flee as fast as their feet will carry them. I don't like saying this, because Dr. Too Successful is likable and seems committed to doing good work. And yet, by neglecting to control the growth of his practice and by forgetting to consider his clients' experience, Dr. Too Successful caused my daughter unnecessary pain and me unnecessary inconvenience. And he has had a negative effect on my business, in that I've had to take additional time away from my work to keep taking my daughter to orthodontist appointments, appointments for which I always have to wait at least 20 minutes, and often have to wait an hour. 

Here are the questions I ask myself:
  • What is my typical client's experience?
  • Is there any negative aspect to my client's experience of Inkspot?
  • How can I improve my client's experience?

One of the little mottoes of the business course is that one has to love the client more than one loves the product. It's worth thinking about.

UPDATE: Lest anyone think I am casting stones from a glass house, I'd like to add this bit of irony. Last week, I had to call and reschedule an appointment at the last minute because my daughters had a school obligation arise suddenly that could not be missed. I was told the next available appointment was toward the end of February. The day and time offered conflicted with cello lessons, and I was offered an appointment in mid-March. By this time, feeling frustrated with having to find a time that fit my daughter's schedule, my schedule, and the straitjacketed schedule of the orthodontist, I said that I really needed the office to accommodate me and be more flexible, and here I mentioned that my daughter's treatment had only been extended to this point as a result of mistakes made in their office. I was given an appointment for yesterday.

We went to the office, and even though everyone was courteous, it was strained, formal courtesy. Something was amiss, something more than what had passed between the receptionist and me on the phone. When the doctor examined my daughter's teeth, he spoke exclusively to her, as if I weren't there.

At the end of the appointment, I opened my planner to make our next appointment and saw that WE HAD ARRIVED AN HOUR LATE. That's right. In my mind, our appointment was at 2:30, because that was the time of the original appointment, but we should have been there at 1:30 and NO ONE HAD SAID A WORD. Though their body language and tone said plenty.

I was so astounded that I didn't know what to do, but immediately upon arriving home, I called the office to offer abject apologies. I apologized about ten different ways, and I sure hope that my apologies made their way to the technicians and the doctor.

And yet? You understand that the doctor has never made that kind of apology to me for their mistake. I was an hour late for an appointment, but their mistake means that my daughter will remain in braces for what looks like a year after the original estimated end of treatment. But I guess some people find it much more difficult to apologize than others.


Saturday, August 18, 2012

What the World Needs Now

It may be unnecessary to say that in talking about what's wrong, I don't mean to pick on anybody in particular. Even if it is one person making specific mistakes, I tend to see this as a systemic ill, as previously discussed, rather than an opportunity to dogpile on an unwitting offender.

A systemic ill that is particularly pernicious in the test publishing industry. How can we expect kids to do well on tests when the tests themselves are riddled with error?

The safeguards are so simple:
1. provide adequate training (again, as previously discussed)
2. follow some kind of standard process for detecting and repairing error
3. use the information from #2 to supplement #1
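Safeguard #3 is simply a feedback loop: whatever the standard review process in #2 catches should shape the training in #1. Here is a minimal sketch of that loop, with hypothetical error categories and a made-up log:

from collections import Counter

# Hypothetical log of problems caught during content review (safeguard #2).
error_log = [
    "distractors not parallel",
    "stem does not target one skill",
    "distractors not parallel",
    "key not verifiable",
    "distractors not parallel",
    "above-grade-level vocabulary",
]

# The most frequent problems become the next round of training topics (safeguard #1).
training_topics = [issue for issue, _ in Counter(error_log).most_common(3)]
print(training_topics)
# ['distractors not parallel', 'stem does not target one skill', 'key not verifiable']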

These safeguards cost time and money, which explains why so many companies have gradually let them fall by the wayside. 

Training takes time and attention. I can't believe how fortunate I was to have received the training I did. In my first year as an associate editor, I probably had more training opportunities than editors now receive in a decade. It was a different era. The Great Big Huge Test Publishing Company at which I worked had an army of style editors, proofreaders, and QA personnel.

Over the last dozen years, I've seen companies completely dismantle their style editing departments and outsource the work to freelance editors. Now style editors (copy editors) are held in such low regard that if any Bright Young Thing with a talent for grammar and language conventions tells me she'd like to give this work a try, I steer her away from style editing and into content.  I hate doing it. I myself hold style editors in high esteem, maybe because often their job seems to be to keep me from making a fool of myself. What the world needs now--no disrespect intended to Dionne Warwick and Burt Bacharach--is more style editing.

Many companies have also dismantled their content development departments.

So there's no real continuity for item writers and content editors. They work on a project for one company and then skip to a project for another. The content development people who remain at the companies are overwhelmed and certainly don't have the time to provide training to freelancers; editors simply accept or reject items, quietly fixing mistakes themselves, and so the item writer never hears of his mistakes and is free to repeat those mistakes forever in a state of unfortunate, ignorant complacence.

Right now, I'm thinking of all this in terms of language conventions items, because I was reviewing a set this morning, but I'm sure this applies to all domains and content areas.

For now, I offer some specific observations about language items:
1. When language items accompany an editing passage that contains embedded errors, the content of each item must be mutually exclusive. A sentence from the passage should contain only one error, and should be used for only one item in a set. (See the sketch after this list for a mechanical check.)
2. "Syntax" has to do with grammatical rules; "diction" has to do with the writer's choice of words.
3. An item should clearly target only one type of error, and should not mix types of errors. For example, a punctuation item shouldn't include distractors that contain errors in verb tense.
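The first and third of these rules lend themselves to a mechanical check. Here is a minimal sketch, assuming a toy representation of an item set tied to an editing passage; the structure and field names are hypothetical.

from collections import Counter

# Hypothetical item set: each item records the passage sentence it uses
# and the error types involved in its stem and distractors.
items = [
    {"id": 1, "sentence": 3, "error_types": {"punctuation"}},
    {"id": 2, "sentence": 5, "error_types": {"pronoun case"}},
    {"id": 3, "sentence": 3, "error_types": {"spelling", "verb tense"}},
]

# Rule 1: a passage sentence should be used by only one item in the set.
for sentence, count in Counter(item["sentence"] for item in items).items():
    if count > 1:
        print(f"Sentence {sentence} is used by {count} items.")

# Rule 3: an item should target only one type of error.
for item in items:
    if len(item["error_types"]) > 1:
        print(f"Item {item['id']} mixes error types: {sorted(item['error_types'])}")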

More previously discussed here.

As an aside, it's interesting how you can learn a lot about a person by reviewing his or her work, and how this body of knowledge grows over time. It's like how my mechanic told me that he knows more about his customers from the condition of their cars than he does from talking with them. He can tell who drives erratically, who slams on the brakes, who grinds the gears, and he can tell who doesn't. Everything is everything. You can develop a similar personality profile simply by reviewing a writer's work.

I can't tell you how many times I've finished working on a manuscript, items or reading passage, don't make no difference, and felt a deep and tender affection for the writer because his or her work demonstrated care, attention, and conscientiousness. 

UPDATE: Forgot to mention. I started writing about all this because I saw this book mentioned in the Common Core Standards and immediately ordered it. 

Thursday, March 15, 2012

It's Like That

. . . in the immortal words of Run DMC


Recently, as part of my quality reform campaign, I took several protégées under my wing. It was a responsibility I'd been ducking for years. I've always thought that I more than met my obligations to my industry by simply doing the best work that I could do.


But how much of a difference can I make by myself? Yesterday I was talking with a friend who mentioned that 22,000 items will be coming through the agency where she works her magic. 22,000! How is it even possible to glance at 22,000 items, let alone perform a thorough content review?


Hence the protégées, whom I've been training in the manner in which I was trained, but more so. That is, much of the training I received was on-the-job, and even though it was much better, much more comprehensive than what is passing (or not passing) for training in most places nowadays, there certainly could have been more of it. (More did come later, when the director of our department launched the development of a processes and procedures manual.) Which I recognized at the time, and which inspired me to educate myself in the work of the industry.


I spent a great deal of time talking to colleagues in various departments of the Great Big Huge Company where I was employed. Like many companies, Great Big Huge Company had a silo culture. People mostly stuck with their own. The programmers ate lunch with the programmers. The style editors took walks together. The content people went out for coffee together. The psychometricians mostly remained in their offices, except for when they appeared at meetings. The upstairs people stayed upstairs, while the people in operations kept everything in motion downstairs. I went everywhere (I got lost a few times in that first few months, once in the warehouse, a cavern with concrete floors on which were stacked towers of testbooks on wooden pallets) and talked to everyone, from those in shipping and operations to manufacturing and finance. I wanted to know what everyone did and how everyone's work fit together to create the bigger picture. (With much envy, I listened to stories of the glory days of Great Big Huge Company, the days of raucous St. Patrick's Day parties and of flights in some executive's small plane and of leisurely, sociable Friday lunches. It was the '80s.)


I started reading. A lot of what I read made no sense to me. I didn't speak the language yet. It was discouraging to understand so little, but I approached it as if it were grad school and kept reading, kept asking questions, and always, kept doing the work, and then eventually I knew what I was doing.


This sense of mastery is what I hope for the protégées--this group, the group to come, maybe many groups in the future. Maybe these will go on to train others in best practices. Now that I've begun, I see that it's a worthwhile endeavor. Two or three or four protégées who may go on to become experts and provide work of the highest quality may not sound like a lot. But these two or three more people doing excellent work will leave less room for two or three bumblers. And then when clients get into the habit of receiving excellent work, they'll have less patience for the substandard. That's my fond hope, anyway.


Assessment content development is a funny little world. There's not much published information about what it is and how to do it right. Content developers don't often meet to share information, to talk about challenges and discuss solutions. When people ask me for sources, I point them in the direction of Thomas Haladyna. James Popham. Grant Wiggins. Jay McTighe.


Even so, one won't always agree, and when one does agree, there will be gaping chasms between theory and execution.


[Thanks to Bob DeBris, who introduced me to the video posted above.]

Monday, February 20, 2012

File Under: All Roads Lead to Rome, Cross-Referenced to Mandatory Reading

The book every K-12 content developer--assessment and curriculum--should read is Tested: One American School Struggles to Make the Grade, which was recommended to me by a colleague and likeminded comrade in quality assessment content development, Carmen, a senior level genius expert at Anonymous Testing Company.


To say that the tests administered yearly at grades 3-8 (inclusive) are high stakes cannot possibly begin to convey what this means for students, teachers, and school administrators and how NCLB has transformed the educational landscape. The flames are licking at their feet every minute of every day.


And even when we who are informed talk about this--about how awful it is that teachers must teach to the tests and that the assessments are driving the curricula--it's one thing to talk about it, and another thing to experience it. If you're not a student, teacher, principal, or parent, this book is the closest you can get to the fire.


Like the children in Tested, my daughters' skills and knowledge are assessed so frequently at school (and what is taught is often so narrowly focused--the algebra teacher actually labels each homework assignment with the assessable standard and tells parents that she does this so she will know whether students will answer those questions correctly on the state test) that I'm shocked by how little genuine instruction they actually receive and so I supplement their classroom instruction by offering my own reading, writing, social studies, and science lectures. Which may bore them nigh unto death, who knows, but I refuse to send them out into the world as little ignoramuses. (For math homework help, we turn to my friend and cohort and math content area genius expert Carrie Frech, who works at a major testing organization.)


My daughters came home last week emitting little puffs of indignation over the latest district benchmark assessment. (They swiped it and brought it home to show me.) When I read it, I was horrified. It was a passage-dependent writing prompt. I didn't see a rubric, God knows what horrors hide behind that curtain, but I assume the responses were scored for reading and writing.


The story, a tedious adaptation of a folktale, was poorly written and at the fourth-grade level (this, for an eighth-grade gifted and talented program; my daughters are currently reading The Great Gatsby and Rebecca for their next book reports, and yet they're being assessed with text suitable for fourth grade?). What was there was presented at an extremely literal level of understanding. Nothing in the story allowed for any genuine analysis of narrative elements or interpretation of literary devices, and yet the writing prompt required the students to do just that. I don't know how they could. You can't make a pie out of one apple. The multiple-choice section (developed by a company relatively new to the game, for whom I'd done some work a few years ago and whose lack of understanding of test development had shocked me at the time) was no better. The girls told me that there was one question that was so nonsensical that, districtwide, the teachers' form of protest was simply to give all of their students the answer. Does anyone see any value whatsoever in the use of such an assessment tool?


When I did some work for this company a few years ago, I observed their inexperience with and lack of knowledge about assessment. Maybe things have changed since then, I don't know. What I do know is that this company--the same one that has little assessment background--doesn't perform any field testing of the test content, and there is no data whatsoever to indicate that we can make any kind of accurate inferences about what students know and can do based on such poorly constructed assessments. But this company does a good job of selling, and the districts buy the idea that the products will 1) give teachers information about their students that will help them get students ready for the state test and 2) predict how students will perform on the state test. Neither claim is possible, particularly with assessments that violate the most basic quality standards.


All roads lead to Rome; it all comes back to the Quality Manifesto.


(And it must be said that certainly there is room for quality improvements in the classroom as well, and there was room even before NCLB. What passes for instruction in some classrooms horrifies me just as much as any quality train wreck I see in the assessment world. Two teachers at my daughters' school ROUTINELY play audio of the textbooks instead of teaching--this, in science, which I think we can all agree requires hands-on instruction; it's still bad in reading, but not quite so bad. One of these teachers also ROUTINELY spends twenty minutes or so of the fifty-minute class period describing her personal life to the captive audience of eighth-graders--they know all about her kids, her husband, her political views, her hobbies, her extended family, and her domestic habits. So do I. Unless someone is a friend, loved one, or celebrity, there's probably not much in her life you want to hear about for twenty minutes straight, and yet that is what this teacher subjects her students to instead of teaching them.)


UPDATE: Identified my book-readin' comrade by name. Thanks, Carmen!
UPDATE: Added a link.
UPDATE THE THIRD: Removed a link, anonymized an identity.

Monday, February 13, 2012

In Defense of Quality

If you cannot learn to love real art, at least learn to hate sham art.

This, from William Morris.

By “real art,” let’s say James Lesesne Wells, for example, and by “sham art,” let’s say Thomas Kinkade.

I’d like to apply this sentiment to the work of writing, and more specifically, to the work of test content development. Quite frankly, I’m mystified (a phrase I borrow from an assistant district attorney with whom I used to work when I was on a very different career path, and who was in the habit of using this phrase to sharpen his tongue as he prepared to slice me up for having done something with which he disagreed) by not only the deep and devastatingly obvious diminution of quality in test content in the past few years, but also by the failure of people in this silo of the industry to recognize this trend.

Quite frankly, it breaks my heart. As silly as it sounds. But when you love, you expose yourself to the risk of heartbreak. Again, I turn to William Morris, who said, “Give me love and work – these two only.” I’ve been doing this work for 19 years now; though I got into it thinking it was a temporary rope to keep me out of the quicksand until I found my place on solid ground, I think we can all agree it’s become a long-term relationship.

If this decline in quality were limited to newcomers to the business, we could propose that them entry-level young’uns [*sigh*] are poorly educated and ill equipped to express themselves except via texting, which you can certainly see in their editorial comments (which are lamentably rich in acronyms and emoticons, and which betray an unfortunate juvenile fondness for excess punctuation and for using all caps in directions, which cannot help but set one’s back up, however patient one might be, and anyone who knows me knows that overly patient I be not).

But no, all we content dev folk – ELA, math, science, and social studies, not one is immune, no, not one – have noticed, and we do talk about it, and the conversation and all the various repetitions and iterations of the same conversation bore and horrify us so that we are reduced to shaking our heads and turning our attention to some vision of an oasis, such as the cocktail that awaits the end of the day.

Back in the day, when I worked at Great Big Huge Test Publishing Company, I went to a mandatory training on root cause analysis. We used the fishbone chart. As trainings go, it was all right. Certainly better than the one at which I was accused of not doing my work and letting my teammates pick up my slack because I failed to participate in the assembling of a puzzle, which failure actually had a lot to do with my abysmally poor spatial intelligence and equally poor vision (since corrected through the wonders of Lasik surgery) and little to do with my work ethic, which, as it happens, is about as Puritan as a work ethic might be. You can take the girl out of the working class, but you can’t take the working class out of the girl. But I digress.

If we performed a root cause analysis on the wreckage of Good Ship Quality, what would we find?

To answer that we’d have to go back to the beginning. When I started as a content editor, I was dedicated to one project. That project was my one, my only, my all in all. It was the same for my co-workers. That was the early 90s. In most states, large-scale tests were restricted to reading, writing, and math, and were administered at three or four grades (usually something like 4, 6, 8, and 10, or 5, 7, and 11).

Five years later, it was a whole different and bigger but not necessarily better ballgame. More states were testing more grades, and NCLB loomed on the horizon. As a supervisor, I was responsible for five projects. No one on my team was solely dedicated to any one project; each person, from editor to supervisor, worked on several.

I had a meeting with my manager that went like this:

Manager: [peering at her clipboard] All right, so you have State V, State W, State X, and State Y.
Me: And State Z.
Manager: Oh, I forgot about Z. Right. State Z. [scribbles a note on her clipboard]
Me: What is the order of priority?
Manager: [pause] They’re all priorities.
Me: With five states, mistakes are going to be made. It’s impossible to supervise five projects of this scale. Which state is going to be the mistake state?


Test publishing companies couldn’t handle the workload. Companies that had never done any testing smelled the money and jumped into the fray. All companies got hiring fever. By then, I was a content development manager hiring entry-level candidates at more than twice my starting salary as an editor (and did that ever sting, I tell you what).

But the equation for meeting a production deadline is

TIME + WORKERS = MEETING PRODUCTION DEADLINE

If you have less time, you need more workers. Fewer workers, you need more time. I am no math expert, but this equation I know.
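Read as back-of-the-envelope person-hours arithmetic (my framing here, with entirely hypothetical numbers), the tradeoff looks like this:

# Hypothetical numbers to illustrate the time/workers tradeoff.
items_to_write = 500      # scope of the assignment
hours_per_item = 2        # writing plus revision, per item
work = items_to_write * hours_per_item        # 1,000 person-hours

weeks_available = 5
hours_per_writer_per_week = 20                # part-time freelancers

writers_needed = work / (weeks_available * hours_per_writer_per_week)
print(writers_needed)     # 10.0 -- halve the schedule and you need 20 writers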

Deadlines got more and more compressed, development cycles shrank, and everyone started skipping steps. Real training gave way to on-the-job training, which really means sink-or-swim training. New hires were handed the comprehensive binder containing lists of processes and procedures, which binders were relegated to shelves in cubicles because no one had time to read them. Early field tests were cast aside. Sometimes all field tests. There were fewer internal reviews. The few remaining reviews were performed by overworked and/or underexperienced staff--and you can actually determine which is which (and which is both) when you see the editorial feedback coming out of such reviews.[1]

Another significant factor may be a corollary to the Peter Principle. The most highly skilled, knowledgeable, and experienced line staff keep getting promoted to management, where they may be doing a fantastic job, but their spots are filled either by new hires or old hands who are left behind (how can I say this delicately? Their remaining behind may not always be by their own choice). Combine this with the absence of training, and it’s a chaos cocktail.

Not to mention the dependence on freelancers. Companies started laying people off and then rehiring them as subcontractors. For some, it’s a win-win—the company don’t have to pay your benefits, and you get to work at home in your pajamas—but it do mean there are a heck of a lot of people at their keyboards writing test questions who neither have experience in education nor in publishing, let alone assessment, which some of us choose to believe is both an art and a science.

There is value in enduring years of slogging through the entire publishing cycle from first draft through bluelines over and over and over again. There is value in having logged many hours in the company of small children struggling to read. There is value in meeting with what the industry calls the stakeholders—teachers, administrators, community leaders, DOE officials. There is value in stretching to accommodate the demands of the stakeholders. There is value in educating oneself about the history and practices of one’s profession. Those learn-to-play-the-piano-in-10-minutes books aside, there is no shortcut to attaining mastery in anything.

There are so many facets to what we do in assessment content development, and when one’s experience is restricted to one tiny mirrored triangle of the great big disco ball, well, that creates a problem because one hasn’t constructed a greater context which allows for greater meaning to inform and guide the work. When the work is simply writing questions for a paycheck and meaning goes out the window, the questions get lamer and lamer, by which I mean trivial, superficial, and plagued by error.

However, the purpose of identifying a problem is not to castigate wrong-doers, nor to enjoy that most basic human pleasure of being right, but to use such identification to find a solution.

The answers are probably as clear to you as they are to me:

1. Only hire content developers (freelance or in-house, I have no axe to grind here) who either have a proven track record of providing high-quality work or who have the capability (combination of education, writing skills, content area expertise, intelligence, creativity, and persnicketiness) to learn how to do the work well.
2. Provide not only adequate but excellent training.
3. Employ senior content development personnel [*ahem* not naming any names] to review items and provide specific instructional feedback to writers.
4. Budget sufficient time and money for the given project.



[1] Overworked but highly experienced people skim text, which forces their brains to fill in the gaps. Which means erroneous assumptions, conclusions drawn from limited evidence, and unnecessary, ill-advised edits. Subtleties or fine distinctions are impossible to detect when skimming. Underexperienced people often restrict their scope of what's acceptable to their own narrow band of direct experience, and then reject what lies in the outer darkness of their ignorance. This is bad enough on its own, but they will then assume a pedantic tone and lecture the writer for having written items that exceed the demands of the specifications.