Early in the morning a research assistant is
preparing for her first subject. She is a little nervous and quietly rehearses
the instructions she is about to give. In walks the subject. His gait is a little
unstable and it sure looks like he hasn’t had a shower in a long time. His eyes
have trouble focusing as she starts giving the instructions and he smells funny.
He is drunk.
What is she supposed to do? Should she bar him from the
experiment? But he did show up on time and he will be docked course credit if
she refuses to let him participate. She decides to go ahead with the experiment. Should she
tell her supervisor that their first subject was drunk?
That’s the easy question. The answer is yes. But what should
the supervisor do? That’s the more difficult question. Dan Ariely describes
an experiment that was run in his lab. He discovered there was an inebriated subject
in one of the two conditions of the experiment. There was no significant difference between the
conditions. No difference unless he threw out the data from the subject who was
ten sheets to the wind. This subject performed badly on the task but happened
to be in the condition that Ariely had predicted would outperform
the other one. He basically dragged down his team.
So initially Ariely threw out the subject’s booze-based data. But then
he and his students had second thoughts. Suppose, they reasoned, that the
subject had been in the condition that was predicted to do poorly on the task.
Then the drunk’s data would have greatly enhanced the effect. He would have been his team's MVP! Ariely and his students probably
would not have discarded the data. The group decided to rerun the
experiment.
Ariely didn’t say what happened to the original experiment. In
line with the emerging view
on psychological experimentation that I have been describing in previous posts,
the ideal solution would be to (1) keep the original experiment, (2)
throw out the dipsomaniac’s data, (3) rerun the experiment, now with an exclusion
rule for intoxicated subjects, and (4) report both experiments.
Yes, you did slightly “torture” your data in the first
experiment but that’s okay because it’s only an exploratory experiment. The second
one is confirmatory. By including both you’re not wasting any data AND you have
a replication. By also posting the data, others can see the effect of
including or excluding the troublesome subject.
There are two other points here. The first one is that if
your effect hinges on one subject, you probably don’t have enough power. My
hunch is that there are many such studies in the literature. With larger
samples, a single subject doesn’t make the difference.
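One rough way to check whether an effect hinges on a single subject is to recompute the test statistic with each subject left out in turn. The sketch below is only an illustration, not the analysis from the experiment described above: it assumes a two-group design and uses Welch's t, and the function names `welch_t` and `leave_one_out` are made up for this example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (lists of numbers)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def leave_one_out(a, b):
    """Recompute t with each subject removed in turn.

    Returns a list of (group, index, t) tuples, where group is "a" or "b"
    and index is the position of the subject that was left out.
    """
    results = []
    for i in range(len(a)):
        results.append(("a", i, welch_t(a[:i] + a[i + 1:], b)))
    for j in range(len(b)):
        results.append(("b", j, welch_t(a, b[:j] + b[j + 1:])))
    return results


# Hypothetical scores: the last subject in group a performed very badly.
group_a = [10, 11, 12, 11, 10, 2]
group_b = [8, 9, 8, 9, 8, 9]

print("all subjects:", round(welch_t(group_a, group_b), 2))
for group, index, t in leave_one_out(group_a, group_b):
    print(f"without {group}[{index}]: t = {t:.2f}")
```

If removing one particular subject changes the statistic far more than removing any other, the sample is probably too small to support a conclusion either way.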
The other point is that it might be good if the field
converges on a number of basic subject-exclusion rules. There already seems to be
some sort of implicit consensus but it might be good to make this explicit. If
my experience is par for the course, most experimenters will have had to deal with
subjects who were drunk, stoned (contrary to public perception, we have had many more of those in the United States than in the Netherlands), ill, distraught, preoccupied
with an exam, in physical pain, numb from recent dental work, and plain uncooperative.
There are also subject-exclusion conventions that are based on the
data. Data that deviate strongly from the average (for example more than three
standard deviations) or that are above or below a fixed threshold are often
omitted.
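As a minimal sketch of what such data-based rules look like in practice, here is one possible Python version. The function name `exclude_outliers`, its parameters, and the sample reading times are hypothetical; the three-standard-deviation cutoff and the fixed thresholds are the conventions mentioned above, not a recommendation.

```python
from statistics import mean, stdev


def exclude_outliers(values, z_cutoff=3.0, lower=None, upper=None):
    """Split values into (kept, excluded) using two illustrative rules.

    A value is excluded if it lies more than z_cutoff sample standard
    deviations from the mean, or if it falls below `lower` or above `upper`
    (either threshold may be None to switch that check off).
    Requires at least two values, because stdev() needs them.
    """
    m = mean(values)
    s = stdev(values)
    kept, excluded = [], []
    for v in values:
        # Skip the SD rule when all values are identical (s == 0).
        too_extreme = s > 0 and abs(v - m) > z_cutoff * s
        out_of_range = (lower is not None and v < lower) or (
            upper is not None and v > upper
        )
        if too_extreme or out_of_range:
            excluded.append(v)
        else:
            kept.append(v)
    return kept, excluded


# Hypothetical reading times in milliseconds, with one very slow response.
reading_times = [480, 510, 495, 520, 505, 490, 515, 500, 485, 525,
                 498, 502, 507, 493, 511, 488, 503, 497, 509, 5000]

kept, excluded = exclude_outliers(reading_times, z_cutoff=3.0, lower=100)
print("excluded:", excluded)
```

One thing to keep in mind with the standard-deviation rule: with ten or fewer values, no single value can lie more than three sample standard deviations from the mean, so the rule never triggers in very small samples.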
Including all of these rules in each and every paper would
seem a tad excessive but perhaps there should be a centralized checklist that
researchers can refer to in a pre-registration of their experiment. I’d be interested to hear comments on what this list
should contain—if people think this is a useful idea, that is.
Often subjects are excluded because they “fail to follow
instructions.” It is not always clear what is meant by this. It seems an easy
way to brush inconvenient data under the rug. On the other hand, subjects are surprisingly
creative at not following instructions. I could fill several posts with
examples.
I’ll just give one: the very first subject I ever ran.
The task was to read sentences from a computer screen and I was measuring their
reading times. The subject, a law student, came out of the sound-attenuated booth
and proudly announced that he had read each sentence twice. My first instinct
was to raise my arms ostentatiously and yell: “You fool! I’m measuring reading
times! You were instructed to read normally!”
But then I realized that reading “normally” for a law student probably meant
trying to memorize every word. So the subject had followed the instructions. It is just that his interpretation of them differed from mine. I didn't throw out the subject's data.
And then there are examples of subjects that defy classification. We once
had an experiment with a practice task, in which subjects judged pairs of words and decided if they were antonyms. This was just to make the subjects familiar with the task of pressing yes and no keys in response to words. One of my graduate students had a bewildering interaction with a subject. I don’t recall the details of the dialogue but here
is my rendition of it.
EXPERIMENTER: In this task you are going to judge antonyms. Antonyms
are words that have opposite meanings, like high-low, warm-cold, young…
SUBJECT: I get it! Like cat and dog.
EXPERIMENTER: (you're kidding, right?) No, I mean
opposites, like deep-shallow, hard-soft…
SUBJECT: Yes, that’s what I’m saying, like cat and dog.
EXPERIMENTER: (what have you been
smoking?) Maybe I didn’t explain it properly. I mean that high is exactly
what low is not. When something is not at all low, it is high (which is probably what you are right now).
SUBJECT: Yes that’s
exactly it. If something is not a cat, it is probably a dog.
EXPERIMENTER: Yes (you clown)
but if it’s not a cat, it can be a million other things as well. It could be a
hamster or a cow or even a garbage truck or an unsolved math problem.
SUBJECT: That doesn’t make any sense. What do garbage trucks have to do
with cats? Not as much as dogs, that’s for sure.
EXPERIMENTER: (I’m going to kill
you and then I’m going to kill you again) Let’s start with the
experiment.
If we had to cover cases like this, there would be no end to the list. But I think it is feasible to generate a list of the most common exclusion rules.
Maybe it already exists. If so, I’d love to hear about it. If not, it might be useful to consider which rules should go on the list.
Well... after hearing of your blog in last week's lecture, I decided taking a look wouldn't hurt. Not very little was my surprise when I started reading: clear, informational, appealing topics, with just enough humorous flavour!
Keep them coming!
Thanks! There will be many more to follow. Some more humorous, some more serious.