This finding is known as the verbal overshadowing effect and was discovered by Jonathan Schooler. In the experiment that is of interest here, he and his co-author, Tonya Engstler-Schooler, found that verbally describing the perpetrator led to a 25% accuracy decrease in identifying him. This is a sizeable difference with practical implications. Based on these findings, we’d be right to tell Benson and Tutuola to lay off interviewing you until after the lineup identification.
Here is how the experiment worked.
Subjects first watched a 44 second video clip of a (staged) bank robbery. Then they performed a filler task for 20 minutes, after which their either wrote down a description of the robber (experimental condition) or listed names of US states and their capitals (control condition). After 5 minutes, they performed the lineup identification task.
How reliable is the verbal-overshadowing effect? That is the question that concerns us here. A 25% drop in accuracy seems considerable. Schooler himself observed subsequent research yielded progressively smaller effects, something he referred to as “the decline effect.” This clever move created a win-win situation for him. If the original finding replicates, the verbal overshadowing hypothesis is supported. If it doesn’t, then the decline effect hypothesis is supported.
irst massive Registered Registration Report under the direction of Dan Simons (Alex Holcombe is leading the charge on the second project) that was just published. Thirty-one labs were involved in direct replications of the verbal overshadowing experiment I just described. Our lab was one of the 31. Due to the large number of participating labs and the laws of the alphabet, my curriculum vitae now boasts an article on which I am 92nd author.
Due to an error in the protocol, the initial replication attempt had the description task and a filler task in the wrong order before the line-up task, which made the first set of replications, RRR1, a fairly direct replication of Schooler’s Experiment 4 rather than, as was the plan, his Experiment 1. A second set of experiments, RRR2, was performed to replicate Schooler’s Experiment 1. You see the alternative ordering here.
In Experiment 4, Schooler found that subjects in the verbal description condition were 22% less accurate than those in the control condition. A meta-analysis of the RRR1 experiments yielded a considerably smaller, but still significant, 4% deficit. Of note is that all the replication studies found a smaller effect than the original study but that study was also less precise due to having a smaller sample size.
Before I tell you about the results of the replication experiments I have a confession to make. I have always considered the concept of verbal overshadowing plausible, even though I might have a somewhat different explanation for it than Schooler (more about this maybe in a later post), but I thought the experiment we were going to replicate was rather weak. I had no confidence that we would find the effect. And indeed, in our lab, we did not obtain the effect. You might argue that this null effect was caused by the contagious skepticism I must have been oozing. But I did not run the experiment. In fact, I did not even interact about the experiment with the research assistant who ran it (no wonder I’m 92nd author on the paper!). So the experiment was well-insulated from my skepticism.
Here are the main points I take a away from this massive replication effort.
1. Our intuitions about effects may not be as good as we think. My intuitions were wrong because a meta-analysis of all the experiments finds strong support for the effect. Maybe I’m just a particularly ill-calibrated individual or an overly pessimistic worrywart but I doubt it. For one, I was right about our own experiment, which didn’t find the effect. At the same time, I was clearly wrong about the overall effect. This brings me to the second point.
2. One experiment does not an effect make (or break). This goes both for the original experiment, which did find a big effect, as for our replication attempt (and 30 others). One experiment that shows an effect doesn’t mean much, and neither does one unsuccessful replication. We already knew this, of course, but the RRR drives this point home nicely.
3. RRRs are very useful for estimating effect sizes without having to worry about publication bias. But it should be noted that they are very costly. Using 31 labs seems was probably overkill, although it was nice to see all the enthusiasm for a replication project.
4. More power is better. As the article notes about the smaller effect in RRR1: “In fact, all of the confidence intervals for the individual replications in RRR1 included 0. Had we simply tallied the number of studies providing clear evidence for an effect […], we would have concluded in favor of a robust failure to replicate—a misleading conclusion. Moreover, our understanding of the size of the effect would not have improved."
5. Replicating an effect against your expectations is a joyous experience. This sounds kind of sappy but it’s an accurate description of my feelings when I was told by Dan Simons about the outcome of the meta-analyses. Maybe I was biased because I liked the notion of verbal overshadowing but it is rewarding to see an effect materialize in a meta-analysis. It's a nice example of “replicating up.”
Where do we go from here? Now that we have a handle on the effect, it would be useful to perform coordinated and preregistered conceptual replications (using different stimuli, different situations, different tasks). I'd be happy to think along with anyone interested in such a project.
Update September 24, 2014. The post is the topic of a discussion on Reddit.