
The Undead Findings are Among Us

A few months ago, I was asked to review a manuscript on social-behavioral priming. There were many things to be surprised about in this manuscript, not the least of which was that it cited several papers by Diederik Stapel. These papers had already been retracted, of course, which I duly mentioned in my review. It has been said that psychology is A Vast Graveyard of Undead Theories. These post-retraction Stapel citations suggest that this cemetery might be haunted by various undead findings (actually, if the findings were fabricated, they were never really alive in the first place, but let's not split semantic hairs).

There are several reasons why someone might cite a retracted paper. The most obvious reason is that they don't know the paper has been retracted. Although the word RETRACTED is splashed across the first page of the journal version of the article, it will likely be absent from other versions that can still be found on the internet. Researchers working with such a version might be forgiven for being unaware of the retraction.

But citing Stapel??? It is not like the press, bloggers, colleagues at the water cooler, the guy next to you on the plane, not to mention Retraction Watch, haven't been all over this case!

A second reason for citing a retracted article is obviously to point out the very fact that the paper has been retracted. Nevertheless, a large proportion of citations to retracted papers are still favorable, just like the Stapel citations.

"Don't expect any help from us."
Does this imply that retracted findings have a lasting pollutive effect on our thinking? A recent study suggests they do. The Austrian researcher Tobias Greitemeyer presented subjects* with findings from a now retracted study by Lawrence Sanna (remember him?). Sanna reported that elevated height (e.g., riding up escalators) led to more prosocial (prosocial being the antonym of antisocial) behavior than lowered height (e.g., riding down escalators). The data turned out to be fabricated, which is why the paper was retracted.

Greitemeyer formed three groups of subjects. He told the first two groups about the Sanna study but not the third group. All subjects then rated the relationship between physical height and prosocial behavior.

Next, the subjects wrote down all their ideas about this relationship. At the end of the experiment, half of the subjects who had received the summary (the debriefing condition) learned that the article had been retracted because of fabricated data and that there was no scientific evidence for the relation between height and prosocial behavior. Subjects in the no-debriefing and the control conditions did not receive this information. Finally, all three groups of subjects responded to the same two items about height and prosocial behavior that they had responded to earlier.

As you might expect, the subjects in the debriefing and no-debriefing conditions gave higher estimates of the relation on the initial test than did those in the control condition. More interesting are the responses on the second test, after the debriefing condition (but not the other two conditions) had heard about the retraction. On this test the subjects in the no-debriefing condition had the highest score. But the crucial finding was that the debriefing condition still exhibited a stronger belief in the relation between height and prosocial behavior than did the control condition. So the debriefing lowered belief in the relation, but not sufficiently.

Greitemeyer provides an explanation for these effects. It turns out that the number of explanations that subjects gave for the relationship between height and prosocial behavior correlated significantly with post-debriefing beliefs. A subsequent analysis showed that belief perseverance in the debriefing condition appeared to be attributable to causal explanations. So retraction does not lead to forgetting, and this cognitive inertia occurs because people have generated explanations of the purported effect, which presumably leads to a more entrenched memory representation of the effect.

But we need to be cautious in interpreting these results. First, it is only one experiment. A direct replication of these findings (plus a meta-analysis that includes the two experiments) seems in order. Second, some of the effects are rather small, particularly the important contrast between the control and the no-debriefing condition. In other words, this study is a perfect candidate for replicating up.

After a successful direct replication, conceptual replications would also be informative. As Greitemeyer himself notes, a limitation of this study is that the subjects only read a summary of the results and not the actual paper. Another is that the subjects were psychology students rather than active researchers. Having researchers read the entire paper might produce a stronger perseverance effect, as the entire paper likely provides more opportunities to generate explanations and the researchers are presumably more capable of generating such explanations than the students in the original experiment were. On the other hand, researchers might be more sensitive to retraction information than students, which would lead us to expect a smaller perseverance effect.

Greitemeyer makes another interesting point. The relation between height and prosocial behavior seems implausible to begin with. If an effect has some initial plausibility (e.g., meat eaters are jerks), retraction might not go very far in reducing belief in the relation.

So if Greitemeyer’s findings are to be believed, a retraction is no safeguard against undead findings. The wights are among us...

*The article is unfortunately paywalled


  1. At least Stapel's withdrawn papers have red ink on them now. In the case of Fredrickson and Losada (2005), which as of today has 35 citations in 2014 according to Google Scholar, the word "Retracted" is not splashed (or even mentioned) *anywhere*, despite the authors (yes, both of them!) having formally withdrawn "the modeling element" --- which is the *entire point* of the article, taking up about two-thirds of its word count and printed area. In order to discover this fact, the reader has to scour the darker corners of American Psychologist, just on the off chance that the article has been withdrawn; last time I checked, the PDF had not been corrected or amended in any way.

    Meanwhile, attempts to get Mathematical and Computer Modelling to retract Losada (1999) are proving difficult, as the journal recently closed its doors and so there is no Editorial Board around to take the decision. This looks like a great future loophole for bad scientists: identify a journal that is about to close and publish your article in the last-ever issue. Of course, since Losada willingly co-authored the withdrawal of the model from Fredrickson and Losada (2005), one would logically expect him to retract this article too, but I'm not holding my breath for that...

  2. Peculiar. Without the modeling component, that paper is an empty husk. Why not retract it entirely? To be continued, I assume...

  3. We have to be a bit careful here. The post-debriefing difference between the debriefing group and the control group was only just significant (p=.045). Based on this, and the means (+0.34 and -0.29), my guess is that neither group differed significantly from zero (I didn't see this test reported in the paper, but I only skimmed it quickly). If I'm right, then neither group believed it likely that there was a true relationship at all, it's just that one erred on the side of finding a negative relationship more plausible and the other erred on the side of finding a positive relationship more plausible.

There's an interesting comment in the NY Times interview with Diederik Stapel where he says that he only made up data in support of results which he, and the rest of the field, expected to be true ("I always checked -- this may be by a cunning manipulative mind -- that the experiment was reasonable, that it followed from the research that had come before, that it was just this extra step that everybody was waiting for.")

    So in a sense, it's not unreasonable for people to suppose that a given relationship is marginally more likely to be positive than negative if it's known to have been regarded as plausible by a (at least for a time) successful fraudster, and the rest of the social psychology community.

    1. Yes, that's why I said we need to be cautious: p=.045 and N=155. A Bayesian analysis would probably not come down on the side of H1.

      The Stapel quote is indeed very relevant. I hinted at this by my reference to the "meat eaters are jerks" study.

  2. Using Jeff Rouder's website, I ran the default Bayesian t-test, which yields a Bayes factor of about 1; this suggests the statistical evidence is almost perfectly ambiguous. I have not done a sensitivity analysis, but I doubt that any reasonable prior distribution would result in a very different message. So indeed, this finding is in need of replication.
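    (For readers who want to check this kind of claim themselves, the default (JZS) Bayes factor from Rouder et al. (2009) can be sketched in a few lines of Python. The group sizes below are an assumption, since the paper's exact cell counts aren't given in this thread: I take the reported two-tailed p = .045 to come from a two-sample test with groups of roughly 78 and 77, and use the common Cauchy prior scale r = sqrt(2)/2.)

    ```python
    import numpy as np
    from scipy import stats, integrate

    def jzs_bf10(t, n1, n2, r=np.sqrt(2) / 2):
        """Default (JZS) Bayes factor BF10 for a two-sample t-test,
        following Rouder et al. (2009). r is the Cauchy prior scale."""
        nu = n1 + n2 - 2                      # degrees of freedom
        n_eff = n1 * n2 / (n1 + n2)           # effective sample size
        # Marginal likelihood under H0 (up to a shared constant)
        m0 = (1 + t**2 / nu) ** (-(nu + 1) / 2)
        # Under H1, average over g with an inverse-gamma(1/2, 1/2) prior
        def integrand(g):
            a = 1 + n_eff * g * r**2
            return (a ** -0.5
                    * (1 + t**2 / (a * nu)) ** (-(nu + 1) / 2)
                    * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))
        m1, _ = integrate.quad(integrand, 0, np.inf)
        return m1 / m0

    # Recover t from the reported two-tailed p = .045 (assumed df = 153)
    t = stats.t.ppf(1 - 0.045 / 2, 153)       # roughly 2.0
    print(jzs_bf10(t, 78, 77))                # close to 1: ambiguous evidence
    ```

    The exact value shifts a little with the assumed cell sizes and prior scale, but for a t just above 2 with this sample size the Bayes factor stays near 1, matching the "almost perfectly ambiguous" reading above.
    
    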

    3. Thanks, that's what I figured. I hope someone will step up to the replication plate.

  4. Interesting conceptual/direct replication points in your post, as this is very similar (conceptually!) to Ross & Lepper's old work on debriefing and belief perseverance. My memory (correct me if I'm wrong) is that they did a pretty fair amount of work on the topic back then, so "First, it is only one experiment..." is true regarding the specific finding about retraction of this particular relationship, but not true of the general idea. (Then again, I don't remember how much direct replication was ever done on those earlier ideas, or even the quality of the data, for that matter.)

    1. You're right. My point was that this particular study needs a direct replication. I know there is work on belief perseverance. It is covered well in Greitemeyer's article.

    2. Yeah... I wasn't implying that you didn't know about that work :)

