RetractionWatch recently reported on the retraction of a paper by William Hart.
Richard Morey blogged in more detail about this case. According to the RetractionWatch report:
From this description I can only conclude that I am
that “scientist outside the lab.”
I’m writing this post to provide some context
for the Hart retraction. For one, “inconsistent”
is rather a euphemism for what transpired, as I’m about to describe. Second, this case did indeed involve a graduate student,
whom I shall refer to as "Fox."
Back to the beginning. I was a co-author on a registered replication report (RRR) involving one of Hart’s experiments. I described this project in a previous post. The bottom line is
that none of the experiments replicated the original finding and that there was
no meta-analytic effect.
Part of the RRR procedure is that original authors are invited to write a commentary on the replication report. The original commentary that was shared with
the replicators had three authors: the two original authors (Hart and
Albarracín) and Fox, who was the first author. A noteworthy aspect of the commentary
was that it contained experiments. This was surprising (to put
it mildly), given that one does not expect experiments in a commentary on a registered replication report,
especially when those experiments themselves were not preregistered, as was the case here. Moreover, these experiments deviated from the protocol that we had established with the original authors. A clear case of double standards, in other words.
Also noteworthy was that the authors were able to replicate their own effect. Not surprisingly, the commentary painted us as replication bullies. But the successful replication rested on fake data, as it turns out.
The authors were made to upload their data to the
Open Science Framework. I decided to take a look to see if I could explain the
discrepancies between the successful replications in the commentary and all the unsuccessful ones in the RRR. I first tried to reproduce the descriptive and inferential
statistics.
Immediately I discovered some discrepancies between what was reported in the commentary and what was in the data file, both in condition means and in p-values. What could explain these discrepancies?
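For the curious, here is a minimal sketch of that kind of reproduction check. The file name (commentary_data.csv) and the column names (condition, dv) are hypothetical stand-ins, not the commentary's actual variables, and the t-test stands in for whatever the key analysis was:

```python
# A minimal reproduction check; "commentary_data.csv", "condition", and "dv"
# are hypothetical stand-ins for the actual file and variables.
import pandas as pd
from scipy import stats

df = pd.read_csv("commentary_data.csv")

# Descriptive statistics: condition means to compare against the reported ones
print(df.groupby("condition")["dv"].agg(["count", "mean", "std"]))

# Inferential statistics: a two-sample t-test on the key contrast
experimental = df.loc[df["condition"] == "experimental", "dv"]
control = df.loc[df["condition"] == "control", "dv"]
t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.3f}")
```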
I decided to delve deeper and suddenly noticed a sequence
of numbers, representing a subject’s responses, that was identical to a sequence
several rows below. A coincidence, perhaps? I scrolled to
the right where there was a column with verbal responses provided by the subjects,
describing their thoughts about the purpose of the experiment. Like the number
sequences, the two verbal responses were identical.
I then sorted the file by verbal responses. Lots of
duplications started popping up. Here is a sample.
In all, there were 73 duplicates in the set of 194 subjects.
This seemed quite alarming. After all, the experiment was run in the lab, so
how does one come to think one ran 73 more subjects than one actually ran? In
the lab, no less. It's a bit like running 25k and then saying afterwards, "How 'bout them apples, I actually ran a marathon!" Also, given that the number of subjects was written out in words, it was clear
that the authors intended to communicate that they had a sample of 194 and not 121 subjects. Just as important, the key effect
was no longer
significant when the duplicates were removed (p = .059).
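The duplicate check itself takes only a few lines. A sketch, again under hypothetical names (the numeric response columns and a purpose_text column stand in for the actual ones):

```python
# Sketch of the duplicate check; all file and column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("commentary_data.csv")
check_cols = ["response_1", "response_2", "response_3", "purpose_text"]

# Sorting by the verbal responses makes duplicated rows line up visually
print(df.sort_values("purpose_text")[check_cols])

# Count rows that are exact copies of an earlier row on these columns
dupes = df.duplicated(subset=check_cols, keep="first")
print(f"{dupes.sum()} duplicates among {len(df)} subjects")

# Re-run the key test with the duplicates removed
deduped = df[~dupes]
t, p = stats.ttest_ind(
    deduped.loc[deduped["condition"] == "experimental", "dv"],
    deduped.loc[deduped["condition"] == "control", "dv"],
)
print(f"after deduplication: t = {t:.2f}, p = {p:.3f}")
```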
The editors communicated our concerns to the authors and pretty
soon we received word that the authors had “worked night-and-day” to correct the
errors. There was some urgency because the issue in which the RRR would appear
was going to press. We were reassured
that the corrected data still showed the effect such that the conclusions of
the commentary (“you guys are replication bullies”) remained unaltered and the
commentary could be included in the issue.
Because I already knew that the key analysis was not
significant after removal of the duplicates, I was curious how significance was
reached in this new version. The authors had helpfully posted a “note on file
replacement”:
The first thing that struck me was that the note mentioned
69 duplicates whereas there were 73 in the original file. Also puzzling was the
surprise appearance of 7 new subjects. I guess it pays to have a strong bullpen. With this new data collage, the p-value
for the key effect was p = .028 (or .03).
A close comparison of the old and new data yielded a
different picture, though. The most important difference was that not 7 but 12
new subjects were added. In addition, for one duplicate both copies were
removed. Renowned data sleuth Nick Brown analyzed these data independently of me
and came up with the same numbers.
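This old-versus-new comparison can also be scripted. A sketch under the same hypothetical names, using pandas' merge indicator to classify each row as present in the old file only (removed), the new file only (added), or both:

```python
# Sketch of the old-vs-new comparison; file and column names are hypothetical.
import pandas as pd

old = pd.read_csv("commentary_data_old.csv")
new = pd.read_csv("commentary_data_new.csv")
check_cols = ["response_1", "response_2", "response_3", "purpose_text"]

# An outer merge with indicator=True labels each row by which file(s) it
# appears in: 'left_only' = removed, 'right_only' = added. Note that exact
# duplicates within a file match many-to-many, so inspect those by hand.
merged = old[check_cols].merge(new[check_cols], how="outer", indicator=True)
print(merged["_merge"].value_counts())
```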
So history repeated itself. The description of the data
did not match the data, and the “effect” was once again significant at just below .05 after the mixing-and-matching process.
There was much upheaval after this latest discovery, involving all
of the authors of the replication project, the editors, and the commenters. I suspect that had we all been in the same room there would have been a brawl.
The
upshot of all this commotion was that this version of the commentary was withdrawn. The issue of Perspectives on Psychological Science went
to press with the RRR but without the commentary. In a subsequent issue, a commentary appeared
with Hart as its sole author and without the new "data."
Who was responsible for this data debacle? After our
discovery of the initial data duplication, we received an email from Fox
stating that "Fox and Fox alone" was responsible for the mistakes. This sounded
overly legalistic to me at the time and I’m still not sure what to make of it.
The process of data manipulation described here appears to
be one of mixing-and-matching. The sample is a collage consisting of data that can be
added, deleted, and duplicated at will until a p-value of slightly below .05 (p
= .03 seems popular in Hart’s papers) is reached.
I wonder whether the data in the additional papers
by Hart that are apparently going to be retracted were produced by the same foxy mixing-and-matching process. I hope the University of Alabama will publish the results of its investigation. The field needs openness.
Thanks for this excellent post and thanks for uncovering this blatant case of fraud!
BeantwoordenVerwijderen"Fox" has evidently had a long and fraudulent career. According to Hart, it was Fox who faked the data for a 2013 paper (sole author Hart) which is now going to be retracted. But that paper was submitted on 22nd December 2011. Hmm.
Yes, that's one of the things that's odd about this situation. I wonder if Fox started out as an undergraduate student, was discovered to have "flair" and was then recruited into the grad program. This is all speculation on my part, though.
Has it been ruled out that Fox is just taking the blame? If you know the identity of this person and can find a CV online, you should be able to find out whether Fox was indeed a student at the same university in 2011. No acknowledgements were made to anybody else for data collection help either. At the very least, Hart is unethical in not acknowledging the contributions of students to his papers.
It is also amazing that the methods section in that paper (linked to by Neuroskeptic) doesn't mention what university the participants came from or what IRB approved the research.
Excerpts from the paper:
"To measure happiness, I had participants rate their current level of happiness, using a scale from 0 (unhappy) to 10 (happy), and their satisfaction with life, using a scale from 0 (unsatisfied) to 10 (satisfied; Strack et al., 1985); participants were told that these ratings were of interest to a university panel."
"Subsequently, I told participants that I was interested in their experience during the task."
So I don't get how Fox and Fox alone could be responsible for any and all mistakes.
I'm not sure it has been ruled out that he's just taking the blame. As I said, I really don't know what to make of it.
Yes, I noticed the lack of specificity as well. In the original commentary, we were blamed for not having realized we should have run politically conservative subjects. Like the article you're referring to, the article we were dealing with didn't even specify where the subjects were from.
Are we certain Fox is a real person?
I'm reasonably certain, but not 100%.