Wednesday, January 31, 2018

A Replication with a Wrinkle

A number of years ago, my colleagues Peter Verkoeijen, Katinka Dijkstra, several undergraduate students, and I conducted a replication of Experiment 5 of Kidd & Castano (2013). In that study, published in Science, participants were exposed to an excerpt from either literary fiction or non-literary fiction.

Kidd and Castano hypothesized that brief exposure to literary fiction, as opposed to non-literary fiction, would enhance people’s empathy because literary novels show greater psychological finesse than non-literary novels. Anyone who has read, say, Proust as well as Michael Crichton will probably intuit what Kidd and Castano were getting at.

Their results indeed showed that participants who had been briefly exposed to the literary excerpt displayed more empathy on Theory of Mind (ToM) tests than participants who had been briefly exposed to the non-literary excerpt.

Because the study touches on some of our own interests (text comprehension, literature, and empathy), and for a number of other reasons detailed in the article, we decided to replicate one of Kidd & Castano’s experiments, namely their Experiment 5. Unlike Kidd and Castano, we found no significant effect of text condition on ToM. We wrote that study up for publication in the Dutch journal De Psycholoog, a journal targeted at a broad audience of practitioners and scientists.

Because researchers from other countries kept asking us about the results of our replication attempt, we decided to make them more broadly available by writing an English version of the article with more detailed methods and results sections than was possible in the Dutch journal. This work was spearheaded by first author Iris van Kuijk, who was an undergraduate student when the study was conducted. A preprint of the article can be found here. An attentive reader who is familiar with the Dutch version and now reads the English version will be surprised. In the Dutch version the effect was not replicated, but in the English version it was. What gives?

And this brings us to the wrinkle mentioned in the title. The experiment relies on subjects having actually read the excerpt. However, as any psychologist knows, there are always people who don’t follow instructions. To pinpoint such subjects and later exclude their data, it is useful to know whether they’ve actually read the texts. In both the original experiment and our replication, reading times for the excerpts were recorded.

We originally reasoned that it would be impossible for someone to read and understand a page in under 30 seconds. So we excluded subjects who had one or more reading times < 30 seconds per page. This ensured that our sample included only subjects who had spent at least a reasonable amount of time on each page. This would give the manipulation (reading a literary vs. a non-literary excerpt) the best possible chance to work.

Upon reanalyzing the data for the English version, my co-authors noticed that Kidd and Castano had used a different criterion for excluding outliers, one that was less stringent than ours: they had excluded subjects whose average reading time was < 30 seconds per page. This potentially retains subjects who spent a long time on one page but skimmed another.
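To make the difference between the two exclusion rules concrete, here is a minimal Python sketch. The subject IDs, the three-page structure, and the reading times are all hypothetical, invented purely for illustration; only the 30-second cutoff comes from the studies described here, and the function names are made up for this example.

```python
# Hypothetical reading times (seconds per page) for three illustrative subjects.
# These numbers are invented; they are NOT data from either study.
reading_times = {
    "s1": [45.0, 60.0, 52.0],  # spent a reasonable time on every page
    "s2": [90.0, 12.0, 70.0],  # long on two pages, skimmed the middle one
    "s3": [10.0, 15.0, 20.0],  # skimmed every page
}

THRESHOLD = 30.0  # seconds per page, the cutoff used in both studies


def exclude_per_page(times, threshold=THRESHOLD):
    """Stricter rule: exclude if ANY single page was read in under `threshold` s."""
    return any(t < threshold for t in times)


def exclude_on_average(times, threshold=THRESHOLD):
    """Looser rule: exclude only if the MEAN time per page is under `threshold` s."""
    return sum(times) / len(times) < threshold


def retained(data, rule):
    """Return the sorted IDs of subjects that the given exclusion rule keeps."""
    return sorted(subject for subject, times in data.items() if not rule(times))


print("per-page rule keeps:", retained(reading_times, exclude_per_page))
# prints: per-page rule keeps: ['s1']
print("average rule keeps:", retained(reading_times, exclude_on_average))
# prints: average rule keeps: ['s1', 's2']
```

Subject s2 is exactly the case described above: the average (about 57 seconds per page) passes the looser rule even though one page was skimmed. Because a mean below 30 seconds is only possible if at least one page took under 30 seconds, anyone excluded by the average rule is also excluded by the per-page rule, so the per-page rule always keeps a subset of the subjects the average rule keeps.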

Our original approach ensured that people had spent at least a sufficient amount of time on each page. This still does not guarantee that they actually comprehended the excerpts, of course. For this, it would have been better to include comprehension questions, so that subjects with low comprehension scores could have been excluded, as is common in text comprehension research.

Because we intended to conduct a direct replication, we decided to adopt the exclusion criterion used by Kidd and Castano, even though we thought our own was better. And then something surprising happened: the effect appeared!

What to make of this? On the one hand, you could say that our direct replication reproduced the original effect (very closely indeed). On the other hand, we cannot come up with a theoretically sound reason why the effect would appear with a less stringent exclusion criterion, which gives the manipulation less chance to affect ToM responses, and disappear with a more stringent one.

Nevertheless, if we want to be true to the doctrine of direct replication, which we do, then we should count this as a replication of the original effect, but with a qualification. As we say in the paper:

“Taken together, it seems that replicating the results of Kidd and Castano (2013) hinges on choosing a particular set of exclusion criteria that a priori seem not better than alternatives. In fact, […] one could argue that a more stringent criterion regarding reading times (i.e., smaller than 30s per page rather than smaller than 30s per page on average) is to be preferred because participants who spent less than 30 seconds on a page did not adhere to the task instruction of reading the entire text carefully.”

The article also includes a mini meta-analysis of four studies, including the original study and our replication. The meta-analytic effect is not significant, but there is significant heterogeneity among the studies.

In other words, there are still some wrinkles to be ironed out.