Thursday, May 29, 2014

My Take on Replication

There are quite a few comments on my previous post already, both on this blog and elsewhere. That post was my attempt to make sense of the discussion that all of a sudden dominated my Twitter feed (I’d been offline for several days). Emotions were running high and invective was flying left and right. I wasn’t sure what the cause of this fracas was and tried to make sense of where people were coming from and suggest a way forward.

Many of the phrases in the post that I used to characterize the extremes of the replication continuum are paraphrases of what I encountered online rather than figments of my own imagination. What always seems to happen when you write about extremes, though, is that people rush in to declare themselves moderates. I appreciate this. I’m a moderate myself. But if we were all moderates, then the debate wouldn’t have spiralled out of control. And it was this derailment of the conversation that I was trying to understand.

But before (or more likely after) someone mistakes one of the extreme positions described in the previous post for my own, let me state explicitly how I view replication. It's rather boring.
  • Replication is by no means the solution to all of our problems. I don’t know if anyone seriously believes it is.
  • Replication attempts should not be used or construed as personal attacks. I said this in my very first post and I'm sticking with it.
  • A failed replication does not mean the original author did something wrong. In fact, a single failed replication doesn’t mean much, period. Just like one original experiment doesn’t mean much. A failed replication is just a data point in a meta-analysis, though typically one with a little more weight than the original study (because of the larger N). The more replication attempts the better.
  • There are various reasons why people are involved in replication projects. Some people distrust certain findings (sometimes outside their own area) and set out to investigate. This is a totally legitimate reason. In the past year or so I have learned that I’m personally more comfortable with trying to replicate findings from theories that I do find plausible but that perhaps don’t have enough support yet. I call this replicating up. Needless to say, this can still result in a replication failure (but at least I’m rooting for the effect). And then there are replication efforts where people are not necessarily invested in a result, such as the reproducibility project and the registered replication projects. Maybe this is the way of the future. Another option is adversarial replication.
  • Direct vs. conceptual replication is a false dichotomy. Both are necessary but neither is sufficient. Hal Pashler and colleagues have made it clear why conceptual replication by itself is not sufficient. It’s biased against the Null. If you find an effect you'll conclude the effect has replicated. If you don’t, you’ll probably conclude that you were measuring a different construct after all (I’m sure I must have fallen prey to this fallacy at one point or another). Direct replications have the opposite problem. Even if you replicate a finding many times over, it might be that what you’re replicating is, in fact, an artifact. You’ll only find out if you conduct a conceptual replication, for example with a slightly different stimulus set. I wrote about the reliability and validity of replications earlier, which resulted in an interesting (or so we thought) “diablog” with Dan Simons on this topic (see also here, here, and here).
  • Performing replications is not something people should be doing exclusively (at least, I’d recommend against it). However, it would be good if everyone were involved in doing some of the work. Performing replications is a service to the field. We all live in the same building and it is not as solid as we once thought. Some even say it’s on fire.




Wednesday, May 28, 2014

Trying to Understand both Sides of the Replication Discussion

I missed most of the recent discussion on replication because I’m on vacation. However, the weather’s not very inviting this morning in southern Spain, so I thought I’d try to catch up a bit on the fracas, and try to see where both sides are coming from. My current environment induces me to take a few steps back from it all. Let’s see where this goes. Rather than helping the discussion move forward, I might, in fact, inadvertently succeed in offending everyone involved.

Basically, the discussion is between what I’ll call the Replicators and the Replication Critics. I realize that the Replicators care about more than just replication. The Critics are opposing the replication movement. The Replicators and the Critics are the endpoints of what probably is close to a continuum.

Who are the Replicators? As best as I can tell, they are a ragtag group of (1) mid-career-to-senior methodologists, (2) early-to-mid-career social psychologists and social-psychologists-turned-methodologists, and (3) mid-career-to-senior psychologists from areas other than social psychology.

Who are the Critics? As best as I can tell they are mid-career-to-senior social psychologists. (If there are Critics who don’t fit this bill, I’d like to hear who they are, so I can expand the category.)

What motivates the Replicators? They are primarily motivated by a concern about the state of our field. However, purely looking at the composition of the group, it is possible that career advancement is at least a small part of the motivation as well. The Replicators are generally not the senior people in their field (social psychology) or are in an area (methodology) where they previously did not have the level of exposure (who reads the Journal of Mathematical Psychology?) that they’re enjoying now. And maybe the people from other areas, who seem to have little extra to gain from taking part in the discussion, just enjoy making snarky comments once in a while.

What motivates the Critics? It is clear that senior social psychologists are often the target of high-profile replication efforts. They are also rattled by recent (alleged and proven) fraud cases among their ranks (Stapel, Sanna, Smeesters, Förster). So it is not surprising that they feel they are under attack and react rather defensively. Given the composition of the group, they have something to lose. They have a reputation. Not only that, they have always been able to publish in high-profile outlets and have received a great deal of positive media attention. All of this is threatened by the replication movement. But there is something else as well: the Critics value creativity in research, maybe above anything else.

How do Replicators view original studies? They view them as public property. The data, the procedure, everything should be available to anyone who wants to scrutinize it. This leads them to be suspicious of anyone who doesn’t want to share.

How do Critics view original studies? They seem to (implicitly) view them a bit like works of art. They are the author’s intellectual property and the process that has led to the results requires a certain artistry that one has to be “initiated in” and cannot easily be verbalized.

How do Replicators view replications? There is no single view. Some replication attempts are clearly efforts to show that particular (high-profile) findings are not reproducible. Other attempts are motivated because someone initially liked a finding and wanted to build on it but was unable to do so. Yet other replication efforts are conducted to examine the reproducibility of the research in an entire area. And there are other motivations as well. The bottom line, however, is that Replicators view replicability as an essential part of science.

How do Critics view replications? Given their emphasis on creativity, they are likely to have a low opinion of replications, which are, by definition, uncreative. Furthermore, because the process that has led to the published results cannot be verbalized easily in their view, replications are by definition flawed because there is always some je-ne-sais-quoi missing.

How do Replicators view Critics? Critics are apparently against open science and therefore probably have something to hide.

How do Critics view Replicators? A good researcher is creative. Replications are, by definition, uncreative, ergo replicators are unimaginative third-rate researchers who are only using replication to try to advance their own careers.

Of course these are caricatures (except in some very prominent cases). My take is that I understand why some Critics feel they are under siege and that it is unfair that the spectre of Stapel is frequently raised when their research is involved. I agree that part of being a good researcher is being creative. However, the most important part of the job is to produce knowledge (which has to be based on reproducible findings). I agree that someone who only does replications, while useful, is not the most impressive researcher. On the other hand, I know that Replicators do their own original and creative research in addition to performing replications (and I see no reason why Critics couldn't do the same). There are no fulltime replicators outside of Star Trek.

It won’t be a surprise to the readers of this blog that I’m on the side of the Replicators. I think the EXPERIMENT-IS-WORK-OF-ART metaphor is untenable and at odds with what science is all about, which is openness and reproducibility, or EXPERIMENT-IS-PUBLIC-PROPERTY (I’m going all Lakoff on you today). Having said this, my sense is that the notion of replication conflicts with the Critics’ (implicit) ideas about conducting experiments. To bring the Replicators and Critics closer together it might be useful to have a discussion about what experiments are and what they are for. For now, it would help the discussion if members of both groups abandoned the useless REPLICATION-IS-TREBUCHET metaphor and instead adopted the, admittedly less dramatic, REPLICATION-IS-STRUCTURAL-INTEGRITY-CHECK metaphor (which I tried to promote in my very first post).

“Our house is on fire!” exclaimed E.J. Wagenmakers recently on Facebook. In a similar vein, but with less theatrical flair, I’d put it like this: “our foundation is not as sturdy as we might have thought. Everyone, let’s check!”


Now back to the pool.


Wednesday, May 7, 2014

Are we Intuitively Cooperative (or are we Moving the Goalposts)?

Are we an intuitively cooperative species? A study that was published a few years ago in Nature suggests that indeed our initial inclination is to cooperate with others. We are only selfish if we are allowed to reflect.

How did the researchers obtain these (perhaps counterintuitive) results? Subjects were given an amount of money and had to decide how much of this money, if any, they wanted to contribute to a common project. The subjects were told that they collaborated on this project with three other unknown players whose contributions were not known. They were told that each of the four players received a bonus that was calculated as follows: (additional money – own contribution) + 2*(sum of the contributions)/4.

So you get the highest personal payoff by being selfish and contributing nothing to the common good, regardless of the total contribution of the other three players. A random half of the subjects were required to make a decision on the amount of their contribution within 10 seconds, whereas the other half of the subjects had to think and reflect for at least 10 seconds before making their contribution.
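If the payoff rule didn't click right away, a few lines of Python may help. This is only a minimal sketch of the bonus formula as described above; the starting amount of 40 and the other players' contributions are made-up numbers for illustration, not values from the study, and the function name is my own.

```python
def bonus(money, own, others):
    """Bonus for one player under the rule described above:
    (money - own contribution) + 2 * (sum of all four contributions) / 4.
    """
    total = own + sum(others)  # contributions of all four players
    return (money - own) + 2 * total / 4


MONEY = 40             # hypothetical starting amount
OTHERS = [10, 20, 30]  # hypothetical contributions of the other three players

for own in (0, 20, 40):
    print(f"contribute {own:2d} -> bonus {bonus(MONEY, own, OTHERS)}")
```

Every unit you contribute costs you 1 and returns only 2/4 = 0.5 to you, so your own bonus drops by 0.5 per unit no matter what the others do. That is why contributing nothing always pays best for the individual.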

The experiments showed an intuitive-cooperation effect. The mean contribution was significantly larger in the intuition condition than in the reflection condition. Hence the conclusion that we are selfish when given the opportunity to deliberate but cooperative when responding intuitively.

Enter my colleagues Peter Verkoeijen and Samantha Bouwmeester. (I wrote about another study by them in a previous post. Basically, the story is this. I have to walk past their office several times a day on my way to the coffee machine and when they have a paper coming out they won’t let me pass unless I promise to write a blog post about them.) They were surprised by these findings and decided to replicate them. They conducted several experiments but found no support for the intuitive cooperation hypothesis.

What did they find? First of all, it turned out that only 10% of the subjects understood the payoff scheme. (Did you understand it right away?) This makes an interpretation of the original findings difficult. How can we say anything meaningful when the vast majority of subjects misunderstand the experiment?

Wait a minute! you might say. Perhaps the original study was run with a different subject pool. This is not the case however. One of the two original experiments that found the effect was run on Mechanical Turk. The replication attempts by my colleagues were also run on Mechanical Turk.

Verkoeijen and Bouwmeester were unable to find evidence for intuitive cooperation in several experiments, even ones that were very close to the original ones. An initial version of their manuscript was reviewed, and an anonymous reviewer pointed out that the authors of the original paper, David Rand and his colleagues, had coincidentally in the meantime conducted studies in which they were also unable to replicate their own finding.

Rand and his colleagues had an interesting explanation for this. Mechanical Turk subjects have become familiar with this type of experiment and now will no longer act naively. The entire pool of subjects is now contaminated. There is no hope of finding the intuitive cooperation effect ever again in that crowdsourcing version of Chernobyl. Fortunately, the effect is still there if naïve subjects are used because the effect is moderated by naïveté.

To address the Chernobyl criticism, my colleagues conducted additional experiments. However, they found no evidence for the newfangled naïveté hypothesis. Turkers who classified themselves as not having participated in public-goods experiments before (they were told prior participation would not preclude them from getting paid this time around as well) showed no intuitive cooperation effect.

An anonymous reviewer of the second version of Verkoeijen and Bouwmeester’s manuscript moved the goalpost even a little further. The reviewer (was it the same one as before?) claimed that it is likely that the Turkers lied about having no experience with the experiment. Not only are the Turkers a heavily polluted bunch, they are also inveterate liars.

So in addition to the naïveté hypothesis, we now have the mendacity hypothesis. Such a line of reasoning opens the door to non-falsifiability, of course. Whenever you find the effect, the subjects must have been naïve, and when you don’t, they must have been lying about having no experience. The editor at PLoS ONE had the good sense not to let this concern block publication of Verkoeijen and Bouwmeester’s article.

The article includes a meta-analysis of the reported experiments. This analysis produced no evidence for the intuitive cooperation hypothesis. In fact, the aggregate effect is going in the opposite direction. In addition, there are several other unsuccessful replications of the intuitive cooperation effect performed by a Swedish group.



It looks like the discussion on intuitive cooperation has reached an impasse with some initial experiments by one group showing an effect while subsequent experiments from several groups have produced nonreplications. Where do we go from here?

Peter Verkoeijen and Samantha Bouwmeester have initiated a Registered Replication Report with Perspectives on Psychological Science. A number of labs will independently test the intuitive cooperation hypothesis according to a strict protocol to be developed in collaboration with the original authors. I cannot think of a better way to resolve the discussion and stop the goalposts from moving. And what's more important, I will be able to make it to the coffee machine again.

Hello, old friend