Thursday, September 18, 2014

Verbal Overshadowing: What Can we Learn from the First APS Registered Replication Report?

Suppose you witnessed a heinous crime being committed right before your eyes. Suppose further that a few hours later, you’re being interrogated by hard-nosed detectives Olivia Benson and Odafin Tutuola. They ask you to describe the perpetrator. The next day, they call you in to the police station and present you with a lineup. Suppose the suspect is in the lineup. Will you be able to pick him out? A classic study in psychology suggests Benson and Tutuola have made a mistake by first having you describe the perpetrator because the very act of describing the perpetrator will make it more difficult for you to identify him out of the lineup.

This finding is known as the verbal overshadowing effect and was discovered by Jonathan Schooler. In the experiment that is of interest here, he and his co-author, Tonya Engstler-Schooler, found that verbally describing the perpetrator led to a 25% accuracy decrease in identifying him. This is a sizeable difference with practical implications. Based on these findings, we’d be right to tell Benson and Tutuola to lay off interviewing you until after the lineup identification.

Here is how the experiment worked.


Subjects first watched a 44-second video clip of a (staged) bank robbery. Then they performed a filler task for 20 minutes, after which they either wrote down a description of the robber (experimental condition) or listed names of US states and their capitals (control condition). After 5 minutes, they performed the lineup identification task.

How reliable is the verbal-overshadowing effect? That is the question that concerns us here. A 25% drop in accuracy seems considerable. Schooler himself observed subsequent research yielded progressively smaller effects, something he referred to as “the decline effect.” This clever move created a win-win situation for him. If the original finding replicates, the verbal overshadowing hypothesis is supported. If it doesn’t, then the decline effect hypothesis is supported.

The verbal overshadowing effect is the target of the first massive Registered Replication Report, which was just published under the direction of Dan Simons (Alex Holcombe is leading the charge on the second project). Thirty-one labs were involved in direct replications of the verbal overshadowing experiment I just described. Our lab was one of the 31. Due to the large number of participating labs and the laws of the alphabet, my curriculum vitae now boasts an article on which I am 92nd author.

Due to an error in the protocol, the initial replication attempt had the description task and a filler task in the wrong order before the lineup task, which made the first set of replications, RRR1, a fairly direct replication of Schooler’s Experiment 4 rather than, as was the plan, his Experiment 1. A second set of experiments, RRR2, was performed to replicate Schooler’s Experiment 1. You see the alternative ordering here.

In Experiment 4, Schooler found that subjects in the verbal description condition were 22% less accurate than those in the control condition. A meta-analysis of the RRR1 experiments yielded a considerably smaller, but still significant, 4% deficit. Of note is that all the replication studies found a smaller effect than the original study but that study was also less precise due to having a smaller sample size.

Before I tell you about the results of the replication experiments I have a confession to make. I have always considered the concept of verbal overshadowing plausible, even though I might have a somewhat different explanation for it than Schooler (more about this maybe in a later post), but I thought the experiment we were going to replicate was rather weak. I had no confidence that we would find the effect. And indeed, in our lab, we did not obtain the effect. You might argue that this null effect was caused by the contagious skepticism I must have been oozing. But I did not run the experiment. In fact, I did not even interact about the experiment with the research assistant who ran it (no wonder I’m 92nd author on the paper!). So the experiment was well-insulated from my skepticism.

Let's get back on track. In Experiment 1, Schooler found a 25% deficit. The meta-analysis of RRR2 yielded a 16% deficit, somewhat smaller but still in the same ballpark. Verbal overshadowing appears to be a robust effect. Also interesting is the finding that the position of the filler task in the sequence mattered. The verbal overshadowing effect is larger when the lineup identification immediately follows the description and when there is more time between the video and the description. In fact, either of those or a combination of them could be responsible for this difference in effect sizes.

Here are the main points I take away from this massive replication effort.

1. Our intuitions about effects may not be as good as we think. My intuitions were wrong because a meta-analysis of all the experiments finds strong support for the effect. Maybe I’m just a particularly ill-calibrated individual or an overly pessimistic worrywart but I doubt it. For one, I was right about our own experiment, which didn’t find the effect. At the same time, I was clearly wrong about the overall effect. This brings me to the second point.

2. One experiment does not an effect make (or break). This goes for both the original experiment, which did find a big effect, and our replication attempt (and 30 others). One experiment that shows an effect doesn’t mean much, and neither does one unsuccessful replication. We already knew this, of course, but the RRR drives this point home nicely.

3. RRRs are very useful for estimating effect sizes without having to worry about publication bias. But it should be noted that they are very costly. Using 31 labs was probably overkill, although it was nice to see all the enthusiasm for a replication project.

4. More power is better. As the article notes about the smaller effect in RRR1: “In fact, all of the confidence intervals for the individual replications in RRR1 included 0. Had we simply tallied the number of studies providing clear evidence for an effect […], we would have concluded in favor of a robust failure to replicate—a misleading conclusion. Moreover, our understanding of the size of the effect would not have improved."

5. Replicating an effect against your expectations is a joyous experience.  This sounds kind of sappy but it’s an accurate description of my feelings when I was told by Dan Simons about the outcome of the meta-analyses. Maybe I was biased because I liked the notion of verbal overshadowing but it is rewarding to see an effect materialize in a meta-analysis. It's a nice example of “replicating up.”
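
Point 4 can be made concrete with a small sketch. The numbers below are invented for illustration and are not the RRR data: 31 hypothetical labs each report a small deficit with a standard error large enough that every individual 95% confidence interval includes 0. The pooling uses a simple fixed-effect, inverse-variance average, which is an assumption of this sketch and not necessarily the model used in the report.

```python
import math

# Hypothetical per-lab effects (difference in proportion correct,
# description minus control). These are NOT the RRR1 results.
PATTERN = [-0.08, -0.06, -0.04, -0.02, 0.00]
N_STUDIES = 31
SE = 0.05   # hypothetical standard error, identical for every lab
Z = 1.96    # normal quantile for a 95% interval

effects = [PATTERN[i % len(PATTERN)] for i in range(N_STUDIES)]
ses = [SE] * N_STUDIES


def ci(estimate, se):
    """Return the (lower, upper) bounds of a 95% confidence interval."""
    return (estimate - Z * se, estimate + Z * se)


# Vote counting: how many individual intervals exclude zero?
excluding_zero = sum(
    1 for e, s in zip(effects, ses) if ci(e, s)[0] > 0 or ci(e, s)[1] < 0
)

# Fixed-effect meta-analysis: weight each study by 1 / SE^2.
weights = [1 / s**2 for s in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
pooled_ci = ci(pooled, pooled_se)

print(f"studies with a CI excluding 0: {excluding_zero} of {N_STUDIES}")
print(f"pooled effect: {pooled:.3f}, 95% CI [{pooled_ci[0]:.3f}, {pooled_ci[1]:.3f}]")
```

In this made-up data set, tallying individual studies would suggest "no effect anywhere," while the pooled interval sits clearly below zero. The pooled standard error shrinks with the square root of the number of studies, which is where the extra precision comes from.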

Where do we go from here? Now that we have a handle on the effect, it would be useful to perform coordinated and preregistered conceptual replications (using different stimuli, different situations, different tasks). I'd be happy to think along with anyone interested in such a project.

Update September 24, 2014. The post is the topic of a discussion on Reddit.

Wednesday, July 9, 2014

Developing Good Replication Practices

In my last post, I described a (mostly) successful replication by Steegen et al. of the “crowd-within effect.” The authors of that replication effort felt that it would be nice to mention all the good replication research practices that they had implemented in their replication effort.

And indeed, positive psychologist that I am, I would be remiss if I didn’t extol the virtues of the approach in that exemplary replication paper, so here goes.

Make sure you have sufficient power.
We all know this, right?

Preregister your hypotheses, analyses, and code.
I like how the replication authors went all out in preregistering their study. It is certainly important to have the proposed analyses and code worked out up front.

Make a clear distinction between confirmatory and exploratory analyses.
The authors did here exactly as the doctor, A.D. de Groot in this case, ordered. It is very useful to perform exploratory analyses but they should be separated clearly from the confirmatory ones.

Report effect sizes.

Use both estimation and testing, so your data can be evaluated more broadly, by people from different statistical persuasions.

Use both frequentist and Bayesian analyses.
Yes, why risk being pulled over by a Bayes trooper or having a run-in with the Frequentist militia? Again, using multiple analyses allows your results to be evaluated more broadly.

Adopt a co-pilot multi-software approach.
A mistake in data analysis is easily made and so it makes sense to have two or more researchers analyse the data from scratch. A co-author and I used a co-pilot approach as well in a recent paper (without knowing the cool name for this approach, otherwise we would have bragged about it in the article). We discovered that there were tiny discrepancies between our analyses with each of us making a small error here and there. The discrepancies were easily resolved but the errors probably would have gone undetected had we not used the co-pilot approach. Using a multi-software approach seems a good additional way to minimize the likelihood of errors.

Make the raw and processed data available.
When you ask people to share their data, they typically send you the processed data but the raw data are often more useful. The combination is even more useful as it allows other researchers to retrace the steps from raw to processed data. 

Use multiple ways to assess replication success.
This is a good idea in the current climate where the field has not settled on a single method yet. Again, it allows the results to be evaluated more broadly than with a single-method approach.

“Maybe these methodological strengths are worth mentioning too?” the first author of the replication study, Sara Steegen, suggested in an email.


Thursday, July 3, 2014

Is There Really a Crowd Within?

In 1907 Francis Galton (two years prior to becoming “Sir”) published a paper in Nature titled “Vox populi” (voice of the people). With the rise of democracy in the (Western) world, he wondered how much trust people could put in public judgments. How wise is the crowd, in other words?

As luck would have it, a weight-judging competition was carried on at the annual show of the West of England Fat Stock and Poultry Exhibition (sounds like a great name for a band) in Plymouth. Visitors had to estimate the weight of a prize-winning ox when slaughtered and “dressed” (meaning that its internal organs would be removed).

Galton collected all 800 estimates. He removed thirteen (and nicely explains why) and then analyzed the remaining 787. He computed the median estimate and found that it was less than 1% from the ox’s actual weight. Galton concludes: “This result is, I think, more creditable to the trust-worthiness of a democratic judgment than might have been expected.”

This may seem like a small step to Galton and a big step to the rest of us but later research has confirmed that in making estimates the average of a group of people is more accurate than the predictions of most of the individuals. The effect hinges on at least some of the errors in the individual estimates being statistically independent of one another.
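
A minimal simulation sketch shows why independent errors matter. The true weight and the spread of the guesses below are hypothetical values chosen for illustration, not Galton's data; only the count of 787 estimates comes from his paper. Each simulated guess is the truth plus an independent, unbiased error, which is itself an idealizing assumption.

```python
import random
import statistics

TRUE_WEIGHT = 1200.0   # hypothetical true value (not Galton's figure)
NOISE_SD = 75.0        # hypothetical spread of individual errors
N_GUESSERS = 787       # number of estimates Galton analyzed

rng = random.Random(1907)  # fixed seed so the run is repeatable

# Each guess is the truth plus an independent error.
guesses = [rng.gauss(TRUE_WEIGHT, NOISE_SD) for _ in range(N_GUESSERS)]

# The "crowd" estimate is the median, as in Galton's analysis.
crowd_estimate = statistics.median(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)

# Compare against how far off each individual was.
individual_errors = [abs(g - TRUE_WEIGHT) for g in guesses]
typical_individual_error = statistics.median(individual_errors)
share_beaten = sum(e > crowd_error for e in individual_errors) / N_GUESSERS

print(f"crowd (median) error: {crowd_error:.1f}")
print(f"typical individual error: {typical_individual_error:.1f}")
print(f"share of individuals less accurate than the crowd: {share_beaten:.0%}")
```

Because the individual errors point in different directions, they largely cancel in the median. If everyone shared the same bias, that part of the error would not cancel, no matter how large the crowd.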

In 2008 Edward Vul and Hal Pashler gave an interesting twist to the wisdom of the crowd idea. What would happen, they wondered, if you allow the same individual to make two independent estimates? Would the average of these estimates be more accurate than each of the individual estimates?

Vul and Pashler tested this idea by having 428 subjects guess answers to questions such as “What percentage of the world’s airports are in the United States?” Vul and Pashler further reasoned that the more the estimates differed from each other, the more accurate their average would be. To test this idea, they manipulated the time between the first and second guess. One group second-guessed themselves immediately whereas the other group made the second guess three weeks later.

Here is what Vul and Pashler found.   

They did indeed observe that the average of the two guesses was more accurate than each of the guesses separately (the green bars representing the mean squared error are lower than the blue and red ones). Furthermore, the effect of averaging was larger in the 3-week delay condition than in the immediate condition.

Vul and Pashler conclude that forcing a second guess leads to higher accuracy than is obtained by a first guess and that this gain is enhanced by temporally separating the two guesses. So "sleeping on it" works.
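
The logic behind the delay manipulation can be sketched with a toy model. Assume, purely for illustration, that both guesses are unbiased with the same error variance σ² and that their errors correlate at ρ. Then the mean squared error of the average is σ²(1 + ρ)/2, so the less the second guess resembles the first, the more averaging helps. The correlations assigned to the two conditions below are hypothetical, not estimates from the study.

```python
import math
import random


def simulate_mse(rho, sigma=10.0, n=20000, seed=2008):
    """Return (mse_guess1, mse_guess2, mse_average) for error correlation rho."""
    rng = random.Random(seed)
    sq1 = sq2 = sq_avg = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        e1 = sigma * z1
        # Build a second error that correlates with the first at rho.
        e2 = sigma * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        sq1 += e1**2
        sq2 += e2**2
        sq_avg += ((e1 + e2) / 2) ** 2
    return sq1 / n, sq2 / n, sq_avg / n


# Hypothetical correlations: guesses made back to back are assumed to be
# more alike than guesses made three weeks apart.
conditions = {"immediate": 0.8, "delayed": 0.4}
gains = {}
for label, rho in conditions.items():
    mse1, mse2, mse_avg = simulate_mse(rho)
    gains[label] = mse1 - mse_avg
    print(f"{label}: guess 1 = {mse1:.1f}, guess 2 = {mse2:.1f}, average = {mse_avg:.1f}")
```

Real guesses also share a bias that averaging cannot remove, so this sketch overstates how clean the benefit is. It only illustrates why more independence between the two guesses should produce a larger gain.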

How reproducible are these findings? That is what Sara Steegen, Laura Dewitte, Francis Tuerlinckx, and Wolf Vanpaemel set out to investigate in a preregistered replication of the Vul and Pashler study in a special issue of Frontiers in Cognition that I’m editing with my colleague René Zeelenberg. 

Steegen and colleagues tested Flemish psychology students rather than a more diverse sample. They obtained the following results.

Like Vul and Pashler, they obtained a crowd-within effect. The average of the two guesses was more accurate than each of the guesses separately both in the immediate and in the delayed condition. Unlike in Vul and Pashler (2008), the accuracy gain of averaging both guesses compared to guess 1 was not significantly larger in the delayed condition (although it was in the same direction). Instead, the accuracy gain of the average was larger in the delayed condition than in the immediate condition when it was compared to the second guess.

So this replication attempt yields two important pieces of information: (1) the crowd-within effect seems robust, (2) the effect of delay on accuracy gain needs to be investigated more closely. It's not clear yet whether or when "sleeping on it" works.

Edward Vul, the first author of the original crowd-within paper was a reviewer of the replication study. I like how he responded to the results in recommending acceptance of the paper:

"The authors carried out the replication as they had planned. I am delighted to see the robustness of the Crowd Within effect verified (a couple of non-preregistered and thus less-definitive replications had also found the effect within the past couple of years). Of course, I'm a bit disappointed that the results on replicating the contrast between immediate and delayed benefits are mixed, but that's what the data are.

The authors have my thanks for doing this service to the community" [quoted with permission]

 Duly noted. 

Thursday, June 5, 2014

Who’s Gonna Lay Down the Law in Psytown?

These are troubled times in our little frontier town called Psytown. The priest keeps telling us that deep down we’re all p-hackers and that we must atone for our sins.

If you go out on the streets, you face arrest by any number of unregulated police forces and vigilantes.

If you venture out with a p-value of .065, you should count yourself lucky if you run into deputy Matt Analysis. He’s a kind man and will let you off with a warning if you promise to run a few more studies, conduct a meta-analysis, and remember never to use the phrase “approaching significance” ever again.

It could be worse.

You could be pulled over by a Bayes Trooper. “Please step out of the vehicle, sir.” You comply. “But I haven’t done anything wrong, officer, my p equals .04.” He lets out a derisive snort: “You reckon that’s doin’ nothin’ wrong? Well, let me tell you somethin’, son. Around these parts we don’t care about p. We care about Bayes factors. And yours is way below the legal limit. Your evidence is only anecdotal, so I’m gonna have to book you.”

Or you could run into the Replication Watch. “Can we see your self-replication?” “Sorry, I don’t have one on me but I do have a p<.01.” “That’s nice but without a self-replication we cannot allow you on the streets.” “But I have to go to work.” “Sorry, can’t do, buddy.” “Just sit tight while we try to replicate you.”

Or you could be at a party when suddenly two sinister people in black show up and grab you by the arms. Agents from the Federal Bureau of Pre-registration. “Sir, you need to come with us. We have no information in our system that you’ve pre-registered with us.” “But I have p<.01 and I replicated it” you exclaim while they put you in a black van and drive off.

Is it any wonder that the citizens of Psytown stay in most of the day, fretting about their evil tendency to p-hack, obsessively stepping on the scale worried about excess significance, and standing in front of the mirror checking their p-curves?

And then when they are finally about to fall asleep, there is a loud noise. The village idiot has gotten his hands on the bullhorn again. “SHAMELESS LITTLE BULLIES” he shouts into the night. “SHAMELESS LITTLE BULLIES.”

Something needs to change in Psytown. The people need to know what’s right and what’s wrong. Maybe they need to get together to devise a system of rules. Or maybe a new sheriff needs to ride into town and lay down the law.

Thursday, May 29, 2014

My Take on Replication

There are quite a few comments on my previous post already, both on this blog and elsewhere. That post was my attempt to make sense of the discussion that all of a sudden dominated my Twitter feed (I’d been offline for several days). Emotions were running high and invective was flying left and right. I wasn’t sure what the cause of this fracas was and tried to make sense of where people were coming from and suggest a way forward.

Many of the phrases in the post that I used to characterize the extremes of the replication continuum are paraphrases of what I encountered online rather than figments of my own imagination. What always seems to happen when you write about extremes, though, is that people rush in to declare themselves moderates. I appreciate this. I’m a moderate myself. But if we were all moderates, then the debate wouldn’t have spiralled out of control. And it was this derailment of the conversation that I was trying to understand.

But before (or more likely after) someone mistakes one of the extreme positions described in the previous post for my own let me state explicitly how I view replication. It's rather boring.
  • Replication is by no means the solution to all of our problems. I don’t know if anyone seriously believes it is.
  • Replication attempts should not be used or construed as personal attacks. I have said this in my very first post and I'm sticking with it.
  • A failed replication does not mean the original author did something wrong. In fact, a single failed replication doesn’t mean much, period. Just like one original experiment doesn’t mean much. A failed replication is just a data point in a meta-analysis, though typically one with a little more weight than the original study (because of the larger N). The more replication attempts the better.
  • There are various reasons why people are involved in replication projects. Some people distrust certain findings (sometimes outside their own area) and set out to investigate. This is a totally legitimate reason. In the past year or so I have learned that I’m personally more comfortable with trying to replicate findings from theories that I do find plausible but that perhaps don’t have enough support yet. I call this replicating up. Needless to say, this can still result in a replication failure (but at least I’m rooting for the effect). And then there are replication efforts where people are not necessarily invested in a result, such as the reproducibility project and the registered replication projects. Maybe this is the way of the future. Another option is adversarial replication.
  • Direct vs. conceptual replication is a false dichotomy. Both are necessary but neither is sufficient. Hal Pashler and colleagues have made it clear why conceptual replication by itself is not sufficient. It’s biased against the Null. If you find an effect you'll conclude the effect has replicated. If you don’t, you’ll probably conclude that you were measuring a different construct after all (I’m sure I must have fallen prey to this fallacy at one point or another). Direct replications have the opposite problem. Even if you replicate a finding many times over, it might be that what you’re replicating is, in fact, an artifact. You’ll only find out if you conduct a conceptual replication, for example with a slightly different stimulus set. I wrote about the reliability and validity of replications earlier, which resulted in an interesting (or so we thought) “diablog” with Dan Simons on this topic (see also here, here, and here).
  • Performing replications is not something people should be doing exclusively (at least, I’d recommend against it). However, it would be good if everyone were involved in doing some of the work. Performing replications is a service to the field. We all live in the same building and it is not as solid as we once thought. Some even say it’s on fire.

Wednesday, May 28, 2014

Trying to Understand both Sides of the Replication Discussion

I missed most of the recent discussion on replication because I’m on vacation. However, the weather’s not very inviting this morning in southern Spain, so I thought I’d try to catch up a bit on the fracas, and try to see where both sides are coming from. My current environment induces me to take a few steps back from it all. Let’s see where this goes. Rather than helping the discussion move forward, I might, in fact, inadvertently succeed in offending everyone involved.

Basically, the discussion is between what I’ll call the Replicators and the Replication Critics. I realize that the Replicators care about more than just replication. The Critics are opposing the replication movement. The Replicators and the Critics are the endpoints of what probably is close to a continuum.

Who are the Replicators? As best as I can tell, they are a ragtag group of (1) mid-career-to-senior methodologists, (2) early-to-mid-career social psychologists and social-psychologists-turned-methodologists, (3) mid-career-to-senior psychologists from other areas than social psychology.

Who are the Critics? As best as I can tell they are mid-career-to-senior social psychologists. (If there are Critics who don’t fit this bill, I’d like to hear who they are, so I can expand the category.)

What motivates the Replicators? They are primarily motivated by a concern about the state of our field. However, purely looking at the composition of the group, it is possible that career advancement is at least a small part of the motivation as well. The Replicators are generally not the senior people in their field (social psychology) or are in an area (methodology) where they previously did not have the level of exposure (who reads the Journal of Mathematical Psychology?) that they’re enjoying now. And maybe the people from other areas, who seem to have little extra to gain from taking part in the discussion, just enjoy making snarky comments once in a while.

What motivates the Critics? It is clear that senior social psychologists are often the target of high-profile replication efforts. They are also rattled by recent (alleged and proven) fraud cases among their ranks (Stapel, Sanna, Smeesters, Förster). So it is not surprising that they feel they are under attack and react rather defensively. Given the composition of the group, they have something to lose. They have a reputation. Not only that, they have always been able to publish in high-profile outlets and have received a great deal of positive media attention. All of this is threatened by the replication movement. But there is something else as well: the Critics value creativity in research, maybe above anything else.

How do Replicators view original studies? They view them as public property. The data, the procedure, everything should be available to anyone who wants to scrutinize it. This leads them to be suspicious of anyone who doesn’t want to share.

How do Critics view original studies? They seem to (implicitly) view them a bit like works of art. They are the author’s intellectual property and the process that has led to the results requires a certain artistry that one has to be “initiated in” and cannot easily be verbalized.

How do Replicators view replications? There is no single view. Some replication attempts are clearly efforts to show that particular (high-profile) findings are not reproducible. Other attempts are motivated because someone initially liked a finding and wanted to build on it but was unable to do so. Yet other replication efforts are conducted to examine the reproducibility of the research in an entire area. And there are other motivations as well. The bottom line, however, is that Replicators view replicability as an essential part of science.

How do Critics view replications? Given their emphasis on creativity, they are likely to have a low opinion of replications, which are, by definition, uncreative. Furthermore, because the process that has led to the published results cannot be verbalized easily in their view, replications are by definition flawed because there is always some je-ne-sais-quoi missing.

How do Replicators view Critics? Critics are apparently against open science and therefore probably have something to hide.

How do Critics view Replicators? A good researcher is creative. Replications are, by definition, uncreative, ergo replicators are unimaginative third-rate researchers who are only using replication to try to advance their own careers.

Of course these are caricatures (except in some very prominent cases). My take is that I understand why some Critics feel they are under siege and that it is unfair that the spectre of Stapel is frequently raised when their research is involved. I agree that part of being a good researcher is being creative. However, the most important part of the job is to produce knowledge (which has to be based on reproducible findings). I agree that someone who only does replications, while useful, is not the most impressive researcher. On the other hand, I know that Replicators do their own original and creative research in addition to performing replications (and I see no reasons why Critics couldn't do the same). There are no fulltime replicators outside of Star Trek.

It won’t be a surprise to the readers of this blog that I’m on the side of the Replicators. I think the EXPERIMENT-IS-WORK-OF-ART metaphor is untenable and at odds with what science is all about, which is openness and reproducibility, or EXPERIMENT-IS-PUBLIC-PROPERTY (I’m going all Lakoff on you today). Having said this, my sense is that the notion of replication conflicts with the Critics’ (implicit) ideas about conducting experiments. To bring the Replicators and Critics closer together it might be useful to have a discussion about “what are experiments?” and “what are experiments for?” For now, it would help the discussion if members of both groups abandoned the useless REPLICATION-IS-TREBUCHET metaphor and instead adopted the, admittedly less dramatic, REPLICATION-IS-STRUCTURAL-INTEGRITY-CHECK metaphor (which I tried to promote in my very first post).

“Our house is on fire!” exclaimed E.J. Wagenmakers recently on Facebook. In a similar vein, but with less theatrical flair, I’d put it like this: “our foundation is not as sturdy as we might have thought. Everyone, let’s check!”

Now back to the pool.

Wednesday, May 7, 2014

Are we Intuitively Cooperative (or are we Moving the Goalposts)?

Are we an intuitively cooperative species? A study that was published a few years ago in Nature suggests that indeed our initial inclination is to cooperate with others. We are only selfish if we are allowed to reflect.

How did the researchers obtain these (perhaps counterintuitive) results? Subjects were given an amount of money and had to decide how much of this money, if any, they wanted to contribute to a common project. The subjects were told that they collaborated on this project with three other unknown players whose contributions were not known. They were told that each of the four players received a bonus that was calculated as follows: (additional money – own contribution) + 2*(sum of the contributions)/4.

So you get the highest personal payoff by being selfish and contributing nothing to the common good, regardless of the total contribution of the other three players. A random half of the subjects were required to make a decision on the amount of their contribution within 10 seconds, whereas the other half of the subjects had to think and reflect at least 10 seconds before making their contribution.
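
To see why contributing nothing always pays best for the individual, here is a minimal sketch of the bonus formula. It assumes that "sum of the contributions" includes the player's own contribution, and the endowment of 40 is a hypothetical amount chosen for illustration.

```python
def bonus(endowment, own_contribution, others_contributions):
    """Payoff for one player in the four-person game described above.

    Assumes the summed contributions include the player's own.
    """
    total = own_contribution + sum(others_contributions)
    return (endowment - own_contribution) + 2 * total / 4


ENDOWMENT = 40  # hypothetical amount each player starts with

# Compare keeping everything with giving everything, for two extreme
# behaviors of the other three players.
for others in ([0, 0, 0], [40, 40, 40]):
    selfish = bonus(ENDOWMENT, 0, others)
    generous = bonus(ENDOWMENT, ENDOWMENT, others)
    print(f"others give {others}: keep all -> {selfish}, give all -> {generous}")
```

Every unit a player contributes costs that player one unit and returns only half a unit (2/4) to them, so giving is always a personal loss. Yet if all four give everything, each ends up with 80, twice the 40 each would get if nobody gave. That tension is what makes the game a test of cooperation.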

The experiments showed an intuitive-cooperation effect. The mean contribution was significantly larger in the intuition condition than in the reflection condition. Hence the conclusion that we are selfish when given the opportunity to deliberate but cooperative when responding intuitively.

Enter my colleagues Peter Verkoeijen and Samantha Bouwmeester. (I wrote about another study by them in a previous post. Basically, the story is this. I have to walk past their office several times a day on my way to the coffee machine and when they have a paper coming out they won’t let me pass unless I promise to write a blog post about them.) They were surprised about these findings and decided to replicate them. They conducted several experiments but found no support for the intuitive collaboration scheme.

What did they find? First of all, it turned out that only 10% of the subjects understood the payoff scheme. (Did you understand it right away?) This makes an interpretation of the original findings difficult. How can we say anything meaningful when the vast majority of subjects misunderstand the experiment?

Wait a minute! you might say. Perhaps the original study was run with a different subject pool. This is not the case however. One of the two original experiments that found the effect was run on Mechanical Turk. The replication attempts by my colleagues were also run on Mechanical Turk.

Verkoeijen and Bouwmeester were unable to find evidence for intuitive cooperation in several experiments, even ones that were very close to the original ones. An initial version of their manuscript was reviewed and an anonymous reviewer pointed out that the authors of the original paper, David Rand and his colleagues, in the meantime coincidentally had conducted studies in which they were also unable to replicate their own finding.

Rand and his colleagues had an interesting explanation for this. Mechanical Turk subjects have become familiar with this type of experiment and now will no longer act naively. The entire pool of subjects is now contaminated. There is no hope of finding the intuitive cooperation effect ever again in that crowdsourcing version of Chernobyl. Fortunately, the effect is still there if naïve subjects are used because the effect is moderated by naïveté.

To address the Chernobyl criticism, my colleagues conducted additional experiments. However, they found no evidence for the newfangled naïveté hypothesis. Turkers who classified themselves as not having participated in public-goods experiments before (they were told prior participation would not preclude them from getting paid this time around as well) showed no intuitive cooperation effect.

An anonymous reviewer of the second version of Verkoeijen and Bouwmeester’s manuscript moved the goalpost even a little further. The reviewer (was it the same one as before?) claimed that it is likely that the Turkers lied about having no experience with the experiment. Not only are the Turkers a heavily polluted bunch, they are also inveterate liars.

So in addition to the naïveté hypothesis, we now have the mendacity hypothesis. Such a line of reasoning opens the door to non-falsifiability, of course. Whenever you find the effect, the subjects must have been naïve and when you don’t they must have been lying about having no experience. The editor at PLoS ONE had the good sense not to let this concern block publication of Verkoeijen and Bouwmeester’s article.

The article includes a meta-analysis of the reported experiments. This analysis produced no evidence for the intuitive cooperation hypothesis. In fact, the aggregate effect is going in the opposite direction. In addition, there are several other unsuccessful replications of the intuitive cooperation effect performed by a Swedish group.

It looks like the discussion on intuitive cooperation has reached an impasse with some initial experiments by one group showing an effect while subsequent experiments from several groups have produced nonreplications. Where do we go from here?

Peter Verkoeijen and Samantha Bouwmeester have initiated a Registered Replication Report with Perspectives on Psychological Science. A number of labs will independently test the intuitive cooperation hypothesis according to a strict protocol to be developed in collaboration with the original authors. I cannot think of a better way to resolve the discussion and stop the goalposts from moving. And what's more important, I will be able to make it to the coffee machine again.

Hello, old friend