Tuesday, May 16, 2017

Sometimes You Can Step into the Same River Twice

A recurring theme in the replication debate is the argument that certain findings don’t replicate or cannot be expected to replicate because the context in which the replication is carried out differs from the one in which the original study was performed. This argument is usually made after a failed replication.

In most such cases, the original study did not provide a set of conditions under which the effect was predicted to hold, although the original paper often did make grandiose claims about the effect’s relevance to a variety of contexts including industry, politics, education, and beyond. If you fail to replicate this effect, it's a bit like you've just bought a car that was touted by the salesman as an "all-terrain vehicle," only to have the wheels come off as soon as you drive it off the lot.*

As this automotive analogy suggests, the field has two problems: many effects (1) do not replicate and (2) are grandiosely oversold. Dan Simons, Yuichi Shoda, and Steve Lindsay have recently made a proposal that provides a practical solution to the overselling problem: researchers should include in their paper a constraints on generality (COG) statement that explicitly identifies and justifies the target populations for the reported findings. Researchers should also state whether they think the results are specific to the stimuli that were used and to the time and location of the experiment. Requiring authors to be specific about the constraints on generality is a good idea. You probably wouldn't have bought the car if the salesman had told you its performance did not extend beyond the lot.

A converging idea is to systematically examine which contextual changes might impact which (types of) findings. Here is one example. We always assume that subjects are completely naïve with regard to an experiment, but how can we be sure? On the surface, this is primarily a problem that vexes online research using databases such as Mechanical Turk, which has forums on which subjects discuss experiments. But even with the good old lab experiment we cannot always be sure that our subjects are naïve to the experiment, especially when we try to replicate a famous experiment. If subjects are not blank slates with regard to an experiment, a variation of population has occurred relative to the original experiment. We've gone from sampling from a population of completely naïve subjects to sampling from one with an unknown percentage of repeat subjects.

Jesse Chandler and colleagues recently examined whether prior participation in experiments affects effect sizes. They tested subjects in a number of behavioral economics tasks (such as sunk cost and anchoring and adjustment) and then retested these same individuals a few days later. Chandler et al. found an estimated 25% reduction in effect size, suggesting that the subjects’ prior experience with the experiment did indeed affect their performance in the second wave. A typical characteristic of these experiments is that they require reasoning, which is a controlled process. How about tasks that tap more into automatic processing?

To examine this question, my colleagues and I examined nine well-known effects in cognitive psychology: three from the domain of perception/action, three from memory, and three from language. We tested our subjects in two waves, with the second wave three days after the first. In addition, we used either the exact same stimulus set or a different set (with the same characteristics, of course).

As we expected, all effects replicated easily in an online environment. More importantly, in contrast to Chandler and colleagues' findings, repeated participation did not lead to a reduction in effect size in our experiments. It also did not make a difference whether the exact same stimuli or a different set were used.

Maybe you think that this is not a surprising set of findings. All I can say is that before running the experiments, our preregistered prediction was that we would obtain a small reduction of effect sizes (smaller than the 25% of Chandler et al.). So we at least were a little surprised to find no reduction.

A couple of questions are worth considering. First, do the results indicate that the initial participation left no impression whatsoever on the subjects? No, we cannot say this. In some of the response-time experiments, for example, we obtained faster responses in wave 2 than in wave 1. However, because the responses also became less variable, the effect size did not change appreciably. A simple way to put it would be to say that the subjects became better at performing the task (as they perceived it) but remained equally sensitive to the manipulation. In other cases, such as the simple perception/action tasks, responses did not speed up, presumably because subjects were already performing at asymptotic levels.
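The arithmetic behind this point is simple: a standardized effect size such as Cohen's d divides the raw condition difference by the variability, so if practice shrinks both proportionally, d stays put. Here is a minimal sketch with made-up numbers (the millisecond values are purely illustrative, not from our data):

```python
def cohens_d(mean_diff, pooled_sd):
    """Standardized effect size: raw condition difference / pooled SD."""
    return mean_diff / pooled_sd

# Wave 1 (hypothetical): a 40 ms congruent-vs-incongruent RT difference
# against a pooled SD of 100 ms.
d_wave1 = cohens_d(40, 100)

# Wave 2 (hypothetical): practice speeds everyone up, shrinking both the
# raw difference (to 30 ms) and the variability (to 75 ms) proportionally.
d_wave2 = cohens_d(30, 75)

# Both waves yield d = 0.40: faster and less variable responses,
# but the same standardized effect size.
print(d_wave1, d_wave2)
```

The point of the sketch is that raw response times and standardized effect sizes can dissociate: subjects can show clear practice effects while the effect size, which is what a replication attempt measures, remains stable.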

Second, how non-naïve were our subjects in wave 1? We have no guarantee that the subjects in wave 1 were completely naïve with regard to our experiments. What our data do show, though, is that the nine effects replicate in an online environment (wave 1) and that repeating the experiment a mere few days later (wave 2) by the same research group does not reduce the effect size.

So, in this sense, you can step into the same river twice.

* Automotive metaphors are popular in the replication debate; see also this opinion piece in Collabra: Psychology by Simine Vazire.

