Friday, February 2, 2018

How to Avoid More Train Wrecks

Update February 3: I have added a Twitter response by the first author; a comment by the second author appears in the comments section.

I just submitted my review of the manuscript Experimental Design and the Reliability of Priming Effects: Reconsidering the "Train Wreck" by Rivers and Sherman. Here it is.

The authors start with two important observations. First, semantic priming experiments yield robust effects, whereas “social priming” (I’m following the authors’ convention of using quotation marks here) experiments do not. Second, semantic priming experiments use within-subjects designs, whereas “social priming” experiments use between-subjects designs. The authors are right in pointing out that this latter fact has not received sufficient attention.

The authors’ goal is to demonstrate that the second fact is the cause of the first. Here is how they summarize their results in the abstract: “These results indicate that the key difference between priming effects identified as more and less reliable is the type of experimental design used to demonstrate the effect, rather than the content domain in which the effect has been demonstrated.”

This is not what the results are telling us. What the authors have done is take existing well-designed experiments (not all of which are priming experiments, by the way, as has already been pointed out on social media) and demolish them to create, I’m sorry to say, more train wrecks: experiments in which only a single trial per subject is retained. By discarding the vast majority of trials, the authors end up with an “experiment” that no one in their right mind would design. Unsurprisingly, they find that in each case the effect is no longer significant.
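The statistical cost of throwing away trials can be made concrete with a minimal simulation. The sketch below is purely illustrative: all parameter values (numbers of subjects and trials, effect size, variance components) are assumptions of mine, not figures from the manuscript. It compares the sampling variability of a priming-effect estimate from a full within-subjects analysis with that of a "single-trial" between-subjects analysis of the same simulated data.

```python
import random
import statistics

random.seed(1)

# Hypothetical parameters (assumed, not from the manuscript); RTs in ms.
N_SUBJ, N_TRIALS = 30, 50
EFFECT, SD_SUBJ, SD_TRIAL = 20.0, 100.0, 150.0

def one_experiment():
    """Return the priming-effect estimate from (a) the full within-subjects
    analysis and (b) a single-trial between-subjects analysis."""
    diffs, first_ctrl, first_prime = [], [], []
    for s in range(N_SUBJ):
        base = 500 + random.gauss(0, SD_SUBJ)   # subject's baseline RT
        ctrl = [base + random.gauss(0, SD_TRIAL) for _ in range(N_TRIALS)]
        prim = [base - EFFECT + random.gauss(0, SD_TRIAL) for _ in range(N_TRIALS)]
        # within-subjects: each subject contributes a condition difference
        diffs.append(statistics.mean(ctrl) - statistics.mean(prim))
        # single-trial design: keep one trial, assign subject to one condition
        if s % 2 == 0:
            first_ctrl.append(ctrl[0])
        else:
            first_prime.append(prim[0])
    within = statistics.mean(diffs)
    between = statistics.mean(first_ctrl) - statistics.mean(first_prime)
    return within, between

# Repeat many simulated experiments to compare the two estimators' noise.
within_est, between_est = zip(*(one_experiment() for _ in range(500)))
print("within-subjects SD of estimate:", statistics.stdev(within_est))
print("single-trial    SD of estimate:", statistics.stdev(between_est))
```

Under these assumed parameters the single-trial estimate is an order of magnitude noisier, because it discards the trial averaging and loses the cancellation of between-subject baseline variance. A non-significant result from such an analysis is therefore exactly what one would expect, regardless of whether the effect is real.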

Does this show that “the key difference between priming effects identified as more and less reliable is the type of experimental design used to demonstrate the effect”? Of course not. The authors imply that having a within-subjects design is sufficient for finding robust priming effects, of whatever kind. But they have not even demonstrated that a within-subjects design is necessary for priming effects to occur. For example, based on the data in this manuscript, it cannot be ruled out that a sufficiently powered between-subjects semantic priming experiment would, in fact, yield a significant effect. We already know from replication studies that between-subjects “social priming” experiments do not yield significant effects, even with high statistical power.

More importantly, the crucial experiment showing that a within-subjects design is sufficient to yield “social priming” effects is absent from the paper. Without such an experiment, any claims about the design being the key difference between semantic and “social priming” are unsupported.

So where does this leave us? The authors have made an important initial step in identifying differences between semantic and “social priming” studies. However, to draw causal conclusions of the type the authors want to draw in this paper, two experiments are needed.

First, an appropriately powered single-trial between-subjects semantic priming experiment. To support the authors’ view, this experiment should yield a null result, which should of course be tested using the appropriate statistics. Rather than using response times, the authors might consider using a word-stem completion task. Contrary to what the authors would have to predict, I predict a significant effect here. If I’m correct, this would invalidate the authors’ claim about a causal relation between design and effect robustness.

Second, the authors should conduct a within-subjects “social priming” experiment (one that is close to the studies they describe in the introduction). Whether or not this is possible, I cannot determine.

If the authors are willing to conduct these experiments, and to omit the uninformative ones reported in the current manuscript, then they would make a truly major contribution to the literature. As it stands, they merely add more train wrecks to the literature. I therefore sincerely hope they are willing to undertake the necessary work.

Smaller points

p. 8. “In this approach, each participant is randomized to one level of the experimental design based on the first experimental trial to which they are exposed. The effect of priming is then analyzed using fully between-subjects tests.” But the order in which the stimuli were presented was randomized, right? So this means that this analysis actually compares different items. Given that there typically is variability in response times across items (see Herb Clark’s 1973 paper on the “language-as-fixed-effect fallacy”), this unnecessarily introduces noise into the analysis. Because there usually also is a serial position effect, this problem cannot be solved by taking the same item. One would have to take the same item in the same position. Therefore, it is impossible to take a single trial without losing experimental control over item and order effects. This is another reason why the “experiments” reported in this paper are uninformative.
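The item problem can also be illustrated with a small sketch. Again, every number here is an assumption of mine for illustration only: items are given their own baseline RTs (the item variability behind Clark’s fallacy), and the simulation compares the spread of scores when each subject’s “first trial” involves a randomly ordered item versus the (practically unattainable) case where every subject gets the same item in the same position.

```python
import random
import statistics

random.seed(2)

# Assumed parameters: items differ substantially in baseline RT (ms).
N_ITEMS, SD_ITEM, SD_TRIAL = 40, 80.0, 50.0
item_means = [500 + random.gauss(0, SD_ITEM) for _ in range(N_ITEMS)]

def first_trial_rt():
    """Score when item order is randomized: whichever item happens to come
    first contributes its own baseline to the subject's single-trial RT."""
    item = random.randrange(N_ITEMS)
    return item_means[item] + random.gauss(0, SD_TRIAL)

def fixed_item_rt():
    """Score if every subject could get the same item in the same position:
    item variance drops out of the between-subject comparison."""
    return item_means[0] + random.gauss(0, SD_TRIAL)

rand_scores = [first_trial_rt() for _ in range(2000)]
fixed_scores = [fixed_item_rt() for _ in range(2000)]
print("SD with randomized first item:", statistics.stdev(rand_scores))
print("SD with fixed item/position:  ", statistics.stdev(fixed_scores))
```

With item variability of this (assumed) magnitude, the randomized-first-item scores are markedly noisier, and serial position effects (not modeled here) would add further noise on top. This is the extra, avoidable variance that the single-trial analysis injects into the comparison.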

p. 9. The Stroop task is not really a priming task, as the authors point out in a footnote. Why not use a real priming task?

p. 15. “It is not our intention to suggest that failures to replicate priming effects can be solely attributed to research design.” Maybe not, but by stating that design is “the key difference,” the authors are claiming it has a causal role.

p. 16. “We anticipate that some critics will not be satisfied that we have examined ‘social priming’.” I’m with the critics on this one.

p. 17. “We would note that there is nothing inherently “social” about either of these features of priming tasks. For example, it is not clear what is particularly “social” about walking down a hallway.” Agreed. Maybe call it behavioral priming then?

p. 18. “Unfortunately, it is not possible to ask subjects to walk down the same hallway 300 times after exposure to different primes.” Sure, but with a little flair, it should be possible to come up with a dependent measure that would allow for a within-subjects design.

p. 19. “We also hope that this research, for once and for all, eliminates content area as an explanation for the robustness of priming effects.” Without experiments such as the ones proposed in this review, this hope is futile.