Update February 3: I added a Twitter response by the first author and, in the commentary section, a comment by the second author.
I just submitted my review of the manuscript Experimental Design and the Reliability of Priming Effects: Reconsidering the "Train Wreck" by Rivers and Sherman. Here it is.
The authors start with two important observations. First, semantic priming experiments yield robust effects, whereas “social priming” (I’m following the authors’ convention of using quotation marks here) experiments do not. Second, semantic priming experiments use within-subjects designs, whereas “social priming” experiments use between-subjects designs. The authors are right in pointing out that this latter fact has not received sufficient attention.
The authors’ goal is to demonstrate that the second fact is the cause of the first. Here is how they summarize their results in the abstract: “These results indicate that the key difference between priming effects identified as more and less reliable is the type of experimental design used to demonstrate the effect, rather than the content domain in which the effect has been demonstrated.”
This is not what the results are telling us. What the authors have done is take existing well-designed experiments (not all of which are priming experiments, by the way, as has already been pointed out on social media) and demolish them to create, I’m sorry to say, more train wrecks of experiments in which only a single trial per subject is retained. By thus discarding the vast majority of trials, the authors end up with an “experiment” that no one in their right mind would design. Unsurprisingly, they find that in each case the effect is no longer significant.
Does this show that “the key difference between priming effects identified as more and less reliable is the type of experimental design used to demonstrate the effect”? Of course not. The authors imply that having a within-subjects design is sufficient for finding robust priming effects, of whatever kind. But they have not even demonstrated that a within-subjects design is necessary for priming effects to occur. For example, based on the data in this manuscript, it cannot be ruled out that a sufficiently powered between-subjects semantic priming experiment would, in fact, yield a significant effect. We already know from replication studies that between-subjects “social priming” experiments do not yield significant effects, even when they are highly powered.
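To make the power problem concrete, here is a quick simulation sketch in Python (all numbers are made up for illustration and are not taken from the manuscript): a modest priming effect measured over many trials per subject is detected virtually every time, whereas the same effect analyzed from a single retained trial per subject almost never is.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_ind

rng = np.random.default_rng(1)
n_subj, n_trials = 40, 200      # hypothetical: 40 subjects, 100 trials per condition
effect = 20                     # hypothetical priming effect in ms
sd_subj, sd_trial = 100, 150    # hypothetical between-subject and trial-to-trial SDs

def one_experiment():
    base = rng.normal(600, sd_subj, n_subj)[:, None]   # per-subject baseline RT
    unrelated = base + effect + rng.normal(0, sd_trial, (n_subj, n_trials // 2))
    related   = base          + rng.normal(0, sd_trial, (n_subj, n_trials // 2))
    # Within-subjects analysis: paired test on each subject's condition means
    p_within = ttest_rel(unrelated.mean(axis=1), related.mean(axis=1)).pvalue
    # Single-trial analysis: keep one trial per subject and compare two subject groups
    p_single = ttest_ind(unrelated[:n_subj // 2, 0], related[n_subj // 2:, 0]).pvalue
    return p_within < 0.05, p_single < 0.05

sims = np.array([one_experiment() for _ in range(2000)])
print("power, within-subjects (all trials):  ", sims[:, 0].mean())
print("power, single-trial between-subjects: ", sims[:, 1].mean())
```

With these hypothetical parameters the within-subjects analysis has power close to 1, while the single-trial between-subjects analysis hovers near the Type I error rate; the exact figures obviously depend on the assumed effect size and variances.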
More importantly, the crucial experiment showing that a within-subjects design is sufficient to yield “social priming” effects is absent from the paper. Without such an experiment, any claim that the design is the key difference between semantic and “social priming” is unsupported.
So where does this leave us? The authors have made an important initial step in identifying differences between semantic and “social priming” studies. However, to draw causal conclusions of the type the authors want to draw in this paper, two experiments are needed.
First, an appropriately powered single-trial between-subjects semantic priming experiment. To support the authors’ view, this experiment should yield a null result, which should of course be tested using the appropriate statistics (a rough indication of the sample size required is sketched below). Rather than using response times, the authors might consider using a word-stem completion task. Contrary to what the authors would have to predict, I predict a significant effect here. If I’m correct, that would invalidate the authors’ claim about a causal relation between design and effect robustness.
Second, the authors should conduct a within-subjects “social priming” experiment (one that is close to the effects they describe in the introduction). Whether or not this is possible, I cannot determine.
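To give a sense of what “appropriately powered” would mean for the first of these experiments, here is a standard power calculation sketch (using statsmodels; the assumed effect size of d = 0.3 is my own hypothetical choice, not an estimate from the priming literature):

```python
# Participants needed per group for a single-trial between-subjects comparison,
# assuming (hypothetically) an effect of d = 0.3, alpha = .05 (two-sided), 80% power.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.80)
print(round(n_per_group))   # roughly 175 per condition under these assumptions
```

For a smaller assumed effect of d = 0.2, the required sample rises to roughly 400 per condition.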
If the authors are willing to conduct these experiments (and omit the uninformative ones they report in the current manuscript), then they would make a truly major contribution to the literature. As it stands, they merely add more train wrecks to the literature. I therefore sincerely hope they are willing to undertake the necessary work.
Smaller points
p. 8. “In this approach, each participant is randomized to one level of the experimental design based on the first experimental trial to which they are exposed. The effect of priming is then analyzed using fully between-subjects tests.” But the order in which the stimuli were presented was randomized, right? So this means that this analysis actually compares different items. Given that there typically is variability in response times across items (see Herb Clark’s 1973 paper on the “language-as-fixed-effect fallacy”), this unnecessarily introduces noise into the analysis. Because there usually also is a serial position effect, this problem cannot be solved by taking the same item. One would have to take the same item in the same position. Therefore, it is impossible to take a single trial without losing experimental control over item and order effects. This is another reason why the “experiments” reported in this paper are uninformative.
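To illustrate the item-noise point with a minimal sketch (again with made-up numbers: a hypothetical 20 ms priming effect and a hypothetical 80 ms spread across items), note that when the two groups respond to different items, the item variance ends up in the error term:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
n_per_group = 50              # hypothetical: 50 participants per condition, one trial each
effect = 20                   # hypothetical priming effect in ms
sd_item, sd_resid = 80, 120   # hypothetical item variability and residual noise

# Because presentation order was randomized, the retained single trials come from
# different items in the two groups, so item effects do not cancel out.
rt_primed   = 600 - effect + rng.normal(0, sd_item, n_per_group) + rng.normal(0, sd_resid, n_per_group)
rt_unprimed = 600          + rng.normal(0, sd_item, n_per_group) + rng.normal(0, sd_resid, n_per_group)

print(ttest_ind(rt_unprimed, rt_primed))   # the 20 ms effect is usually lost in item + residual noise
```

Under these assumptions the effect is rarely significant, not because it is absent but because item and order variability is no longer controlled.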
p. 9. The Stroop task is not really a priming task, as the authors point out in a footnote. Why not use a real priming task?
p. 15. “It is not our intention to suggest that failures to replicate priming effects can be solely attributed to research design.” Maybe not, but by stating that design is “the key difference,” the authors are claiming it has a causal role.
p. 16. “We anticipate that some critics will not be satisfied that we have examined ‘social priming’.” I’m with the critics on this one.
p. 17. “We would note that there is nothing inherently ‘social’ about either of these features of priming tasks. For example, it is not clear what is particularly ‘social’ about walking down a hallway.” Agreed. Maybe call it behavioral priming then?
p. 18. “Unfortunately, it is not possible to ask subjects to walk down the same hallway 300 times after exposure to different primes.” Sure, but with a little flair, it should be possible to come up with a dependent measure that would allow for a within-subjects design.
p. 19. “We also hope that this research, for once and for all, eliminates content area as an explanation for the robustness of priming effects.” Without experiments such as the ones proposed in this review, this hope is futile.
It would seem that we failed to make our point clearly. You appear to maintain the view that “social priming” studies are defined by the use of a particular type of broad behavioral dependent variable, as in Bargh’s original aging study. We strenuously disagree with this definition. We argue that design features and content (social versus non-social) are independent features of priming studies. We argue and show that there are many robust priming effects using social stimuli and within-subject designs. If you insist on excluding these effects (such as the Weapons Identification Task and Stereotype Misperception Task) that use social stimuli, were created by social psychologists, and were published in social psychology journals as “social priming,” then there is a circular argument that can never be addressed with data: “Social priming” by this definition will rarely produce a robust effect.
You ask us to provide a large within-subjects version of a study such as Bargh’s. Aside from being impossible, this request, again, is based on the insistence that tasks such as the Weapons Identification Task and the Stereotype Misperception Task don’t “count” as social priming. We reject this definition.
We do not observe that “social priming” experiments fail to yield robust effects, as you state. We observe that between-subjects priming studies do not yield robust effects. Nor do we observe that semantic priming experiments use within-subjects designs whereas “social priming” experiments use between-subjects designs. To the contrary, we argue and show that priming studies using social stimuli are robust when within-subjects designs are used, and that there are many existing examples of such effects in the literature. Our goal is to show that within-subject priming studies are more powerful and more likely to produce robust effects than between-subject priming studies, regardless of the content.
This sentence is causing trouble: “These results indicate that the key difference between priming effects identified as more and less reliable is the type of experimental design used to demonstrate the effect, rather than the content domain in which the effect has been demonstrated.” A more precise wording would be: “Among these studies, the content domain did not differentiate robust from non-robust findings. Instead, design type (within vs. between) was the key feature that differed among robust and non-robust findings.”
You suggest that we are ruining perfectly good designs to create additional train wrecks, as if we thought the between-subject versions of our studies were sensible and did not know that those designs were problematic. You ask us to remove these “uninformative” studies. But the whole point was to show that they are problematic! We know that the within-subject versions are more robust. That’s the point.
We never claim that within-subject designs are necessary or sufficient for finding robust effects. We demonstrate that within-subject versions of a priming measure are more robust than between-subject versions of the same measure. Nor do we argue that between-subject designs are incapable of producing reliable effects.