Wednesday, February 6, 2013

Bruce Springsteen and Lazy Susan: The Logic of Experimentation


Many papers have more than one experiment. How do researchers string together experiments?

When I was a beginning assistant professor, I would tell my students: "The difficult part is not Experiment 1; it is Experiment 2." (When saying this, I would always picture myself as a grizzled war veteran, smoking a pipe and using its stem to point out locations in the field of battle to my crop of young recruits.)

Why was Experiment 2 so difficult? Well, it couldn’t stray too far from Experiment 1 or reviewers might expose a gap in the causal chain that leads from experiment to experiment. It also couldn’t be too close because then you’d have a direct replication of Experiment 1. (Nowadays we would see this as a virtue but back then it would probably have been viewed as unnecessary padding.)

For many years I thought having a tight chain of experiments was normal for experimental research. But then I started reading articles in social psychology. I did this partly because I was asked as a reviewer (due to my interest in embodied cognition there was some overlap with some work in social cognition, at least in the perception of some editors) and partly because I became chairman of a committee investigating alleged fraud in the work of Dirk Smeesters.

It struck me that quite a few social psychology articles (though most certainly not all) don’t follow the logic of experimentation I just described. The experiments in those papers all relate to a general theme but there is no causal chain. I’ve tried to depict the difference between the cognitive and social approaches below.

[Figure: schematic diagrams of the models, with panels labeled "Eagles" and "Youth soccer"; blue boxes represent experiments, the red box the research question, and arrows the connections between them.]
In cognitive psychology there tend to be strong causal connections between the individual experiments (the blue boxes). Usually, the first experiment is the main one, relating to the research question (the red box), and the others serve to rule out potential alternative explanations for the results of the main experiment. It’s the Bruce Springsteen model. One experiment is the boss and the others are important but act in a supportive role—the E Street Band. However, sometimes all experiments are equally important to the main research question and then we have the Eagles model. The Bruce Springsteen and Eagles models describe pretty much all of cognitive research.

In the social psychology articles I’m talking about, there does not always appear to be a causal connection between the experiments. They all relate somewhat (hence the thinner arrows in the social model) to the central research question (which is pitched at a higher level than in cognitive articles) but in very different ways. You could pretty much put the experiments in any order and the paper would be the same. I cannot think of a musical analogy but I'm reminded of youth soccer. The experiments are like a bunch of six-year-olds playing soccer. They’re all very cute but they’re not playing as a team.

I don't know how much of social psychology can be explained by the youth-soccer model but I think it's a sizable subset (I also think virtually no cognitive studies conform to this model).

Here is a concrete example. I’m not picking on the authors. I could easily have selected many others and I think the experiments in this paper are very clever. I’m also not going to discuss the research in detail. I’m only interested in how the experiments hang together.

The authors examined whether merely executing and seeing clockwise (vs. counterclockwise) movements would induce psychological states of temporal progression (vs. regression) and corresponding motivational orientations toward the future and novelty (vs. the past and familiarity).

Here is how they summarize the support for their hypothesis:

Participants who turned cranks counterclockwise preferred familiar over novel stimuli, but participants who turned cranks clockwise preferred novel over old stimuli… (Experiment 1). Also, participants rotating a cylinder clockwise reported higher scores in the personality measure openness to experience than participants rotating counterclockwise (Experiment 2). Merely passively watching a rotating square had similar but weaker effects on exposure and openness (Experiment 3). Finally, participants chose more unconventional candies from a clockwise than from a counterclockwise Lazy Susan, that is, a turntable (Experiment 4).

So Experiment 2 used a different task than Experiment 1 (what happened to the cranks?) and had a different dependent measure. Why not change one thing at a time? Do we really already know what’s going on with the crank task? And why use a self-report task when you had a nice implicit task in Experiment 1? Is indicating agreement with statements like "Poetry does not deeply impress me" really comparable to judging Chinese ideographs (which is what the subjects did in Experiment 1)?

In Experiment 3, the subjects watched a square whereas they rotated a cylinder (which turns out to be a paper towel roll; I’ll take the cranks anytime) in Experiment 2. The researchers have good reasons for using a square (it doesn’t look like a clock) and the experiment makes sense in light of the previous one. They wanted to disentangle visual and manual rotation. Also good about Experiment 3 is that it uses the dependent measures of Experiments 1 and 2. Still, how do we know that those tasks are measuring the same mechanism? One is implicit and the other requires introspection. One is visual and the other verbal.

But then we get to Experiment 4. Here Lazy Susan makes an entrance and the dependent measure is candy selection (leave it to Lazy Susan to get the party going). Now I am seriously confused, and it’s not because of Lazy Susan’s pretty blue eyes.

The advantage of the Springsteen/Eagles model is that it results in a tightly interwoven set of experiments. Each experiment is often a partial replication of the previous one. But there is the risk of vacuous experimentation: researchers lose sight of the bigger question that motivated the research in the first place. "But that is just an experiment about an experiment," I have heard myself say during many lab meetings (waving my imaginary pipe). I could mention whole areas of cognitive psychology that are mired in vacuous experimentation, but that’s for a later post.

The advantage of the youth-soccer model is that researchers keep their eye on the bigger question. The disadvantage is that it does not allow them to gather a coherent body of evidence to address this question. And there is another disadvantage: false positives. Suppose the Lazy Susan experiment had “failed.” The researchers could have simply concluded that the task didn’t “work” (damn you, Lazy Susan!) and tried something else. Maybe they would have had people watch a hamster in a treadmill and then judge melodies played by a harpsichord or by a synthesizer.

I don’t know if there are courses on the logic of experimentation. If one were to be developed, it should teach the Bruce Springsteen model and the Eagles model. But it should also tell students not to lose sight of the red box.


14 comments:

  1. I would like to add another model - the Madonna (or, to be more contemporary, the Lady Gaga) model:
    In some research areas it is simply too expensive to do a bunch of experiments - so there is only one. Think for instance of certain survey experiments that are very expensive to carry out because they involve a nationally representative survey. (Think of political science, communication science, and sociology as fields that occasionally use survey experiments.) Here, of course, the experiment is directed at the red box, and nothing else (I hope so, at least). Please keep that in mind as you develop your course on the logic of experimentation!

  2. Excellent point! The post is on papers with multiple experiments but you're right that there are many papers that only have one experiment. How could I have overlooked Lady Gaga?

  3. This is interesting - it reminds me of one article on which I was a co-author. It had a cognitive model and was sent to a journal. However, the first three studies (of four!) were considered redundant by the editors and we reduced our paper to one (very good!) experiment. Unthinkable now. A bit like a bear dancing to a band of six-year-old soccer players :)

  4. Wow, that's a pretty sharp reduction. Indeed, unthinkable now.

    Interesting image of the dancing bear. I used to coach six-year-olds. I'm glad there never was a bear anywhere to be seen.

  5. :) It's more like a band member going solo. Like Peter Gabriel leaving Genesis.

  6. Consistent with your concerns about the clockwise/novelty experiments, the set appears to be biased. I discuss this case in a paper that is now in press in the Journal of Mathematical Psychology. The analysis is a small part of the paper (starting around page 24). A preprint is at http://www1.psych.purdue.edu/~gfrancis/Publications/Francis2013b.pdf
    When published, the journal will also post commentaries from statisticians and other interested researchers.

    Replies
    1. It seems almost inevitable for the "youth soccer model of experimentation" to lead to false positives. It is only natural for researchers to conclude that the "failed" experiment was actually not getting at the phenomenon they were interested in. As you say in the paper (p. 27), this bias can creep in without the researchers being aware of it.

    2. It is difficult to avoid bias with the "youth soccer model", but not impossible. If researchers pay attention to the uncertainty in their measurements, then they will not be tempted to draw a strong conclusion from their results. They also might be willing to consider the findings from the "failed" experiments as methodologically sound.

    3. It would certainly be best to include all experiments (except for the ones with procedural flaws such as counterbalancing errors and the like). But I'd still be worried that even the "successful" experiments under this model aren't really measuring the same thing. They're just too different. But this obviously depends on your level of abstraction.

    4. Yes, being unbiased does not mean the experiments are valid. A lot of things have to be done correctly to draw a proper scientific inference. As you say, whether experiments work together to support a theoretical conclusion depends (in part) on the theory. If the theory is broad enough to incorporate all of the experimental findings in the report, then it should also hold for lots of other situations as well. Overall, though, I agree that it is difficult to draw a firm theoretical conclusion from rather different experiments that all have low power.

  7. First, it's worth noting that I think it's a mis-characterization to call one model the cognitive one and the other the social one. Many articles in social psychology absolutely follow what you are calling the cognitive model. That's not to say that there aren't any examples of articles that leap from study to study in the same general area, but these are not the norm -- certainly not enough so to be called the social model. That doesn't mean that they don't exist, however, and that their shortcomings (and potential strengths) shouldn't be addressed.

    Second, I think you hit on an important point when you say that too many things change in the research you note, moving from study to study. There is a temptation to show that -- as Greg alludes to -- your theory is so good, that it allows you to make consistent predictions across different contexts with different operationalizations and dependent variables. I think the problem becomes especially apparent, as you explain, in the context of false positives and how easily we can convince ourselves that our good studies reflect the truth while our bad studies were poorly executed.

    I am clearly biased on this front, as I'm married to one of the study's authors, but this work on visceral fit (http://psycnet.apa.org/psycinfo/2011-01020-001/) is a great example of integrating both approaches to research. They start with a basic finding in the field -- people believe more in global warming on hot days. Next, they replicate it in the lab in order to rule out obvious confounds, by heating participants' cubicles. Next, they rule out other possible explanations in studies with minor modifications, some of which replicate the original (lab) effect in some conditions. They then identify the mechanism by which the effect is taking place. Finally, they conceptually replicate the effect of heat on belief in global warming by showing that thirst has the same effect on belief in desertification.

    Granted, not all social psych articles look like this. In fact, I believe this one benefited a great deal from the review process (which is, I think, a good thing). But you've got parts of both models incorporated, including direct replication, conceptual replication, an account for mechanism, and a ruling out of alternatives. It's very clear what the "red box" is throughout.

    I certainly agree though, that in the kids playing soccer model, people need to be very clear about the difference between exploratory and confirmatory research -- sometimes it seems that failed studies are exploratory, and successful ones magically become confirmatory. And it's an understandable bias, but a very problematic one. Students need to be taught that if they discover something exploratory, they need to replicate it, and journal reviewers need to not only accept the inclusion of such studies in an article but demand them.

    Replies
    1. Thanks very much for this thoughtful response. You're right that the youth-soccer model shouldn't be equated with social psychology, as many papers in that field do indeed use the "cognitive" models. I made an effort to point this out in the post but it cannot hurt to repeat it.

      I am familiar with the work on visceral fit. I like the paper you mention very much. In fact, one of the methods used there was the inspiration for a method we used in a paper (on a very different topic) that came out last year http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0036154.

      I agree that reviewers and editors should demand replications when these are needed and feasible, see this post: http://rolfzwaan.blogspot.nl/2013/02/you-say-you-want-revolution.html.

    2. Thanks Rolf, I saw Bobbie's blog post a few days ago and found it very encouraging. I just typed out a lengthier response a few minutes ago but Blogger seems to have eaten it.

      In short, my point was that we should not wait for the "revolution" to happen in a top-down way. Individuals need to take action and take responsibility from the bottom up. However, doing so can be very costly for individuals, particularly early in their careers, and they need all the support they can get, particularly from reasonable and thoughtful reviewers. Eventually, with a bottom-up approach, norms will change, and various forms of p-hacking will appear wildly unacceptable, rather than widely used practices that are overlooked, dismissed, or ignored. Thanks for setting a good example.

  8. I updated the post slightly after Dave Nussbaum's comments, trying to make sure the youth-soccer model was not equated with social psychology. I changed "social psychology model" to the more appropriate "youth-soccer model."
