Thursday, February 28, 2013

You Say You Want a Revolution


Both the popular press and the psychological literature itself are abuzz with reports about the field trying to clean up its act, especially by demanding replications (though not everyone agrees). Many reports make a point of mentioning that this sudden sanitation urge was prompted by the Stapel fraud case. This is ironic, because it is logically impossible to replicate Stapel. His work was never “plicated” in the first place!

Nevertheless, it is clear that the field is in a state of revolution. There are large-scale replication efforts under way, most notably the Open Science Framework. But replication starts at home. So here are some things we can do as authors, reviewers, and editors.

Authors

Incorporate direct replications into your workflow (I hesitate to use this buzzword but it seemed quite popular among those interested in replication during a recent Google hangout). 

I have reported direct replications in one published paper and I have recently submitted two other papers in which each of the experiments has a direct replication. I won’t lie: it is quite cumbersome to have to do a direct replication for every experiment you’ve run, and that is despite the fact that replication has been relatively easy for me because I have been using Mechanical Turk (even so, I have repeatedly cursed myself for wanting to do replications).

An attractive alternative to direct replication—already quite common in cognitive psychology (but maybe not so much other areas of psychology)—is to run an experiment that is close to the original but changes one aspect. In an earlier post, I called this the Bruce Springsteen or Eagles model.

A direct replication is especially called for if the original finding meets one or more of the following criteria: it (1) is highly novel and surprising, (2) has strong practical or theoretical import, (3) runs contrary to established findings or theories, (4) is underpowered. If readers can think of additional criteria, I’d like to hear them.

Reviewers

State explicitly in your review if a direct replication is required. If the study meets one of the criteria listed above, it is a candidate for direct replication.  However, do consider the feasibility of a direct replication. Obviously, a direct replication is going to be difficult with special populations, longitudinal studies, and time-sensitive experiments (e.g., voters’ reactions to Obama’s re-election). So be reasonable. Don’t ask for an impossible replication attempt just because you don’t like the study or its author.

Editors

If the reviewers identify aspects of the research that overlap with the criteria listed above and call for additional data without mentioning a direct replication, call for a direct replication. If reviewers ask for a direct replication but this is unreasonable, say so.

I recently asked for a direct replication of a study that met some of the criteria I listed above (the reviewers had not asked for a direct replication but wanted to see additional data). One of the co-authors happened to be Bobbie Spellman (I only realized this after I’d composed my action letter), herself a staunch advocate of replication. She calls my action letter her first “post-revolution action letter.” It is post-revolution because it asks for a direct replication [postscript: and for reasons Bobbie mentions in her comment to this post].

It is actually the second post-revolution action letter I’ve written. I probably should have written more in the past and I plan to write more in the future. I hope other editors will start writing post-revolution action letters as well.


These are obviously the first baby steps toward a full-scale integration of replication into our scientific practice. But as Oasis (themselves a failed replication of the Beatles) stated: “I start a revolution from my bed.”

Tuesday, February 26, 2013

Maurice is Back! Time in Narrative Comprehension


In our everyday experience time is continuous and chronological. In stories we can jump around in time. Just as with many other things, Aristotle had already given this discrepancy thought. 

In his Poetics he declared that historians are to provide a blow-by-blow chronological account of events. Authors of fiction, on the other hand, are not bound by this directive. For them, plot is the constraining factor. If the plot calls for a jump ahead in time, the author should do so. No need for a detailed account of Odysseus’ daily bowel movements during the seven years that he was ensnared by Calypso.

Time shifts are extremely common in stories. For example, we often encounter phrases like an hour later, which force us to jump an hour forward in time from one sentence to the next. But how do we process such time shifts?

Cognitive psychologists have begun to address this question. An early example is here. In a study published in 1996, I compared time shifts like an hour later to non-time shifts such as a moment later. 

There were three main findings.
· Reading times for hour shifts are longer than for moment shifts.
· After an hour shift, information presented before the shift is less accessible to the reader than after a moment shift.
· Events separated by a moment are more strongly connected in long-term memory than events separated by an hour.


So it looks like time shifts act as separators between events. We deactivate information that comes before the time shift and (possibly as a result) events separated by the time shift become separated in long-term memory; furthermore, processing time shifts is resource consuming (takes time).

In a later paper, we found that a similar deactivation occurs when it is explicitly stated that an action is discontinued. For instance, people respond more slowly to the word playing after He stopped playing the piano than after He was playing the piano despite the fact that they just read the word playing.

Apparently we don’t represent information in a form that resembles the text but rather in a form that is close to the described situation. The time shift or explicit discontinuation forms some sort of mental barrier to what has happened before.

In my 1996 paper, I examined these questions using a couple dozen short stories. The story I use as an example in the paper and in talks involves a gallery owner named Maurice. I pictured him as an overly sensitive dandy. I thought the name Maurice was a good fit (with apologies to everyone named Maurice). I kind of suspect that Nathan Lane’s role of Pepper Salts in the hit show Modern Family is based on the Maurice character.

Anyhow, after a 17-year hiatus, Maurice makes a comeback in a recent paper by Weingartner and Myers. The authors adapted the Maurice story (and other stories from the 1996 paper). Again, Maurice has a grand opening of his art gallery but this time things go better for him than in the original story (where he’d forgotten to invite the local art critic, to his own detriment). That’s reassuring to know. But what about the findings?

Like the 1996 study and other studies, Weingartner and Myers find that time shifts lead to increases in processing times. They measured eye fixations whereas the earlier studies used key presses, so this is an extension of those earlier findings.

The authors also find that reading times for time shifts are longer after a discontinuation (Maurice stopped doing something) than after a continuation. Because there is no interaction between time shift and discontinuation, their effects appear to be additive. So it takes extra long to read what happened an hour later after someone discontinued doing something.
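For readers who like to see the logic spelled out, here is a minimal sketch in Python of what such an additivity test can look like, with invented reading times and assumed column names; it is not the authors’ actual analysis, which would also have to handle repeated measures.

    # Hypothetical sketch: do time shift and discontinuation have additive
    # effects on reading times? (Invented data; assumed column names.)
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # one row per trial: reading time (ms) plus the two factors
    df = pd.DataFrame({
        "rt":      [812, 940, 870, 1010, 798, 955, 880, 1025],
        "shift":   ["moment", "hour"] * 4,          # time-shift factor
        "discont": ["cont"] * 4 + ["discont"] * 4,  # (dis)continuation factor
    })

    # model with both main effects and their interaction
    model = smf.ols("rt ~ C(shift) * C(discont)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))
    # "Additive" means both main effects are present while the
    # shift:discont interaction term contributes nothing reliable.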

But Weingartner and Myers also find that time shifts might be what I’ll call “semi-permeable,” letting some information through while blocking other information. For example, if the story made reference to something before the time shift, the reading times for this anaphor did not vary as a function of time shift. Apparently, the information was equally available after a moment or an hour of story time.

This is different from the earlier studies, which found that information was less available after an hour of story time than after a moment. But what was also different was the method. The earlier studies used a probe-recognition task. A word was presented and the subjects had to indicate whether or not it had occurred in the text. Weingartner and Myers only used reading times.

In a second experiment, therefore, they used a probe-recognition task. Responses were significantly slower after a discontinuation than after a continuation, replicating our 2000 findings, but there was no effect of time shift, failing to replicate the 1996 findings (as well as other findings).

The authors point out that there is a potentially critical difference with the 1996 study. In that study, the probe words were actions, which have a fleeting nature. In the current study, the probes were nouns, which referred to stable entities in the situation. A 2010 study by Radvansky and Copeland had already found that time shifts lead to deactivation of activities but not of entities.

So maybe time shifts are semi-permeable, letting through stable situational elements but not more ephemeral ones. Explicit discontinuations apparently are not permeable. This helps constrain models of discourse comprehension.

As the authors note in conclusion: Much remains to be done. I agree. I have therefore decided to take this topic back up again. The Maurice saga continues… 

Friday, February 15, 2013

Catching Fly Balls is not Reading Dostoyevsky


In 2001 and 2002, my students and I published two papers on mental simulation in Psychological Science. I warned my students that these experiments might draw a lot of attention and possibly criticism. I was about to make full professor, so I was not worried about myself but I was worried about my students. I was particularly concerned about a backlash from traditional psycholinguists.

I was wrong. There was no backlash. Instead, other people started using our paradigm and made nice careers for themselves doing so. It stayed like this until 2008. Then a paper appeared co-authored by Mahon and Caramazza.

They called our experiments “elegant and ingenious” (thanks for that) but argued that our results did not rule out an account in which the actual work was done by abstract symbols rather than perceptual representations, with activation cascading down from abstract representations to the perceptual and action systems.  (This hypothesis does not seem implausible to me and even has a certain appeal but Mahon and Caramazza didn’t offer any direct evidence to support it.)

Mahon and Caramazza used our experiments (among others) to stake out their position that cognition involves interactive activation between abstract symbols and the systems of perception and action. In their assessment, our account was too embodied.

Now there is a paper by Wilson and Golonka who state that these same experiments are not embodied enough (there is no pleasing people, is there?). They use them (and others) to advance their claim that cognition does not need mental representations. That pretty much rules out all of cognitive science (including Mahon and Caramazza) but let’s roll with it.

What evidence do Wilson and Golonka provide for their claim? They cite a book by Pfeifer and Scheier. I used this book in a seminar I taught about 10 years ago and greatly enjoyed it. The students and I were fascinated by the descriptions of little robots that could display forms of intelligent behavior purportedly without having mental representations. It is debatable whether these robots really had no mental representations but let’s roll with it.

As the students and I worked through the book, we kept wondering when it was going to address issues cognitive scientists are actually interested in, like understanding language, solving problems, and reasoning. The robots had nothing to say about this. They were mostly rummaging around, industriously collecting empty coke cans.

Every once in a while over the past decade, I have checked back on this literature, but it never seemed to come closer to addressing the questions of interest to cognitive scientists.

In fact, Rodney Brooks, the father of the subsumption architecture on which these robots were based, has left the field altogether to become CEO of a company that produces the Roomba, a vacuum cleaner with a subsumption architecture. It hides under your couch when you are around but comes out when nobody’s there and cleans your carpet. Very useful, of course, but a far cry from being able to comprehend a story, write a scientific paper, or reminisce about the past.

Wilson and Golonka discuss some other evidence in support of their claim. They describe interesting research (by others) on how an outfielder catches a fly ball. Obviously, catching a fly ball involves action and perception in the environment.

They also discuss impressive work by Thelen on the A-not-B error. Again, though, the task is very much an “in-the-world task”; it involves perception of and reaching for an object.

Most tasks that cognitive scientists are interested in do not have such strong connections with the environment. The authors, of course, are aware of this and state: “This is the point where standard cognitive science usually jumps in and claims that conventional meaning requires representational support.”

You can hardly blame the standard cognitive scientist for wanting to jump in here because you can drive a truck through the gap in their line of argumentation.

Wilson and Golonka acknowledge that language is a tremendous step up from the other examples they have discussed but are optimistic they can take this step. I can’t see the reason for this optimism because all they provide is a very loose discussion of conventions and situation semantics. I am worried that they are underestimating how tremendous the step really is. 

The authors talk about any cognitive task as solving a problem and leading to an overt response. But I can read Crime and Punishment, I can listen to it on tape, and blind people can read it using Braille. These tasks differ on the surface, but the differences are minor with respect to what cognitive scientists are most interested in in cases like this, namely the end result: a mental representation of the described situation. Reading Crime and Punishment is not the same as catching a fly ball. Calling it solving a problem seems a stretch and it does not lead to an immediately observable response.

The main problem with nonradical embodiment research, according to the authors, is the assumption that cognition is done in the head. Of course it is done in the head. Where else would it be done if you’re sitting on your couch reading Crime and Punishment?

Wilson and Golonka are critical of our 2001 experiment, saying that no task analysis was done. They could have leveled that criticism against pretty much any other cognitive experiment but it’s apparently especially problematic when it concerns ours. But do they have ideas how such an analysis ought to be performed? No. They point out that even Gibson himself (making him sound like the L. Ron Hubbard of ecological psychology) had a hard time coming up with something useful here. So they leave it at a vague critique of our experiment.

That said, I do think the authors have a good point that more attention should be given to task analysis. I just would have liked to get specifics.

The article is generally very low on specifics. Do the authors present any experiments themselves that can convince us that their approach can scale up from catching fly balls to reading Dostoyevsky or constructing an argument? No. Do they at least suggest the outlines of such experiments? Nope.

The article is a manifesto, a rehashing of old ideas that have already been shown to go nowhere (well, they’re hiding under your couch). It provides no roadmap of how we get from catching fly balls to something cognitive scientists care about. It also contains several mischaracterizations of the criticized literature. In fact, there is another recent proposal on the role of embodiment that seems a lot more promising.

That said, to each his own and I see nothing wrong with encouraging Wilson and Golonka to continue on their quest. To have real impact, though, they need an experiment: the kind of experiment that people from various theoretical persuasions will use a decade later as a reference point for framing their own theoretical claims.


Monday, February 11, 2013

Behind the Eiffel Tower


In a previous post I alluded to the fact that I had produced an amusing title a few years ago for an article that was published in Psychological Science (it was intended as a parody of the article titles in that journal). I also mentioned that that article won the Ig Nobel Prize for psychology last year. This prize is awarded for “research that first makes you laugh and then makes you think.”

I thought it might be interesting to describe the creative process behind this paper. I’ll start by saying that I was not a creative force behind the paper, so I’m describing it as a participant observer.

My favorite Beatles song is A Day in the Life. One of the things I like about this song is the sudden switch from the main theme to the middle part (Woke up, fell out of bed…). When I first heard the song, I was amazed. How on Earth did John Lennon and Paul McCartney come up with the idea of switching in the middle of the song to a totally different piece and then back? I had started to write songs myself, one chord at a time. It seemed beyond amazing to invent the transition the Beatles came up with if you used this technique.

I imagined John and Paul sitting together with guitars (just like my friends and me), composing. John had the first part, played it to Paul, who then said “Hang on, I’ve got an idea” and started playing the first chord of the next part. They worked on it a bit and then John said “Alright, and what if we then go back to my piece?” And Paul said “Yeah, that’s great, man!”

Reality turned out to be a bit different as I discovered when I started reading about the Beatles. In fact, John already had the whole beginning and end but felt something was missing. Paul had been working on something separately that didn’t have a clear beginning or end. And so they decided to combine the two ideas—to great effect.

How is this relevant to the topic of this blog? And now from the sublime to the mundane.

My lab group meets on Friday mornings in a coffee shop. As a side note, in the Netherlands a place like this is tautologically called a coffee café, the term coffee shop having been claimed by establishments where drinking coffee is not high (pun intended) on the agenda.

At any rate, we meet at a coffee shop to discuss the projects that the various lab members are engaged in. We also always discuss a target article of potential interest. A few years ago, we were inspired by the work of Rick Dale and colleagues, who had marshaled Wii technology to study cognition and action. The advantage of this technology is that it (1) has high temporal resolution (meaning you can study fast processes) and (2) is pretty damn cheap.

Some master’s students in our group were building a Wii lab together with our lab technicians. At one of the lab meetings the students reported that they had discovered that you could make people believe they were standing straight up when they were actually tilted. A dot representing their center of pressure could be displaced such that when subjects moved it to the center of a crosshairs, their weight was actually slightly shifted to one side. We thought this was very cool but decided to put it on the backburner because we couldn’t think of anything to do with it other than making people fall off the balance board, which—though it had a certain appeal—didn’t seem very useful.
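To make the trick concrete, here is a toy sketch in Python of the manipulation (all numbers invented): the dot the subject sees is simply the measured center of pressure plus a hidden offset.

    # Toy sketch of the balance-board manipulation (all values invented).
    def displayed_dot(cop_x_cm, hidden_offset_cm):
        """Feedback dot position = true center of pressure + hidden offset."""
        return cop_x_cm + hidden_offset_cm

    hidden_offset = 1.5  # cm; dot pushed right, so the body ends up left
    for cop in (2.0, 0.5, -1.0, -1.5):
        print(f"true COP {cop:+.1f} cm -> dot at {displayed_dot(cop, hidden_offset):+.1f} cm")
    # The dot sits on the crosshairs (0.0) only when the true center of
    # pressure is -1.5 cm: the subject is tilted without knowing it.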

Some weeks later, we discussed an article on the so-called SNARC effect (spatial-numerical association of response codes) that showed that people responded faster to smaller numbers presented on the left and larger numbers presented on the right than vice versa (unfortunately I don’t recall which article it was). The topic of this article was outside of our normal focus, which is on language processing, but such excursions often prove informative.

The basic idea is that we mentally represent numbers on a line with smaller numbers on the left and larger numbers on the right. There are many experiments on this topic. In a parity-judgment task (is the number odd or even?) subjects respond faster to a small number when it is presented on the left of a (larger) target number than when it is presented on the right of it and vice versa for larger numbers.
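A common way to quantify the effect (I am not claiming this is the analysis from the article we discussed) is to compute, for each digit, the right-hand minus left-hand response time and regress that difference on digit magnitude; a negative slope is the SNARC effect. A minimal sketch with made-up numbers:

    # SNARC slope sketch (invented data): dRT = RT(right) - RT(left) per digit.
    # Small digits faster on the left (positive dRT) and large digits faster
    # on the right (negative dRT) yield a negative slope.
    import numpy as np

    digits = np.array([1, 2, 3, 4, 6, 7, 8, 9])
    drt_ms = np.array([35, 22, 15, 8, -10, -18, -25, -33])

    slope, intercept = np.polyfit(digits, drt_ms, 1)
    print(f"SNARC slope: {slope:.1f} ms per digit")  # negative here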

A key conclusion from our discussion of the target article was that, yeah sure, you can make people think of smaller numbers when the number is presented on the left but the subjects are aware that the number is presented on the left. So maybe the whole process is mediated via lexical associations. The location of the target number activates left, which activates small numbers, simply put. A stronger test of the idea would be one that could show this effect without people being aware of left or right. Then serendipity struck.

One of our group members, Anita Eerland, remembered the discussion about the Wii balance board from a few weeks earlier and said: “What if we put people on the balance board while they make estimates of things like the height of the Eiffel Tower? We can then manipulate left or right without them knowing it.”

It was outside our normal research domain but it seemed like an interesting question. As we developed the paradigm, it appeared to us that the research had become more than merely an attempt to investigate a potential confound from research on the mental number line. It became a study about estimation.

So by combining two unrelated and not very outlandish ideas, one emanating from a critical discussion of a paper and the other from fiddling around in the lab, there now was an amalgam that unexpectedly led to an Ig Nobel Prize.

I’m not saying this is on a par with writing A Day in the Life, obviously. But sometimes you get very unusual-looking ideas by combining relatively mundane ones.

Wednesday, February 6, 2013

Bruce Springsteen and Lazy Susan: The Logic of Experimentation


Many papers have more than one experiment. How do researchers string together experiments?

When I was a beginning assistant professor, I would tell my students: “The difficult part is not Experiment 1, it is Experiment 2.” (When saying this, I would always picture myself as a grizzled war veteran, smoking a pipe and using its stem to point out locations in the field of battle to my crop of young recruits.)

Why was Experiment 2 so difficult? Well, it couldn’t stray too far from Experiment 1 or reviewers might expose a gap in the causal chain that leads from experiment to experiment. It also couldn’t be too close because then you’d have a direct replication of Experiment 1. (Nowadays we would see this as a virtue but back then it would probably have been viewed as unnecessary padding.)

For many years I thought having a tight chain of experiments was normal for experimental research. But then I started reading articles in social psychology. I did this partly because I was asked as a reviewer (due to my interest in embodied cognition there was some overlap with some work in social cognition, at least in the perception of some editors) and partly because I became chairman of a committee investigating alleged fraud in the work of Dirk Smeesters.

It struck me that quite a few social psychology articles (though most certainly not all) don’t follow the logic of experimentation I just described. The experiments in those papers all relate to a general theme but there is no causal chain. I’ve tried to depict the difference between cognitive and those social experiments below.

[Figure: schematic diagrams contrasting the Eagles model and the youth-soccer model of experimentation]
In cognitive psychology there tend to be strong causal connections between the individual experiments (the blue boxes). Usually, the first experiment is the main one, relating to the research question (the red box), and the others serve to rule out potential alternative explanations for the results of the main experiment. It’s the Bruce Springsteen model. One experiment is the boss and the others are important but act in a supportive role—the E Street Band. However, sometimes all experiments are equally important to the main research question and then we have the Eagles model. The Bruce Springsteen and Eagles models describe pretty much all of cognitive research.

In the social psychology articles I’m talking about, there does not always appear to be a causal connection between the experiments. They all relate somewhat (hence the thinner arrows in the social model) to the central research question (which is pitched at a higher level than in cognitive articles) but in very different ways. You could pretty much put the experiments in any order and the paper would be the same. I cannot think of a musical analogy but I'm reminded of youth soccer. The experiments are like a bunch of six-year-olds playing soccer. They’re all very cute but they’re not playing as a team.

I don't know how much of social psychology can be explained by the youth-soccer model but I think it's a sizable subset (I also think virtually no cognitive studies conform to this model).

Here is a concrete example. I’m not picking on the authors. I could easily have selected many others and I think the experiments in this paper are very clever. I’m also not going to discuss the research in detail. I’m only interested in how the experiments hang together.

The authors examined whether merely executing and seeing clockwise (vs. counterclockwise) movements would induce psychological states of temporal progression (vs. regression) and, accordingly, motivational orientations toward the future and novelty (vs. the past and familiarity).

Here is how they summarize the support for their hypothesis:

Participants who turned cranks counterclockwise preferred familiar over novel stimuli, but participants who turned cranks clockwise preferred novel over old stimuli… (Experiment 1). Also, participants rotating a cylinder clockwise reported higher scores in the personality measure openness to experience than participants rotating counterclockwise (Experiment 2). Merely passively watching a rotating square had similar but weaker effects on exposure and openness (Experiment 3). Finally, participants chose more unconventional candies from a clockwise than from a counterclockwise Lazy Susan, that is, a turntable (Experiment 4).

So Experiment 2 used a different task than Experiment 1 (what happened to the cranks?) and had a different dependent measure. Why not change one thing at a time? Do we really already know what’s going on with the crank task? And why use a self-report task when you had a nice implicit task in Experiment 1? Is indicating agreement with statements like Poetry does not deeply impress me really comparable to judging Chinese ideographs (which is what the subjects did in Experiment 1)?

In Experiment 3, the subjects watched a square whereas they rotated a cylinder (which turns out to be a paper towel roll; I’ll take the cranks anytime) in Experiment 2. The researchers have good reasons for using a square (it doesn’t look like a clock) and the experiment makes sense in light of the previous one. They wanted to disentangle visual and manual rotation. Also good about Experiment 3 is that it uses the dependent measures of Experiments 1 and 2. Still, how do we know that those tasks are measuring the same mechanism? One is implicit and the other requires introspection. One is visual and the other verbal.

But then we get to Experiment 4. Here Lazy Susan makes an entrance and the dependent measure is candy selection (leave it to Lazy Susan to get the party going). Now I am seriously confused, and it’s not because of Lazy Susan’s pretty blue eyes.

The advantage of the Springsteen/Eagles model is that it results in a tightly interwoven set of experiments. Each experiment is often a partial replication of the previous one. But there is the risk of vacuous experimentation; researchers lose sight of the bigger question that motivated the research in the first place. “But that is just an experiment about an experiment,” I have heard myself say during many lab meetings (waving my imaginary pipe). I could mention whole areas of cognitive psychology that are mired in vacuous experimentation, but that’s for a later post.

The advantage of the youth-soccer model is that researchers keep their eye on the bigger question. The disadvantage is that it does not allow them to gather a coherent body of evidence to address this question. And there is another disadvantage: false positives. Suppose the Lazy Susan experiment had “failed.” The researchers could have simply concluded that the task didn’t “work” (damn you, Lazy Susan!) and tried something else. Maybe they would have had people watch a hamster in a treadmill and then judge melodies played by a harpsichord or by a synthesizer.

I don’t know if there are courses on the logic of experimentation. If one were to be developed, it should teach the Bruce Springsteen model and the Eagles model. But it should also tell students not to lose sight of the red box.


Monday, February 4, 2013

Toward a Taxonomy of Article Titles



In previous posts I have talked quite a bit about amusing titles (I promise that my next post will be on something else). I borrowed this term from an article that looked at the effects of such titles on citations and found that articles with “highly amusing” titles were cited 30% less often than other articles.

The authors of the article were careful to point out that correlation does not imply causation. Maybe these articles had less to offer contentwise—thus receiving fewer citations—and tried to make up for it in packaging.

We can establish causality in experiments. I reported initial evidence that amusing titles yield lower confidence in the associated results. Admittedly, this evidence is not very strong but it’s a first indicator.

Whether or not titles are amusing is in the eye of the beholder, so “amusing” is not a useful label. I propose to speak of descriptive and elaborative titles. A descriptive title is “A Comparison of Treatment A and Treatment B.” An elaborative title is “Comparing A and B: A Tale of Two Treatments.” It contains an elaboration, which in this case is an allusion to Dickens’s famous novel.

Elaborative titles can be segregated into two categories. There are functional elaborative titles and nonfunctional elaborative titles. The functional titles are illuminating because they shed a different light on the topic of the article whereas the nonfunctional titles make reference to something outside the article and often even outside the realm of science.

An example of a functional elaborative title is one that my graduate student Jacqueline de Nooijer came up with (once in a while I’m allowed to put in a plug for my students, right?). It is “When Left Is Not Right: Handedness Effects on Learning Object-Manipulation Words Using Pictures with Left- or Right-Handed First-Person Perspectives.” Okay, the second part of the title drags a bit but the main finding of the study is that right-handers learn words associated with tools better when these tools are shown with their handle pointing to the right than pointing to the left. This is nicely captured by the pre-colon part of the title. I will write more about this study in a later post. We recently submitted the manuscript to Psychological Science. I’m hoping its editors aren’t reading this blog.

In the category of nonfunctional elaborative titles we find titles that stay at least within the realm of learning and those that invoke the domains of popular or even street culture.


Highbrow nonfunctional elaborative titles contain allusions to the literary canon (e.g., Shakespeare, Jane Austen, Tolstoy). Although the allusion does not add information to the title, it might lend the research some cultural gravitas. But it is unclear how titles like this will affect the perceived scientific value of the research. They might amplify the value if people “get” the reference and appreciate an appeal to high culture (literati among themselves). They might have a negative effect if people find the reference gratuitous, highfalutin, or too obvious. After all, there are literally thousands of titles with “A Tale of Two” in the subtitle. See my earlier post on this.

Lowbrow nonfunctional titles are likely to only have a negative impact. They tend to lower the opinion of the associated research by “mucking up” the title. Sometimes quite literally so:


The critical reader will notice that this description can be interpreted literally, as it aptly describes the object under investigation. Nevertheless, the phrase piece of shit evokes a very different context than that of scientific analysis. “They said shit heh heh heh.”

Obviously, a scientific title with the phrase piece of shit in it will garner public attention. So maybe it is not surprising that the thus-titled article has already attracted > 88,000 views on the PLoS Neglected Tropical Diseases site and > 8,000 social shares. And perhaps this is exactly what the authors wanted to achieve. But does anyone doubt that the vast majority of these viewers were mainly interested in the title?

It would be easy to conclude with a firm: with titles like these, the research is going down the toilet but we really don’t know this. To quote the stereotypical boring article ending: more research is needed. 

Sunday, February 3, 2013

The Actual Results are in!


Good thing I called the previous update “preliminary results!” I discovered later that many subjects had “clicked through” the descriptions, merely reading the titles and spending only a second or so on the descriptions. In my preregistration I had said that I would exclude the data from subjects if their viewing times were < 30 sec for two or more abstracts.

Well, it turned out I would have had to throw away data from more than half the subjects! Therefore I decided to rerun the study, but now with a stern warning in the instructions that viewing times would be measured and that data from subjects with impossibly short viewing times would be unusable and that those subjects would not get paid.
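The preregistered rule itself is easy to express in code. Here is a sketch in pandas, with hypothetical column names:

    # Sketch of the preregistered exclusion rule (hypothetical column names):
    # drop a subject if two or more of their viewing times are under 30 s.
    import pandas as pd

    trials = pd.DataFrame({
        "subject":  [1, 1, 1, 2, 2, 2],
        "view_sec": [45, 52, 38, 4, 3, 41],  # subject 2 clicked through twice
    })

    n_too_fast = trials.groupby("subject")["view_sec"].apply(lambda s: (s < 30).sum())
    keep = n_too_fast[n_too_fast < 2].index
    clean = trials[trials["subject"].isin(keep)]
    print(clean)  # subject 2 is excluded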

This seemed to help. In the second run, far fewer subjects had impossibly short viewing times, although there were still quite a few delinquents (they did not get paid).

The lesson here: first run a sizeable pilot study before you pre-register, dummy!

I overshot a little, but at least I have 206 usable subjects now. These subjects took about 47 seconds on average to read the abstracts, which seems reasonable. There was no difference in viewing times between the amusing and nonamusing conditions.

As a good boy, I’m separating the discussion into a confirmatory part and an exploratory part.

The confirmatory part

People expressed less confidence in the research in the amusing condition than in the nonamusing condition and this difference was significant (p=.006; the bars in the figure represent standard errors). It was a lot smaller than in my original report, though, and according to a Bayesian analysis, the evidence for my alternative hypothesis is only about twice as strong as that for the null hypothesis of there being no effect.
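For those who want to see the shape of such an analysis, here is a sketch with simulated ratings: a paired t-test plus a BIC-based Bayes factor approximation (Wagenmakers, 2007). It is a stand-in for, not a reproduction of, the analyses I actually ran.

    # Sketch with simulated confidence ratings (nothing here is real data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 206
    nonamusing = rng.normal(4.6, 1.0, n)           # invented rating scale
    amusing = nonamusing - rng.normal(0.2, 1.0, n)

    t, p = stats.ttest_rel(nonamusing, amusing)    # paired t-test

    d = nonamusing - amusing                       # per-subject differences
    sse0 = np.sum(d ** 2)                          # null: mean difference = 0
    sse1 = np.sum((d - d.mean()) ** 2)             # alternative: mean free
    bf10 = np.exp((n * np.log(sse0 / sse1) - np.log(n)) / 2)  # BIC approx.

    print(f"t({n - 1}) = {t:.2f}, p = {p:.4f}, approximate BF10 = {bf10:.1f}")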

The pattern for interestingness was basically a wash. Numerically, people found the studies with the nonamusing titles more interesting than those with amusing titles (opposite to my prediction), but this difference was not significant (p=.13).

So much for the confirmatory part—on to the exploratory part.

The exploratory part

At the end of the experiment, I asked one true/false question about each study. The question always pertained to the main finding of the study. I had merely included these questions to get a sense of the subjects’ understanding of the abstracts without expecting there to be differences between the conditions. However, the amusing condition yielded a lower proportion of correct answers than the nonamusing condition (.66 vs. .69, p=.013). Although this evidence is at best ambiguous in the land of Bayes, it might not be a bad idea to include a more extensive test of comprehension in future studies.

Limitations: Part I

The effects of amusing titles will likely be more pronounced with researchers in psychology, or with scientists in general. One reason for this is that the experts will be more intrinsically motivated to read the abstracts.

It is also important to note that laypeople may not know some of the technical terms in the nonamusing titles and descriptions, so the amusing title might provide scaffolding (to use a term from educational psychology) for their understanding of the description, whereas it will be ornamental to the experts.

Also, experts may take a more serious view of science than do laypeople, and might therefore be more likely to be put off by amusing titles. But this is an empirical question of course. After all, it is the scientists who generated the amusing titles in the first place!

There was one comment from a subject I found both moving and telling about the current economic situation: “Thank you for easing my unemployment.”

Limitations: Part II

Another limitation concerns the titles and abstracts. I selected only 12 because I thought this was about as many as the Turkers could handle, which is probably right. I would have been more comfortable with at least 20.

In addition to the number of stimuli, their content is also a potential issue. From perusing many amusing titles and the associated abstracts, I learned that amusing titles come in many variants (more about this in a later post). I didn’t really take this into account and basically selected titles in which the pre-colon part was (somewhat) amusing and did not provide information that the post-colon part did not also provide. This was a judgment call, of course, and as I indicated above, the pre-colon part might not have always been redundant to the subjects.

Another selection criterion was that the abstract should not be too technical (again, a judgment call on my part).

It is quite possible that these criteria have produced a set of titles that is heterogeneous in terms of its amusingness.

Conclusion

The main conclusion is that laypeople tend to view evidence from articles with amusing titles as somewhat less convincing than the same evidence from the same articles with nonamusing titles.

Where do we go from here?

An experiment with an expert sample would be a good idea (assuming there still are psychologists left who are not readers of this blog;)).

This experiment would have to involve more abstracts to gain more power. More abstracts will be less onerous for experts than they would be for Turkers.

It might also be a good idea to first perform a careful analysis of types of amusing titles. It is likely that they don’t all have the same effect.

And for the rest I’m open to your suggestions. Please fire away!