Good thing I called the previous update “preliminary results!” I discovered later that many subjects had “clicked through” the descriptions, merely reading the titles and spending only a second or so on the descriptions. In my preregistration I had said that I would exclude the data from subjects if their viewing times were < 30 sec for two or more abstracts.
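The exclusion rule is mechanical enough to sketch in a few lines. A minimal illustration in Python — the 30-second threshold and the "two or more abstracts" criterion come from the preregistration above; the subject records themselves are invented:

```python
# Preregistered exclusion rule: drop a subject if viewing time was
# < 30 seconds for two or more abstracts.
# (Subject IDs and viewing times below are invented for illustration.)
viewing_times = {
    "s01": [45, 52, 38, 61],  # read everything at a normal pace
    "s02": [2, 3, 40, 55],    # clicked through two abstracts -> excluded
    "s03": [28, 47, 50, 33],  # only one short viewing time -> still usable
}

THRESHOLD = 30  # seconds
MAX_SHORT = 1   # more short viewings than this -> exclude the subject

def usable(times):
    """True if the subject had at most MAX_SHORT sub-threshold viewings."""
    return sum(t < THRESHOLD for t in times) <= MAX_SHORT

kept = {s: t for s, t in viewing_times.items() if usable(t)}
print(sorted(kept))  # s02 is excluded
```

In the first run, a rule like this would have flagged more than half of the subjects, which is exactly the problem described above.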
Well, it turned out I would have had to throw away data from more than half the subjects! Therefore I decided to rerun the study, this time with a stern warning in the instructions that viewing times would be measured, that data from subjects with impossibly short viewing times would be unusable, and that those subjects would not get paid.
This seemed to help. In the second run, far fewer subjects had impossibly short viewing times, although there were still quite a few delinquents (they did not get paid).
The lesson here: first run a sizeable pilot study before you pre-register, dummy!
I overshot a little, but at least I have 206 usable subjects now. These subjects took about 47 seconds on average to read the abstracts, which seems reasonable. There was no difference in viewing times between the amusing and nonamusing conditions.
As a good boy, I’m separating the discussion into a confirmatory part and an exploratory part.
The confirmatory part
The pattern for interestingness was basically a wash. Numerically, people found the studies with the nonamusing titles more interesting than those with amusing titles (opposite to my prediction), but this difference was not significant (p = .13).
So much for the confirmatory part—on to the exploratory part.
The exploratory part
At the end of the experiment, I asked one true/false question about each study. The question always pertained to the main finding of the study. I had merely included these questions to get a sense of the subjects’ understanding of the abstracts without expecting there to be differences between the conditions. However, the amusing condition yielded a lower proportion of correct answers than the nonamusing condition (.66 vs. .69, p=.013). Although this evidence is at best ambiguous in the land of Bayes, it might not be a bad idea to include a more extensive test of comprehension in future studies.
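I won't belabor which test produced the p = .013 (a within-subject comparison of .66 vs. .69 would be more powerful than a naive between-counts test), but for readers who want the rough shape of such a comparison, here is a simple two-proportion z-test sketch. The trial counts are invented for illustration and won't reproduce the reported p-value:

```python
import math

def two_prop_ztest(k1, n1, k2, n2):
    """Two-sided z-test for the difference of two independent
    proportions, using the pooled standard error."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts that give roughly .66 vs. .69 correct:
z, p = two_prop_ztest(816, 1236, 853, 1236)
print(round(z, 2), round(p, 3))
```

The actual analysis aggregated within subjects, so treat this purely as an illustration of the arithmetic, not as the analysis behind the numbers above.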
Limitations: Part I
The effects of amusing titles will likely be more pronounced among researchers in psychology, or among scientists in general. One reason for this is that experts will be more intrinsically motivated to read the abstracts.
It is also important to note that laypeople may not know some of the technical terms in the nonamusing titles and descriptions, so the amusing title might provide scaffolding (to use a term from educational psychology) for their understanding of the description, whereas it will be merely ornamental to experts.
Also, experts may take a more serious view of science than do laypeople, and might therefore be more likely to be put off by amusing titles. But this is an empirical question of course. After all, it is the scientists who generated the amusing titles in the first place!
There was one comment from a subject I found both moving and telling about the current economic situation: “Thank you for easing my unemployment.”
Limitations: Part II
Another limitation was the set of titles and abstracts. I selected only 12 because I thought this was about as many as the Turkers could handle, which is probably right. I would have been more comfortable with at least 20.
In addition to the number of stimuli, their content is also a potential issue. From perusing many amusing titles and the associated abstracts, I learned that amusing titles come in many variants (more about this in a later post). I didn’t really take this into account and basically selected titles in which the pre-colon part was (somewhat) amusing and did not provide information that the post-colon part did not also provide. This was a judgment call, of course, and as I indicated above, the pre-colon part might not have always been redundant to the subjects.
Another selection criterion was that the abstract should not be too technical (again, a judgment call on my part).
It is quite possible that these criteria have produced a set of titles that is heterogeneous in terms of its amusingness.
The main conclusion is that there is a tendency among laypeople to view evidence from articles with amusing titles as somewhat less convincing than the same evidence from the same articles with nonamusing titles.
Where do we go from here?
An experiment with an expert sample would be a good idea (assuming there still are psychologists left who are not readers of this blog;)).
This experiment would have to involve more abstracts to gain more power. More abstracts will be less onerous for experts than for Turkers.
It might also be a good idea to first perform a careful analysis of types of amusing titles. It is likely that they don’t all have the same effect.
And for the rest I’m open to your suggestions. Please fire away!