Wednesday, January 23, 2013

Proposing a study on the effects of amusing titles

Update 1 after feedback from @hansijzerman on Twitter
Update 2 after comments from Thomas Schubert (see below) and Steve Fiore (via Facebook)
Final update after comments from Eefje Rondeel and Michał Parzuchowski (see below) as well as some further thinking on my part.
Final final update. After a test-run, I decided to present titles first and then title-plus-abstract. This makes it harder to ignore the titles and provides me with RTs on the titles. 

In my last post I showed that amusing titles are on the rise in Psychological Science. It reached its highest proportion of amusing titles (PAT) in 2012: .41. Similar PATs can be found in recent volumes of Social Psychological and Personality Science and the Journal of Experimental Social Psychology.

I am interested to see what the effects of these titles are on the impression that readers have of the associated research. I am therefore proposing to examine this question empirically in an online experiment. In doing this study, I will combine various things that I have talked about in this blog in the past few weeks.
  • I hereby preregister the design and sampling plan of the experiment.
  • I will use Mechanical Turk to collect the data.
  • I will post the data on the Open Science Framework.
  • I will discuss the results in a subsequent post.

So here is the proposed study.

Hypothesis. Amusing titles lower the confidence in the associated research but may enhance its newsworthiness or perceived interestingness. (This is the hypothesis that seems to be implied by what I have been writing. I’m not sure I completely believe it, but that’s why we do research.)

Subjects. Two hundred subjects will be recruited from Mechanical Turk. (This is not ideal, as one could argue that MTurk subjects are not the typical readers of psychology articles.)

Stimuli. The stimuli are 12 article titles plus their associated abstracts taken from Psychological Science. There are two versions of each title-abstract pair, one with an amusing title and one with a non-amusing title. Non-amusing versions of titles were created by removing the pre-colon part of the title. An example is:
Something in the Way She Sings: Enhanced Memory for Vocal Melodies. 
The non-amusing version is:
Enhanced Memory for Vocal Melodies. 
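
The title manipulation described above can be sketched in a few lines. This is only an illustration, not the script used to prepare the stimuli; the function name strip_amusing_part is made up for the example, and it assumes the first colon in a title separates the amusing part from the descriptive part.

```python
def strip_amusing_part(title: str) -> str:
    """Return the post-colon part of a title.

    Assumption: the first colon marks the end of the amusing part.
    Titles without a colon are returned unchanged.
    """
    head, sep, tail = title.partition(":")
    return tail.strip() if sep else title.strip()


amusing = "Something in the Way She Sings: Enhanced Memory for Vocal Melodies"
print(strip_amusing_part(amusing))  # prints: Enhanced Memory for Vocal Melodies
```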

After each title-abstract pair, subjects will be asked to indicate their level of (1) confidence and (2) interest in the results of the associated study by using a slider that yields values from 1 to 10 (and shows a pretty nifty thermometer). 

There is also one true-false statement per article. This statement summarizes the main findings of the study or states the opposite of the main findings. 

Design. Type of title will be counterbalanced across subjects such that each subject will see six pairs with amusing titles and six without and each version will be seen by an equal number of subjects. This yields a 2 (title condition) X 2 (counterbalancing list) X 2 (question type: confidence vs. interest) design, with title and question type as within-subjects factors and list as a between-subjects factor.
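
One way to picture the counterbalancing is the sketch below. It assumes the 12 items are simply split into two halves and that subjects are assigned to lists alternately in the order they arrive; both of those details (and the names title_version and assign_list) are assumptions for illustration, not part of the preregistered plan.

```python
N_ITEMS = 12


def title_version(list_id: int, item: int) -> str:
    """Which title version a given list shows for a given item.

    Hypothetical split: list 0 shows items 0-5 with amusing titles,
    list 1 shows items 6-11 with amusing titles.
    """
    first_half = item < N_ITEMS // 2
    amusing = first_half if list_id == 0 else not first_half
    return "amusing" if amusing else "plain"


def assign_list(subject_index: int) -> int:
    """Alternate subjects between the two lists so the lists stay balanced."""
    return subject_index % 2


for list_id in (0, 1):
    versions = [title_version(list_id, i) for i in range(N_ITEMS)]
    # Each list contains six amusing and six plain titles.
    print(list_id, versions.count("amusing"), versions.count("plain"))
```

Every item appears in its amusing version on one list and in its plain version on the other, so with equal list sizes each version is seen by the same number of subjects.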

Titles were selected such that the pre-colon part was judged not to provide additional information relative to the post-colon part. Abstracts were selected that had a minimum amount of technical terms, making them suitable for a lay audience.

Psychological Science has varied over the years how it presents titles. Overall, the pre-colon part has been emphasized over the post-colon part. Early on, the pre-colon part of the title was in all caps whereas the post-colon part was in title case. Later on, title case was used throughout but the pre-colon part was printed in a larger font than the post-colon part. In recent years, the same font size has been used for both sides of the colon. The present experiment presents the pre-colon part in a larger font (18) than the post-colon part (14), which is consistent with what the full text looks like on the Psych Science site, e.g., here. In the non-amusing condition, the post-colon part is presented in 18-point font.

The true/false judgments are included mainly to gauge the subjects' understanding of the articles. One might hypothesize that amusing titles lead to lower scores than non-amusing titles (I have no idea at this point how difficult this task will be for the subjects, so there might be a floor effect). 

Procedure. Subjects will be instructed that they are to judge the scientific value of 12 psychology studies based on short descriptions of the research that was performed. Each subject will see the 12 studies in a different random order. In each trial, subjects will first see a screen with the title. The next screen will consist of the title of the study plus the abstract. Viewing times for each screen will be measured. The final screen presents the two statements asking subjects about their level of confidence and interest in the study, respectively. Subjects will use sliders to indicate their confidence and interest.

Next, the subjects will judge the twelve true/false statements (in a different random order for each subject). Finally, they will fill out a demographic questionnaire asking about age, gender, native language, education, and interest in and knowledge of psychological research. 
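
The two randomizations in the procedure (the order of the rating trials and the order of the true/false statements) could be drawn as in the sketch below. The actual experiment software may handle this differently; random_order is a hypothetical helper.

```python
import random
from typing import List


def random_order(n_items: int, rng: random.Random) -> List[int]:
    """Return the item indices 0..n_items-1 in a freshly shuffled order."""
    order = list(range(n_items))
    rng.shuffle(order)
    return order


rng = random.Random()  # unseeded, so each subject gets a new draw
rating_order = random_order(12, rng)     # order of the 12 rating trials
statement_order = random_order(12, rng)  # separate order for the true/false statements
```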

Subject exclusion. Data from nonnative speakers of English will be excluded. Data from the last-run subject(s) of the longest list will be discarded to ensure that the counterbalancing lists are of equal length. In addition, data from subjects will be removed if their viewing times for 2 or more abstracts are < 30 sec.

Data exclusion. If the viewing time is below 30 sec, the associated trial will be eliminated because this time is far too short to meaningfully read the abstract.
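
The exclusion rules above translate fairly directly into code. The sketch below assumes viewing times for the abstract screens are recorded in seconds and that subjects within each list are stored in the order in which they were run; the function names are made up, and the order in which the rules are applied is my assumption, since the plan does not specify it.

```python
MIN_VIEW_SEC = 30.0   # abstracts viewed for less than this are discarded
MAX_FAST_TRIALS = 1   # subjects with 2 or more fast abstracts are removed


def clean_subject(view_times, native_english):
    """Return the indices of usable trials, or None if the subject is excluded."""
    if not native_english:
        return None
    fast = {i for i, t in enumerate(view_times) if t < MIN_VIEW_SEC}
    if len(fast) > MAX_FAST_TRIALS:
        return None
    return [i for i in range(len(view_times)) if i not in fast]


def trim_to_equal_lists(subjects_by_list):
    """Drop the last-run subjects of the longer list(s) so all lists match in size.

    Assumes each list of subjects is in run order.
    """
    n = min(len(subs) for subs in subjects_by_list.values())
    return {lst: subs[:n] for lst, subs in subjects_by_list.items()}
```

For instance, a native speaker with viewing times of 45, 12 and 60 seconds keeps trials 0 and 2, whereas a subject with two abstracts under 30 seconds is dropped entirely.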

Analysis. The data will be analyzed using analysis of variance. The main prediction is that amusing titles yield lower confidence ratings but higher attention-worthiness ratings than non-amusing titles, in other words an interaction between condition and question type. Follow-up analyses are planned. Bayes factors will be computed to guard against false positives.
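
To make the predicted interaction concrete: in a 2 x 2 within-subjects design, the interaction test is equivalent to a one-sample t test on each subject's difference of differences (F = t squared). The sketch below computes that contrast from per-subject cell means. It ignores the counterbalancing list factor and the Bayes factors, so it is a simplification of the planned analysis rather than the analysis itself, and interaction_t is a hypothetical helper.

```python
import math
import statistics


def interaction_t(cell_means):
    """t statistic and degrees of freedom for the title x question-type interaction.

    cell_means: one dict per subject, keyed by (title, question), e.g.
    ("amusing", "confidence"), holding that subject's mean rating per cell.
    The prediction corresponds to a positive contrast: amusing titles raise
    interest ratings relative to their effect on confidence ratings.
    Needs at least two subjects and some variance in the contrasts.
    """
    contrasts = [
        (s["amusing", "interest"] - s["plain", "interest"])
        - (s["amusing", "confidence"] - s["plain", "confidence"])
        for s in cell_means
    ]
    n = len(contrasts)
    mean = statistics.mean(contrasts)
    sd = statistics.stdev(contrasts)  # sample standard deviation
    return mean / (sd / math.sqrt(n)), n - 1
```

The function expects one dict of four cell means per subject and returns the t value together with its degrees of freedom (number of subjects minus one).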

I expect to have the results sometime next week.


  1. nice :)

    Length of title is a possible confound here. To test for that, you can regress judged quality on total title length for the non-amusing titles.

    Another problem might be that you remove important information when taking away the amusing part. Sometimes they elaborate on the other half of the title. To counteract this, you could invent amusing intros to titles that are originally not amusing (or ask others who are blind to your hypothesis to do so), and add those to the design.

    Finally, your items are randomly drawn from the population of titles, so you should not simply average them for ANOVA. Instead, you can analyse this using a MIXED model, where you add item as a random factor (see Judd et al., JPSP, 2012).

  2. Thanks for your useful comments!

    Length might indeed be an issue. On the other hand, it is more-or-less a built-in confound. An option would be to go with mostly non-informative titles.

    Removing important information is a more serious issue. This requires selecting titles where the amusing part is merely an embellishment. Or of course go with the option I just described.

    I'll consult with statisticians on this, given that the opinions on mixed models seem to be mixed.

  3. Recently heard a graduate student talk at RPI (Troy, NY) comparing Turk to laboratory results in a simple RT, go/no-go, and Stroop task. The student simultaneously collected data on Turk and in the lab and found that in some cases (simple RT) results were very similar between the two methods, but in other cases (go/no-go, Stroop) results were significantly different in several ways. People were slower, data showed more variability, and some of the classic laboratory effects were not found.

    The talk was particularly amusing because the student had a survey at the end, asking participants to be completely honest about their environment (who was around, what they were doing other than the HIT, etc). One participant said they were on their laptop during a boring lecture, one was competing with their friends to see who could complete more HITs soonest, and one claimed to be smoking an illicit substance during the study. These are just a few of the most amusing ones, but you get the picture.

    Maybe this isn’t exactly surprising given the nature of Turk, but the talk made me realize that many researchers are cutting corners with Turk and perhaps not realizing how dramatically sample and context can affect data (what appear to be basic lessons in research methods). Alas, it does sound like an interesting study and I’m looking forward to reading the results here.

  4. These things do happen but they also happen in the lab; see my earlier post. Our experience is that MTurk subjects are more conscientious than our normal subjects, but RTs tend to be longer, yes.

  5. This research proposal sounds like an excellent idea and I already look forward to the results! You might also consider asking people how likely they are to share the research results with friends or family (not colleagues, as this will depend on the research topic), to gain more insight into the newsworthiness.

  6. Thanks! That's a good suggestion. I'll include it in a subsequent update.

  7. Great idea and very nice procedure. I have a small comment on the procedure that you might consider. You want to ask lay readers if the title (amusing or not) was "worthy of their attention". This kind of declared attention-grabbing factor might be a proxy for their motivation to read the abstract, but it does not tell you about their attention per se. Perhaps you could also measure the level of attention given to the content of the research covered. You could perhaps run some sort of memory test (it might be an open-ended listing task, if the title database is too big to prepare a proper test) at the end of the procedure, to check if P's remembered the results they have rated (and specifically see if they remembered more from the ones they have actually rated to be worthy of their attention, accounting for order of presentation). This memory test could also be used as a second manipulation check of whether P's have actually read and understood the given abstracts.

  8. Thanks.

    Good point! I have just selected 12 abstracts I am going to use and am creating two yes/no questions for each. In addition, I will be collecting viewing times for each abstract, which also provides information on the amount of attention that people pay. Future experiments might focus more on the memory effects of amusing titles. My hunch is that they do not lead to better memory for the contents of the article but rather to better memory for the titles themselves (which may be useful because people will be more likely to retrieve the article and check the contents).