Wednesday, March 11, 2015

The End-of-Semester Effect Fallacy: Some Thoughts on Many Labs 3

The Many Labs enterprise is on a roll. This week, a manuscript reporting Many Labs 3 materialized on the already invaluable Open Science Framework. The manuscript reports a large-scale investigation, involving 20 American and Canadian research teams, into the “end-of-semester effect.”

The lore among researchers is that subjects run at the end of the semester provide useless data. Effects that are found at the beginning of the semester somehow disappear or become smaller at the end. Often this is attributed to the notion that less-motivated/less-intelligent students procrastinate and postpone participation in experiments until the very last moment. Many Labs 3 notes that there is very little empirical evidence pertaining to the end-of-semester effect.

To address this shortcoming in the literature, Many Labs 3 set out to conduct 10 replications of known effects to examine the end-of-semester effect. Each experiment was performed twice by each of the 20 participating teams: once at the beginning of the semester and once at the end of the semester, each time with different subjects, of course.

It must have been a disappointment to the researchers involved that only 3 of the 10 effects replicated (maybe more about this in a later post), but Many Labs 3 remained undeterred and went on to examine the evidence for an end-of-semester effect. Long story short, there was none. Or in the words of the researchers:

It is possible that there are some conditions under which the time of semester impacts observed effects. However, it is unknown whether that impact is ever big enough to be meaningful.

This made me wonder about the reasons for expecting an end-of-semester effect in the first place. Isn’t this just a fallacy born out of research practices that most of us now frown upon: running small samples, shelving studies with null effects, and optional stopping?

New projects are usually started at the beginning of a semester. Suppose the first (underpowered) study produces a significant effect. There are several possible reasons for this:
(1) the effect is genuine;
(2) the researchers stopped when the effect was significant;
(3) the researchers massaged the data such that the effect was significant;
(4) it was a lucky shot;
(5) any combination of the above.

How the end-of-semester effect might come about
With this shot in the arm, the researchers are motivated to conduct a second study, perhaps with the same N and exclusionary and outlier-removal criteria as the first study but with a somewhat different independent variable. Let’s call it a conceptual replication. If this study, for whatever reason, yields a significant effect, the researchers might congratulate themselves on a job well done and submit the manuscript.

But what if the first study does not produce a significant effect? The authors probably conclude that the idea is not worth pursuing after all, shelve the study, and move on to a new idea. If it’s still early in the semester, they could run a study to test the new idea and the process might repeat itself.

Now let’s assume the second study yields a null effect, certainly not a remote possibility. At this juncture, the authors are the proud owners of a Study 1 with an effect but are saddled with a Study 2 without an effect. How did they get this lemon? Well, of course because of those good-for-nothing numbskulled students who wait until the end of the semester before signing up for an experiment! And thus the “end-of-semester fallacy” is born.
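To make the argument concrete, here is a minimal simulation sketch of the scenario described above. Everything in it is an assumption chosen for illustration and not taken from Many Labs 3: the true effect size (d = 0.4), the sample size (20 per group), a z-test with a known standard deviation of 1 in place of the usual t-test, and the function names run_study and simulate. Crucially, the early and late studies draw from exactly the same population, so there is no real time-of-semester effect anywhere in the simulation.

```python
import random
from statistics import NormalDist, mean


def run_study(n_per_group, true_d, rng):
    """Simulate one two-group study; return (observed difference, two-sided p)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
    diff = mean(treated) - mean(control)
    # Simplifying assumption: z-test with known SD = 1 instead of a t-test.
    z = diff / (2.0 / n_per_group) ** 0.5
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, p


def simulate(n_labs=5000, n_per_group=20, true_d=0.4, alpha=0.05, seed=1):
    """Each lab runs Study 1 early; only 'successful' labs run Study 2 late."""
    rng = random.Random(seed)
    early_effects, late_effects = [], []
    late_nulls = 0
    for _ in range(n_labs):
        d1, p1 = run_study(n_per_group, true_d, rng)
        if not (p1 < alpha and d1 > 0):
            continue  # null result: the study is shelved, no follow-up
        # Same population, same true effect: no time-of-semester effect.
        d2, p2 = run_study(n_per_group, true_d, rng)
        early_effects.append(d1)
        late_effects.append(d2)
        if p2 >= alpha:
            late_nulls += 1
    followed_up = len(early_effects)
    return {
        "followed_up": followed_up,
        "share_late_null": late_nulls / followed_up,
        "mean_early_effect": mean(early_effects),
        "mean_late_effect": mean(late_effects),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumed settings each study has roughly a one-in-four chance of reaching significance, so most follow-up studies come up null, and the early studies that survived the significance filter overestimate the true effect. A lab looking only at its own Study 1 and Study 2 would see an effect that "disappears" at the end of the semester, even though the students are statistically identical throughout.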


  1. I love it. Great post.

  2. Have you read the recent study by Nicholls et al. in Quarterly Journal of Experimental Psychology showing that sustained attention and motivation are higher early in the semester? 10.1080/17470218.2014.925481

    1. The Many Labs 3 manuscript cites that paper and replicates the effect you mention.

    2. Thank you for your reply. I have written down some thoughts about this post and both studies:

  3. Hi--

    I just looked over the supplemental materials and noticed that of the three effects that replicated (Stroop, metaphoric structuring, and availability heuristic) there WAS a time-of-semester effect for 2 of the 3 effects. The result for Stroop, the most robust effect (and thus the most relevant to the time-of-semester question), does indeed show that the Stroop effect is strongest at the end of the term. The effect is not large by any means, but don't these results actually confirm the presence of a time-of-semester effect?