Saturday, March 4, 2017

The value of experience in criticizing research

It's becoming a trend: another guest blog post. This time, J.P. de Ruiter shares his perspective, which I happen to share, on the value of experience in criticizing research.

J.P. de Ruiter
Tufts University

One of the reasons that the scientific method was such a brilliant idea is that it has criticism built into the process. We don’t believe something on the basis of authority; we need to be convinced by relevant data and sound arguments, and if we think that either the data or the argument is flawed, we say so. Before a study is conducted, this criticism is usually provided by colleagues or, in the case of preregistration, by reviewers. After a study is submitted, critical evaluations are performed by reviewers and editors. Even after publication, the criticism continues, in the form of discussions in follow-up articles, at conferences, and on social media. This self-corrective aspect of science is essential, hence criticism, even though at times it can be difficult to swallow (we are all human), is a very good thing.

We often think of criticism as pointing out flaws in the data collection, statistical analyses, and argumentation of a study. In methods education, we train our students to become aware of the pitfalls of research. We teach them about assumptions, significance, power, interpretation of data, experimenter expectancy effects, Bonferroni corrections, optional stopping, and so on. This type of training makes young researchers very adept at finding flaws in studies, and that is a valuable skill to have.

While I appreciate that noticing and formulating the flaws and weaknesses in other people’s studies is a necessary skill for becoming a good critic (or reviewer), it is, in my view, not sufficient. It is very easy to find flaws in any study, no matter how well it is done. We can always point out alternative explanations for the findings, note that the sample was not representative, or state that the study needs more power. Always. So pointing out why a study is not perfect is not enough: good criticism takes into account that research always involves a trade-off between validity and practicality.

As a hypothetical example: if we review a study about a relatively rare type of aphasia and notice that the authors have studied 7 patients, we could point out that a) in order to generalize their findings, they need inferential statistics, and b) in order to do that, given the estimated effect size at hand, they’d need at least 80 patients. We could, but we probably wouldn’t, because we would realize that it was probably hard enough to find 7 patients with this affliction to begin with, so finding 80 is likely impossible. So we’d focus on other aspects of the study instead. We do, of course, keep in mind that we can’t generalize from the results of this study with the same level of confidence as from a lexical decision experiment with a within-subject design and 120 participants. But we are not going to say, “This study sucks because it had low power.” At least, I want to defend the position here that we shouldn’t say that.
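Where does a number like "at least 80" come from? A back-of-the-envelope sketch, using the standard normal-approximation formula for a two-sided test at 80% power (the effect size d = 0.31 below is a hypothetical value chosen for illustration, not one given in the post):

```python
from math import ceil
from statistics import NormalDist

def approx_n(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided one-sample
    (or paired) t-test; slightly underestimates the exact t-based n."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)            # 0.84 for 80% power
    return ceil(((z_alpha + z_beta) / d) ** 2)

# Hypothetical effect size for illustration -- not from the post.
print(approx_n(0.31))  # -> 82, i.e. "at least 80 patients"
```

The point of the example survives the arithmetic: for small-to-medium effect sizes the required n is an order of magnitude above what a rare patient population can supply.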

While this is a rather extreme example, I believe that this principle should be applied at all levels and aspects of criticism. I remember that as a grad student, a local statistics hero informed me that my statistical design was flawed, and proceeded to demand an ANOVA that was way beyond the computational capabilities of even the most powerful supercomputers available at the time. We know that full LMMs with random slopes and intercepts often do not converge. We know that many Bayesian analyses are intractable. In experimental designs, one runs into practical constraints as well. Many independent variables simply can’t be studied in a within-subject design. Phenomena that only occur spontaneously (e.g. iconic gestures) cannot be fully controlled. In EEG studies, it is not feasible to control for artifacts due to muscle activity, so studying speech production is not really possible with this paradigm.

My point is: good research is always a compromise between experimental rigor, practical feasibility, and ethical considerations. To be able to appreciate this as a critic, it really helps to have been actively involved in research projects. Not only because that gives us more appreciation of the trade-offs involved, but also, perhaps more importantly, because it gives us the experience of really wanting to discover, prove, or demonstrate something. It makes us experience first-hand how tempting it can be, in Feynman’s famous formulation, to fool ourselves. I do not mean to say that we should become less critical, but rather that we become better constructive critics if we are able to empathize with the researcher’s goals and constraints. Nor do I want to say that criticism by those who have not yet had positive research experience is to be taken less seriously. All I want to say here is that (and why) having been actively involved in the process of contributing new knowledge to science makes us better critics.


  1. Great post, thanks. I do think that the opposite is true as well: 'having been actively involved in the process of criticizing other work makes us better at contributing new knowledge'

  2. I'm sympathetic to the general point that the author is trying to make, but the examples of unhelpful criticism are not well chosen.

    A simple solution for the aphasia study with 7 patients is to avoid claims about the population and to make inferences about the observed sample, or about "other aspects of the study." If the reviewer and reader can "keep in mind that we can't generalize over the results in the study ...", why can't the authors acknowledge the fact by framing their inferences in an adequate way, i.e. in terms of the sample rather than the population?

    The EEG example and many of the real-world criticisms are concerned with causal identifiability. The tools of causal inference (e.g. the work by Pearl or by Rubin) tell us not only how to compute certain causal effects but also how to determine whether some causal effect is identifiable based on knowledge of the data-generating mechanism. We can then go the other way round and reason about how to design an experiment or an intervention in order to expose a certain mechanism and to make its effect identifiable. In psychological research these tools are never used; psychologists instead implicitly rely on their intuitions about causation. Such an approach is informal, opaque, and error-prone, and as such not very scientific. One such erroneous intuition comes up in the examples. To identify a causal effect it is not necessary to control for confounders experimentally (e.g. by holding them constant). It is often sufficient to measure the confounders as covariates and then adjust for them in the analysis. This is for instance done to compensate for blinks in EEG data. One can imagine a similar adjustment in an EEG study of language production, and I wouldn't be surprised if there were already research tackling this problem.
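    The adjustment the commenter describes can be shown in a toy simulation (all numbers here are made up for the demonstration): when a confounder drives both the treatment and the outcome, regressing the outcome on the treatment alone gives a biased estimate, while including the measured confounder as a covariate recovers the true effect.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    beta = 0.5  # true causal effect of x on y (assumed for the demo)

    z = rng.normal(size=n)                 # measured confounder
    x = z + rng.normal(size=n)             # treatment, influenced by z
    y = beta * x + 2.0 * z + rng.normal(size=n)

    # Naive OLS of y on x alone: confounded, biased upward.
    naive = np.linalg.lstsq(
        np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]

    # Adjusting for the measured confounder recovers beta.
    adjusted = np.linalg.lstsq(
        np.column_stack([np.ones(n), x, z]), y, rcond=None)[0][1]

    print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
    ```

    The naive coefficient lands near 1.5 rather than 0.5 here, which is the kind of bias experimental control and statistical adjustment both aim to remove; whether adjustment suffices in a given EEG design depends on the confounders actually being measurable.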

    In addition, the examples illustrate that often the issue at the heart of the criticism is not that the authors fail to apply some method or make questionable assumptions. The issue is that the authors fail to acknowledge the limitations of their methods and their assumptions, and to frame their conclusions accordingly.

    As for the general case of unhelpful criticism, IMO this mainly arises when critics fail to prioritize issues and bring up issues that do not affect the general conclusions and as such are of secondary importance. Even when it is desirable to list all issues, such as in a review, it is then advisable to structure the review so that it highlights the severity of the reported issues (see for instance the recommendations by Jeff Leek: