In my last post, I described a (mostly) successful replication by
Steegen et al. of the “crowd-within effect.” The authors of that replication
felt it would be nice to mention all of the good replication research practices
they had implemented in their effort.
And indeed, positive psychologist that I am, I would be
remiss if I didn’t extol the virtues of the approach in that exemplary
replication paper, so here goes.
Make sure you have
sufficient power.
We all know this, right?
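To make this concrete, here is a minimal sketch (in R, my choice, not taken from the paper) of what an a priori power calculation could look like; the effect size of half a standard deviation is an arbitrary illustration, not the value the replication authors used.

```r
# A priori power analysis for a two-sample t-test, using base R.
# The assumed effect size (half a standard deviation) is illustrative only.
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.90)
# n comes out at about 85 per group, so round up and recruit at least 86 per condition.
```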
Preregister your
hypotheses, analyses, and code.
I like how the replication authors went all out in
preregistering their study. It is certainly important to have the proposed
analyses and code worked out up front.
Make a clear
distinction between confirmatory and exploratory analyses.
The authors did exactly as the doctor (A.D. de Groot,
in this case) ordered. It is very useful to perform exploratory analyses,
but they should be clearly separated from the confirmatory ones.
Report effect sizes.
Yes.
Use both estimation
and testing, so your data can be evaluated more broadly, by people from
different statistical persuasions.
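As a sketch of how little extra work this is (hypothetical data, base R), a single call already gives you an estimate with a confidence interval alongside the test, and an effect size takes one more line:

```r
# Hypothetical scores in two conditions.
set.seed(1)
group1 <- rnorm(40, mean = 0.4)
group2 <- rnorm(40, mean = 0.0)

# Estimation and testing in one output: the mean difference with a 95% CI,
# plus a t statistic and p value for the same comparison.
t.test(group1, group2)

# A simple standardized effect size (Cohen's d with a pooled SD, equal group sizes).
(mean(group1) - mean(group2)) / sqrt((var(group1) + var(group2)) / 2)
```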
Use both frequentist and Bayesian analyses.
Yes, why risk being pulled
over by a Bayes trooper or having a run-in with the Frequentist militia? Again, using multiple analyses allows your results to be evaluated more broadly.
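A minimal sketch of running both camps’ analyses side by side; it assumes the BayesFactor R package, which is my example and not necessarily what the replication authors used.

```r
# Frequentist and Bayesian answers to the same question, side by side.
# Assumes the BayesFactor package is installed.
library(BayesFactor)

# Same hypothetical data as the previous sketch, regenerated so this block runs on its own.
set.seed(1)
group1 <- rnorm(40, mean = 0.4)
group2 <- rnorm(40, mean = 0.0)

t.test(group1, group2)            # p value and 95% confidence interval
ttestBF(x = group1, y = group2)   # Bayes factor for the same comparison
```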
Adopt a co-pilot multi-software approach.
A mistake in data
analysis is easily made and so it makes sense to have two or more researchers
analyse the data from scratch. A co-author and I used a co-pilot approach as
well in
a recent paper (without knowing the cool name for this approach, otherwise
we would have bragged about it in the article). We discovered tiny
discrepancies between our analyses, with each of us making a small error
here and there. The discrepancies were easily resolved, but the errors would probably have gone undetected had we not used the co-pilot approach. Using a multi-software
approach seems a good additional way to minimize the likelihood of errors.
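For what it’s worth, the comparison step itself can be scripted. A minimal sketch, assuming each analyst exports their key estimates to a CSV with (hypothetical) columns term and estimate:

```r
# Compare two independently produced sets of estimates (hypothetical file names).
pilot   <- read.csv("estimates_analyst1.csv")   # columns: term, estimate
copilot <- read.csv("estimates_analyst2.csv")

both <- merge(pilot, copilot, by = "term", suffixes = c("_1", "_2"))
both$difference <- both$estimate_1 - both$estimate_2

# Anything beyond rounding error deserves a closer look.
both[abs(both$difference) > 1e-6, ]
```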
Make the raw and processed data available.
When you ask people to share their data, they typically send you the
processed data but the raw data are often more useful. The combination is
even more useful as it allows other researchers to retrace the steps from raw
to processed data.
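Ideally the step in between is a script as well. A minimal sketch with hypothetical file names, variable names, and an illustrative exclusion rule, so that raw data, processed data, and the script together let anyone retrace the path:

```r
# From raw to processed data in one shareable script.
# File names, variable names, and the exclusion rule are illustrative assumptions.
raw <- read.csv("data_raw.csv")

processed <- subset(raw, !is.na(estimate1) & !is.na(estimate2))   # drop incomplete cases
processed$avg_estimate <- (processed$estimate1 + processed$estimate2) / 2

write.csv(processed, "data_processed.csv", row.names = FALSE)
```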
Use multiple ways to assess replication success.
This is a good idea in
the current climate where the field has not settled on a single method yet. Again,
it allows the results to be evaluated more broadly than with a single-method
approach.
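Two of the simpler criteria can be written down in a few lines; the numbers below are made up for illustration, not taken from the replication.

```r
# Two of several possible criteria for replication success (hypothetical numbers).
original_d     <- 0.45               # effect size reported in the original study
replication_d  <- 0.30               # effect size found in the replication
replication_ci <- c(0.10, 0.50)      # 95% CI around the replication estimate

# Criterion 1: a significant effect in the same direction as the original.
sign(replication_d) == sign(original_d) && replication_ci[1] > 0

# Criterion 2: the replication CI contains the original effect size.
original_d >= replication_ci[1] && original_d <= replication_ci[2]
```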
“Maybe these methodological strengths are worth mentioning too?” the first
author of the replication study, Sara Steegen, suggested in an email.
Check.
I thank Sara Steegen for feedback on a previous version of this post.
How many of these are not also good practices for non-replication studies?
Joking aside: The "co-pilot multi-software" approach is a good point that should be made more widely. Interpreting results is increasingly a question of trusting that the authors have operated (and, in the case of R or Mplus, etc., programmed) the software correctly. Presumably with SPSS and SAS we have fewer arithmetic errors than in the days of log tables and slide rules, but I wonder if we don't perhaps have more methodological errors; those software packages will spit out some number, any number, more or less regardless of what data you throw at them, and whether those data are in the right columns.
In an article which I currently have in press (an unsuccessful reproduction of results from a published dataset; the original results were caused by some spectacular errors in the authors' statistical analyses), I got two independent researchers to reproduce my numbers, starting with the original raw data, the original article, and a brief explanatory e-mail from the original author saying how the method worked. So either we're right, or we're all making the same mistakes. (That last sentence is, of course, a good general description of any snapshot of the state of science!)
While doing that, however, we discovered some interesting discrepancies between tools. For example, if you have a missing value for one of the IVs in a regression in SPSS or R, the entire record for that subject will be ignored, whereas Matlab will, by default, silently insert a zero.
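A toy example of the R default, for anyone who wants to check their own setup (the data are made up; the Matlab behaviour described above is not tested here):

```r
# R's default: a row with a missing predictor is dropped from the model entirely.
df <- data.frame(
  y  = c(2.1, 3.4, 4.0, 5.2, 4.8, 6.1),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(0.5, NA, 1.5, 2.0, 1.0, 2.5)
)

fit <- lm(y ~ x1 + x2, data = df)   # na.action defaults to na.omit
nobs(fit)                           # 5, not 6: the incomplete row was silently excluded
```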
Yes, leave the jokes to me, will you? ;)
I'm glad you're endorsing the "co-pilot multi-software" approach. It is also a way to minimize confirmation bias. A mistake that produces a pattern in line with the hypotheses will be less likely to be detected than one that destroys a predicted effect.