Saturday, December 31, 2016

A Commitment to Better Research Practices (BRPs) in Psychological Science

On the brink of 2017. Time for some New Year's resolutions. I won't bore you with details about my resolutions to (1) again run 1000k (not in a row of course), (2) not live in a political bubble, (3) be far more skeptical about political polls, (4) pick up the guitar again, (5) write more blog posts, and (6) learn more about wine. Instead I want to focus on some resolutions about research practices that Brent Roberts, Lorne Campbell, and I penned (with much-appreciated feedback from Brian Nosek, Felix Schönbrodt, and Jennifer Tackett). We hope they serve as an inspiration to you as well.

The Commitment

Scientific research is an attempt to identify a working truth about the world that is as independent of ideology as possible.  As we appear to be entering a time of heightened skepticism about the value of scientific information, we feel it is important to emphasize and foster research practices that enhance the integrity of scientific data and thus scientific information. We have therefore created a list of better research practices that we believe, if followed, would enhance the reproducibility and reliability of psychological science. The proposed methodological practices are applicable for exploratory or confirmatory research, and for observational or experimental methods.
1. If testing a specific hypothesis, pre-register your research, so others can know that the forthcoming tests are informative.  Report the planned analyses as confirmatory, and report any other analyses or any deviations from the planned analyses as exploratory.
2. If conducting exploratory research, present it as exploratory. Then, document the research by posting materials, such as measures, procedures, and analytical code so future researchers can benefit from them. Also, make research expectations and plans in advance of analyses—little, if any, research is truly exploratory. State the goals and parameters of your study as clearly as possible before beginning data analysis.
3. Consider data sharing options prior to data collection (e.g., complete a data management plan; include necessary language in the consent form), and make data and associated meta-data needed to reproduce results available to others, preferably in a trusted and stable repository. Note that this does not imply full public disclosure of all data. If there are reasons why data can’t be made available (e.g., containing clinically sensitive information), clarify that up-front and delineate the path available for others to acquire your data in order to reproduce your analyses.
4. If some form of hypothesis testing is being used or an attempt is being made to accurately estimate an effect size, use power analysis to plan research before conducting it so that it is maximally informative.
5. To the best of your ability maximize the power of your research to reach the power necessary to test the smallest effect size you are interested in testing (e.g., increase sample size, use within-subjects designs, use better, more precise measures, use stronger manipulations, etc.). Also, in order to increase the power of your research, consider collaborating with other labs, for example via StudySwap. Be open to sharing existing data with other labs in order to pool data for a more robust study.
6. If you find a result that you believe to be informative, make sure the result is robust.  For smaller lab studies this means directly replicating your own work or, even better, having another lab replicate your finding, again via something like StudySwap.  For larger studies, this may mean finding highly similar data, archival or otherwise, to replicate results. When other large studies are known in advance, seek to pool data before analysis. If the samples are large enough, consider employing cross-validation techniques, such as splitting samples into random halves, to confirm results. For unique studies, checking robustness may mean testing multiple alternative models and/or statistical controls to see if the effect is robust to multiple alternative hypotheses, confounds, and analytical approaches.
7. Avoid performing conceptual replications of your own research in the absence of evidence that the original result is robust and/or without pre-registering the study.  A pre-registered direct replication is the best evidence that an original result is robust.
8. Once some level of evidence has been achieved that the effect is robust (e.g., a successful direct replication), by all means do conceptual replications, as conceptual replications can provide important evidence for the generalizability of a finding and the robustness of a theory.
9. To the extent possible, report null findings.  In science, null news from reasonably powered studies is informative news.
10. To the extent possible, report small effects. Given the uncertainty about the robustness of results across psychological science, we do not have a clear understanding of when effect sizes are “too small” to matter.  As many effects previously thought to be large are small, be open to finding evidence of effects of many sizes, particularly under conditions of large N and sound measurement.
11. When others are interested in replicating your work be cooperative if they ask for input. Of course, one of the benefits of pre-registration is that there may be less of a need to interact with those interested in replicating your work.
12. If researchers fail to replicate your work continue to be cooperative. Even in an ideal world where all studies are appropriately powered, there will still be failures to replicate because of sampling variance alone. If the failed replication was done well and had high power to detect the effect, at least consider the possibility that your original result could be a false positive. Given this inevitability, and the possibility of true moderators of an effect, aspire to work with researchers who fail to find your effect so as to provide more data and information to the larger scientific community that is heavily invested in knowing what is true or not about your findings.
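The power analysis mentioned in point 4 can be made concrete with a short sketch. This is a minimal illustration, not part of the commitment itself: it assumes a two-group design analyzed with an independent-samples t-test, and the small effect size (Cohen's d = 0.2) is an illustrative choice.

```python
# Minimal power-analysis sketch for point 4, assuming a two-group
# design tested with an independent-samples t-test. Cohen's d = 0.2
# ("small") is an illustrative choice, not a recommendation.
import math
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.2,         # smallest effect size of interest (Cohen's d)
    alpha=0.05,              # two-sided significance level
    power=0.80,              # desired power
    alternative='two-sided'
)
print(math.ceil(n_per_group))  # participants needed per group (about 394)
```

Running this kind of calculation before data collection makes the trade-offs in point 5 tangible: halving the smallest effect size of interest roughly quadruples the required sample, which is exactly when pooling data across labs becomes attractive.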

We should note that these proposed practices are complementary to other statements of commitment, such as the commitment to research transparency. We would also note that the proposed practices are aspirational.  Ideally, our field will adopt many, if not all, of these practices.  But we also understand that change is difficult and takes time.  In the interim, it would be ideal to reward any movement toward better research practices.

Brent W. Roberts
Rolf A. Zwaan
Lorne Campbell

Wednesday, September 28, 2016

Invitation to a Registered Replication Report

Update December 17. Data collection is in full swing in labs from Buenos Aires to Berkeley and from Potsdam to Pittsburgh. Some labs have already finished while others (such as my lab) have just started. Data collection should be completed by March 1. 

Update October 24. Data collection has officially started. No fewer than 20 labs are participating! Besides investigating if the ACE replicates in native speakers, we will also examine if the effect extends to L2 speakers of English.

Mike Kaschak and Art Glenberg, discoverers of the famous ACE effect, have decided to run a registered replication of their effect. There are already 7 participating labs but we'd like to invite more participants. If you're interested in language, action, and/or replication and have access to subjects who are native speakers of English, please consider participating by responding to Mike's invitation:
Dear Colleague,

I am writing to ask whether you are interested in participating in the data collection for a multi-lab, pre-registered replication of the Action-sentence Compatibility Effect (ACE), first reported in Glenberg and Kaschak (2002). I am organizing this effort along with my colleagues Art Glenberg, Rolf Zwaan, Richard Morey, Agustin Ibanez, and Claudia Gianelli.

Your participation in this effort will involve running the Borreggine & Kaschak (2006) version of the ACE experiment (which uses spoken sentences, rather than written sentences as in Glenberg & Kaschak, 2002), following the registered protocol and sampling plan. We will provide the E-Prime files required to conduct the study. Our current plan is to complete the preparations for the replication within the next month or so, with data collection to commence in the Fall of 2016, and continue through the Spring of 2017. All data collection should be completed by March 1, 2017 (if not sooner), and all data should be made available to us by April 1, 2017 (if not sooner). You will be expected to analyze the data you collect according to the registered protocol, and also to send us your raw data for analysis and eventual deposit in a public repository.

Because we are aware that different labs face different constraints with regard to the availability of research participants, our sampling plan will be as follows. If you agree to participate, we ask that you commit to collecting data from at least 60 participants, with a maximum sample size of 120 participants. We also ask that you pre-register your chosen sample size with us (sample sizes in multiples of 4, due to the counterbalancing involved in the study) before you begin data collection. We will post the sample sizes along with our pre-registration of the replication methods.
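The sampling constraint above is simple enough to sketch in code. This is a hypothetical helper for illustration only (the function name is mine, not part of the protocol): a lab's chosen sample size must lie between 60 and 120 and be a multiple of 4 because of the counterbalancing.

```python
# Hypothetical check of the sampling plan described above: sample
# sizes must lie between 60 and 120 (inclusive) and be a multiple
# of 4 because of the counterbalancing. Illustrative only.
def valid_sample_size(n, minimum=60, maximum=120, counterbalance=4):
    return minimum <= n <= maximum and n % counterbalance == 0

print(valid_sample_size(64))   # True: within range and a multiple of 4
print(valid_sample_size(70))   # False: not a multiple of 4
print(valid_sample_size(56))   # False: below the 60-participant minimum
```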

The protocol for the study and the E-Prime files will be made available on the Open Science Framework.

All contributors to the data collection effort will be included as authors on the published report of the replication (as in previous published registered replications).

Thank you for considering our request. Please let us know as soon as you can whether you are willing to join our effort.

Michael Kaschak

Thursday, May 12, 2016

Disentangling Reputation from Replication

With increasing attention paid to reproducibility in science, a natural worry for researchers is, “What happens if my finding does not replicate?” With this question, Charles Ebersole, Jordan Axt, and Brian Nosek open their new article on perceptions of novelty and reproducibility, published today in PLoS Biology.

There are several ways to interpret this question, but Ebersole and colleagues are most concerned with reputational issues. In an ideal world, they note, reputations shouldn’t matter; the focus should be on the findings. But reality is different: findings are treated as possessions.

Ebersole and his co-authors draw a contrast between innovation and reproducibility in evaluating reputations. Drawing this contrast is not without precedent. Some years back, I served on the National Science Foundation program Perception, Action, and Cognition. We were told that innovation was to be an overriding criterion in evaluating proposals. Up to that point, as I understood it, the program’s predecessor had been perceived as an “old-boys-network” in which researchers who had been funded before pretty much had a ticket to renewed funding, whereas younger researchers were struggling to get in on the funding. In our program discussions the word “solid” in a review was a kiss of death for the proposal, it being a code word for “more of the same old boring stuff.”

In the last decade, we have seen the pendulum swing from “solid” to “innovative.”* The pendulum metaphor invites us to align “reproducible” with “boring” and “nonreproducible” with “innovative.” Ebersole and colleagues create this stark contrast in their survey. Enter AA and BB, two scientists in some unspecified field. AA produces “boring but certain” results; BB produces “exciting but uncertain” results. Ebersole and colleagues asked two large samples from the general public several questions about these scientific opposites. When presented with this stark choice, the general public clearly preferred AA over BB. Good for AA.

However, Ebersole and his co-authors are quick to point out that AA and BB are caricatures; after all, nobody embarks on a career to produce boring or uncertain results. The contrast is misleading because there are temporal dependencies at play. You first obtain an exciting finding and then you decide what to do next: replicate and extend this exciting finding or move on to the next exciting finding? And if our reputation is at stake, how should we respond when others independently attempt to replicate our findings to increase certainty?

The authors investigated these questions in a further survey featuring the researchers X and Y. The respondents read several scenarios involving X and Y after having received an introduction about the scientific publication process. The respondents first rated researcher X’s ability, ethics, and the level of truth of the finding.  The average rating of the researcher’s ability was then used as a baseline for several scenarios that introduced researcher Y as someone who replicated or failed to replicate X’s original finding. Of interest were the reputational consequences of this for X. This figure displays the results.

I have to admit that the figure is giving me bouts of OCD (am I alone in feeling compelled to pull apart the superimposed letters?), but the message is clear. Reputation depends not so much on whether your finding is true but rather on how you respond to failed replication.
If Y does not replicate the finding, then the original result is perceived as less true. X suffers some reputational damage as well, being perceived as somewhat less ethical and less capable than before. However, what matters crucially is how X responds to the failed replication. For example, there is considerably more reputational damage if X discredits Y’s replication result. I suspect this would vary as a function of whether or not X’s criticism was perceived as justified, but this was not investigated.

In contrast, there is a big reputational gain if X accepts Y’s result (see here for an actual example) and concludes that the original result might not be correct; the original effect is perceived as less true, of course. Interestingly, the finding is perceived as less true than when X criticizes the replication. The reputation gain is even bigger if X starts a replication attempt to investigate the difference between the original and replication results. Curiously, the original result is now perceived as truer than before the failed replication. The reputation gain is somewhat smaller than this if X fails to self-replicate the original finding and the original finding is perceived as less true. There is considerable reputational damage if X performs an unsuccessful self-replication and decides not to report it or doesn’t follow up on the finding at all. The former is a bit hypothetical, of course, because if X doesn’t report the failed self-replication, no one is the wiser. And if X doesn’t follow up, it is unclear whether people would pick up on the lack of a follow-up.

So much for the general public. How about students and scientists? Ebersole and colleagues presented the same scenarios to 428 students and 313 researchers (from graduate students to full professors). It turns out that scientists are more forgiving than the general public, especially when it comes to pursuing new ideas rather than following up on an initially published finding. The authors attribute this to the aforementioned drive toward innovation.

Not surprisingly, the researchers displayed a more realistic (pessimistic?) assessment of the current job market than the general population. They viewed the exciting, uncertain scientist as more likely to get a job, keep a job, and be more celebrated by wide margins.

“Despite that,” the authors note, “researchers were slightly more likely to say that they would rather be, and more than twice as likely to say that they should be, the boring, certain scientist.” Demand characteristics are likely to have played a role here. As I said earlier, who wants to be boring? The students responded more like the general public than like the scientists.

What do we make of this set of results? Clearly, it is quite artificial to present respondents with a set of idealized and decontextualized scenarios. On what basis are respondents, especially members of the general public, making judgments when presented with these scenarios? On the other hand, the convergence among the responses from the three groups (general public, students, researchers) is reassuring.

The set of scenarios that was used is not only idealized but also limited. It does not exhaust the space of possible scenarios, as the authors acknowledge. For example, there is no scenario that involves a (failed) replication that is flawed because it distorts or omits (either accidentally or intentionally) parts of the original experiment. It would be important to include such a scenario in a follow-up study and then ask questions about the ability and ethics of the replicator and truth of the replication finding as well. After all, just as original experiments can be flawed, so can replications. So it only makes sense to approach replications critically.

What I take away from the article is this.

(1) We should disentangle reputation from replication. This becomes easier if we self-replicate.

(2) We should stop seeing innovation and replication as opposites. The drive to innovate means that we are bound to pursue wrong leads in most cases. Competently performed replications are a reality check. Innovation and replication are not enemies. They are two necessary components of the best mechanism at our disposal to learn about the world: science.


*Although some might see this as the main reason for the reproducibility crisis, the only way we can tell for sure is if there are more replication attempts of “boring” research. I’m willing to bet that there are considerable reproducibility issues with that kind of research as well.

Tuesday, April 19, 2016

Credit, Workload, Accountability, & Fear: Opinions About Open Review

Update May 11, 2016. In a talk given at Psychonomics in Granada, Spain, on Saturday, May 7, I discuss the contents of this and the previous post in a symposium on open science, organized by Richard Morey. My talk starts at 43:50. The other talks are definitely worth a watch.

Last week, I reported some quantitative analyses of my open-review survey. In this post I am going to focus on the respondents’ written sentiments regarding open reviews from the perspective of a reviewer.

Many respondents provided written motivations for whether they disagreed or agreed with the statement “as a reviewer, I'd like to have my review published along with the accepted paper.” They could also indicate whether they agreed only if their review would remain anonymous. A large majority (72%) indicated that they would like to see their review published. Forty-five percent of these respondents (87 out of 195) indicated that they only wanted to have their review published if they could remain anonymous.

I then divided the respondents into two groups: tenured and untenured. The distribution of responses over the three answer categories (disagree, agree provided anonymous, agree) differed for the two groups. Perhaps most surprisingly, a much larger percentage of tenured respondents (41%) than untenured respondents (22%) were against having their reviews published. Also interesting, though perhaps less surprising, was that a much larger percentage of untenured (39%) than tenured (18%) respondents wanted to have their reviews published only if their reviews were anonymous. About equal percentages were in favor of having signed reviews published.

What are the motivations behind these numbers? To examine this, I grouped the written responses into several (often related) categories. Sometimes respondents provided multiple reasons. For this post, I decided to go with the first reason provided.* Let’s first look at the motivations provided for not wanting reviews to be published. Not every respondent provided written responses. The two bottom rows show the total number of written responses as well as the total number of responses.

[Table: counts of reasons for not wanting reviews published — "different audience", "too much work", "don't see the relevance", "I fear retribution" — with the total number of written responses and the total number of responses, for tenured and untenured respondents.]

For both groups the most prevalent response for not wanting to see reviews published was the view that reviews are intended for the authors/editor and not for a broader audience. As one respondent stated: 
“If I want to publish a commentary, I will. But reviews are constructive feedback designed to help improve research articles: Why would the review be interesting once the original paper has been revised to address whatever concerns emerged in the review?”
This respondent distinguishes between reviews and commentaries and thinks each has a different role (and audience) in scientific discourse. A related concern, especially among untenured respondents, was that getting reviews in a publishable format is a lot of work, especially if you’re not a confident writer. Here is how one respondent put it:
“Writing reviews already takes too much time. If I know that it will be published, I will care too much about making sure it's free of typo's, grammatical errors, and bad writing style. This will make the reviewing burden even larger.”
Also related is the concern that publishing reviews would be uninformative, given that some reviews point out only minor flaws and that flaws are fixed in the final manuscript anyway. Interestingly, some respondents expressed fear of retribution, apparently even if there is the option of anonymous review.

Let’s turn now to those who are in favor of open reviews but want them to be anonymous. As expected, fear of retribution figures very prominently here.

[Table: counts of reasons for wanting published reviews to remain anonymous — "I fear retribution", "I feel insecure" — with the total number of written responses and the total number of responses, for tenured and untenured respondents.]

In fact, fear of retribution is by far the most common response in both groups. But there is an interesting difference between the groups. Whereas the untenured respondents are concerned about retribution against themselves, 6 of the tenured respondents express concerns for others, junior faculty and other potentially vulnerable groups. It is also interesting to ask whom respondents fear retribution from. Most fear the reviewed authors as the source of reprisal, but a few see social media as the chief danger. Being on record as having endorsed publication of a controversial paper may make you the target of criticism:
"There could be mistakes that I am happy editors could spot and other reviewers could counterbalance, but I am not sure I would survive harsh open social media bashing for very long."
Some respondents provide more general observations about humanity to explain why they are hesitant to sign open reviews. One respondent put it very succinctly:
“People, including professors, are dicks.”
I’m sure I speak for all of us when I respond to this sentiment with a heartfelt “amen!”

Finally, there are those who are in favor of signing their reviews and having them published. What are their motivations?

[Table: counts of reasons for wanting signed reviews published — including "getting credit" — with the total number of written responses and the total number of responses, for tenured and untenured respondents.]

A plurality of untenured respondents mentioned transparency as the main reason for open, signed reviews. Related to this are the notions of accountability and fairness. The reasoning here appears to be that if the authors are known, it is only fair that the reviewers should also be known. As one respondent put it:
“I feel like I have myself received a number of (anonymous) reviews that seemed done in a rush and sometimes unfair. It was in particular the way they were written that made for unpleasant reading. I think that having reviews published will improve the way they are written, because no one would want a badly written review out there.”
Another reviewer noted:
“[It] holds me, and other reviewers, accountable (no mean-spirited bashing, or self-serving "cite ref-to-my-work-here" tactics without looking like a jerk).”
Separate from this is the notion of credit. Quite a few respondents mention they like the idea of receiving credit for their work as reviewers:
“Reviews are a lot of work, and I am proud of the work that I do in them. I'd like to get some credit for that work.”
Several respondents were of the opinion that open review enhances the quality of the reviews:
“I think that publishing your review and name provides a social incentive to do a good job with the review.”
So what can we conclude from this? I think quite a bit. But rather than drawing conclusions myself at this point, I’m interested in hearing your views.


*This obviously is just a first pass through the data. I’ll need someone to provide an independent coding and I need to analyze more than just the first response people gave.