Thursday, January 9, 2014

Donald Trump’s Hair and Implausible Patterns of Results

In the past few years, a set of new terms has become common parlance in post-publication discourse in psychology and other social sciences: sloppy science, questionable research practices, researcher degrees of freedom, fishing expeditions, and data that are too-good-to-be-true. An excellent new paper by Andrew Gelman and Eric Loken takes a critical look at this development. The authors point out that they regret having used the term fishing expedition in a previous article that contained critical analyses of published work.

The problem with such terminology, they assert, is that it implies conscious actions on the part of the researchers, even though, as they are careful to point out, the people who have coined, or are using, those terms (this includes me) may not think in terms of conscious agency. The main point Gelman and Loken make in the article is that there are various ways in which researchers can unconsciously inflate effects. I will write more about this in a later post. I want to focus on the nomenclature issue here. Gelman and Loken are right that despite the post-publication reviewers’ best intentions, the terms they use do evoke conscious agency.

We need to distinguish between post-publication review and ethics investigations in this regard, as these activities have different goals. Scientific integrity committees are charged with investigating the potential wrongdoings of scientists; they need to reverse-engineer behavior from the information at their disposal (published data, raw data, interviews with the researcher, their collaborators, and so on). Post-publication review is not about research practices. It is about published results and the conclusions that can or cannot be drawn from them.

If we accept this division of labor, then we need to agree with Gelman and Loken that the current nomenclature is not well suited for post-publication review. Actions cannot be unambiguously reverse-engineered from the published data. Let me give a linguistic example to illustrate. Take the sentence "Visiting relatives can be frustrating." Without further context, it is impossible to know which process has given rise to this utterance. The sentence is a case of standing ambiguity, and any Chomskyan linguist will tell you that it has one surface structure (the actual sentence) and two deep structures (meanings). The sentence can mean that it is frustrating to visit relatives or that it is frustrating when they are visiting you. There is no way to tell which deep structure has given rise to this surface structure.

It is the same with published data. Are the results the outcome of a stroke of luck, optional stopping, selective removal of data, selective reporting, an honest error, or outright fraud? This is often difficult to tell and probably not something that ought to be discussed in post-publication discourse anyway.
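For readers who want a concrete sense of why a single published result underdetermines the process behind it, here is a rough simulation (my own illustrative sketch; the scenario and numbers are invented). It shows that optional stopping, one of the processes listed above, yields "significant" results far more often than the nominal 5%, so a published p < .05 is compatible with an honest single test and with a sincerely applied but biased stopping rule alike:

```python
import math
import random
import statistics

def one_sample_p(xs):
    """Two-sided one-sample test of mean = 0, using a normal
    approximation instead of the t distribution to keep the
    sketch dependency-free (slightly liberal at small n)."""
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    z = statistics.mean(xs) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def optional_stopping(max_n=100, start_n=10, step=10, alpha=0.05):
    """Collect data under the null (true effect = 0), test after
    every `step` observations, and stop as soon as p < alpha."""
    xs = [random.gauss(0, 1) for _ in range(start_n)]
    while len(xs) <= max_n:
        if one_sample_p(xs) < alpha:
            return True  # a "significant" result gets written up
        xs += [random.gauss(0, 1) for _ in range(step)]
    return False  # gave up; this study lands in the file drawer

random.seed(1)
trials = 2000
rate = sum(optional_stopping() for _ in range(trials)) / trials
print(f"False-positive rate with optional stopping: {rate:.2f}")
# Substantially above the nominal 0.05, even though each
# individual test was computed honestly.
```

The point is not that any particular author did this, only that the final p-value carries no trace of which path produced it.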

So the problem is that the current nomenclature generally brings to mind agency. Take sloppy science. It implies that the researcher has failed to exert an appropriate amount of care and attention; science itself cannot be sloppy. As Gelman and Loken point out, p-hacking is not necessarily intended to mean that someone deliberately bent the rules (and, in fact, their article is about how researchers unwittingly inflate the effects they report; more about this interesting idea in a later post). However, the verb implies actions on the part of the researcher; it is not a description of the results of a study. The same is true, of course, of fishing expedition. It is the researchers who are going on a fishing expedition; it is not the data that have cast their lines. Questionable research practices is obviously a statement about the researcher, as is researcher degrees of freedom.

But how about too-good-to-be-true? Clearly this qualifies as a statement about the data and not about the researcher. Uri Simonsohn used it to describe the data of Dirk Smeesters, and the Scientific Integrity Committee I chaired adopted this characterization as well. Still, it has a distinctly negative connotation. Frankly, the first thing I think of when I hear too-good-to-be-true is Donald Trump’s hair. And let’s face it: no researcher on this planet wants to be associated, however remotely, with Donald Trump’s hair.

What we need for post-publication review is a term that does not imply agency or refer to the researcher (we cannot reverse-engineer behavior from the published data) and that does not have a negative connotation. A candidate is implausible pattern of results (IPR). Granted, researchers will not be overjoyed when someone calls their results implausible, but the term does not imply any wrongdoing on their part and yet does express a concern about the data.

But who am I to propose a new nomenclature? If readers of this blog have better suggestions, I’d love to hear them.


  1. I don't think 'implausible' is a very good term, because it's about believability, and we are doing science, not religion. We are talking about statistics. Statistics is about only one thing: probabilities. So that is the only way anyone should talk about data. Data allow us to draw inferences about probabilities. Some data patterns are improbable. Others are very likely, but only when the null hypothesis is true, and not when the alternative hypothesis is true. Some data patterns are extremely improbable to be observed by chance alone. As long as we talk about it in terms of probability, we are doing our jobs and there is no need to worry about this topic.

    1. I don't necessarily associate plausibility with religion, more with credibility in light of relevant experiences, but I see your point.

  2. Ultimately, this is going to require someone to specify which terms apply to which implied conduct (incompetence or fraud) --- kind of like "tax avoidance" vs "tax evasion". The line between accidental and deliberate deception is just that, a line: It has zero width. Stapel and Smeesters are clearly on one side of that line; an undergraduate project that generates post hoc hypotheses and uses the same data to confirm them is very likely on the other. But it gets greyer and greyer as you turn the various dials. For example, as the researchers get more senior, you get closer to "either s/he knew this was wrong, or s/he should have known".

    However, once you have decided that terms A and B refer, respectively, to "probably accidental" and "probably deliberate" misconduct, you're back to square one. If we're never prepared to come out in public and say "I reckon this is fraudulent" - which, frankly, we're not - then it all comes out a wash in the end regardless of which words we choose. Cf. how every word in English for a place to evacuate human waste is a euphemism for a washing place.

    I'm currently working on an article where I can't demonstrate any fraudulent intent --- indeed, the data is undoubtedly authentic, it's just the conclusions that are absurd --- but, if there is no intent to deceive, the level of statistical incompetence (from two full professors) is quite astonishing. I suspect that outright fraud is rare, but that we're dealing with "bullshit" --- in Frankfurt's terms, that is, the person uttering it genuinely doesn't care if it's true or not, as opposed to lying where they know it's false --- more often than we may care to find out (or admit).

    For what it's worth, if I heard the phrase "implausible pattern of results", it would instinctively make me think that actual fraud was being alleged. But maybe that's some form of reverse expectation whereby I'm treating an attempt at cautious language as euphemistic understatement. Get some opinions from non-Brits too. :-)

    1. Nick:

      There's also something else going on, and I don't know the right phrase for this one either: a researcher makes a statistical error (for example, reporting and interpreting a p-value at face value, even when the chosen comparison was highly contingent on the data) and then refuses to admit the mistake even when it is pointed out.

    2. The German language has a great word for this: "beratungsresistent" (something like 'immune to advice').

    3. Leave it to the Germans to make advice sound like a virus.

  2. What about unusual pattern of results (UPR)? This is more neutral because nearly all papers will contain some unusual patterns of results. Perhaps too neutral. That said, the inference of fraud or sloppiness requires multiple unusual patterns within and between papers (cf. Simonsohn, 2013).

    1. Unusual presupposes that we know what is usual. For example, the large effect sizes in studies with small samples and weak manipulations are anything but unusual in areas such as social priming.

  4. You are forgetting at least one (set of) agent(s) here, I think. You already identified the researcher and the data, but you did not mention the reviewer(s) and editor. It is precisely that perspective that I find appealing in the HIBAR label for post-publication reviewing. It could well be that your post-publication review deals with things that were actually discussed during the review process but that it is actually due to the (anonymous) reviewers or the editor that they are not included in the final manuscript.

    1. Very true. This is why I included editors and reviewers in my HIBAR posts.

  5. Very interesting paper by Gelman and Loken, although (unusually) I find myself disagreeing with a number of their points.

    1. I think they are viewing psychology through rose-tinted glasses. The survey of John et al. estimated high prevalence rates for a range of QRPs (often near 100%), so much of this behaviour is conscious and bog standard.

    That's not to say that there isn't also a lot of unconscious bias going on, but to argue that "We have no reason to think that researchers regularly [fish]" belies the evidence - we have every reason to think this. It would be truer to say that it isn't socially acceptable to consciously admit that QRPs are standard operating procedure in psychology, and that groupthink consolidates the delusion that such practices are justified (and necessary to compete for glamour pubs and grants), provided nobody freely admits to engaging in them. The first rule of fight club is...

    2. One part of what they are describing in their paper is essentially a form of HARKing (hypothesizing after results are known; Kerr, 1998). Many of the cases they raise seem to involve researchers having a general idea of the kind of analysis they want to do, with several plausible hypotheses in mind and several legal analytic options. Based upon either inferential analysis or visual inspection, the researchers find what looks like the strongest effect in the data (perhaps only running one inferential analysis but clearly viewing the data descriptively) and then assign it the best-fitting hypothesis. Whether this is conscious or not, this is a questionable research practice if the researchers are going to use p values. What I find confusing is that later in the paper, Gelman and Loken explicitly advocate HARKing, which (for me at least) undermines their criticisms of the examples they cite. I find myself unsure about how their own research practices differ from those of the studies under scrutiny.

    3. I found their dismissal of pre-registration unconvincing. Like many critics, they seem to be viewing pre-registration as a case where *only* pre-registered analyses are reported. This is never the case and it is a straw man. Using a pre-registered methodology, it is entirely reasonable to report exploratory analyses in addition to the confirmatory analyses. Pre-registration simply makes that distinction clear. As a community we need to face the fact that p values mean little when applied in an exploratory context because we can never know what researcher dfs were exploited.

    4. Their positive message is again rose-tinted in my view. They say: "Our positive message is related to our strong feeling that scientists are interested in getting closer to the truth." This is part of why people do science, but I think this is not the primary motivation and never will be in a competitive system where the needs of science compete with the needs of individual scientists. The primary goal, I am realising, is for scientists to keep their jobs, meet their career requirements, advance their careers, support their staff, define an identity as a productive worker, and be seen as an authority by others. For many, only when those conditions are met is the quest for truth given a place at the table. The sooner we admit this and dispel the illusion of scientists as objective truth-seekers, the sooner we can re-design incentives to align the needs of the scientists with the goal of revealing truth.

    1. Chris:

      Thanks for the comments--I wish you'd've sent these to us directly, but I guess it's ok because I did encounter them here, after all. (Rolf told me he would be posting something about our article.) In brief response:

      1. I am not personally involved in experimental psychology so it's hard for me to know how much direct fishing the various researchers are doing. My guess is that when there is a published p-value, a researcher often does a few analyses directly but avoids many others by looking at the data (I think this is what you are labeling as "harking"). What happened was that I had recently had encounters with researchers who'd insisted that they had done no direct multiple comparisons at all. The point I wanted to make was that it would be possible, indeed easy, to get huge problems of multiple comparisons without needing to directly perform these different comparisons. Perhaps we can revise our statement "We have no reason to think that researchers regularly [fish]" to become "Our claims are valid even if researchers never [fish]" or something like that.

      2. Could you please clarify where is the place later in the paper where we explicitly advocate harking? I'm not quite sure what you're referring to here, and so it's hard for me to understand what you are saying.

      3. We did not mean to dismiss pre-registration! I'm a big fan of pre-registered replication, and I'm a big fan of the "50 shades of gray" paper by Nosek et al. where they demonstrate the benefits of this approach. So if we said something that appeared to dismiss pre-registration, please let us know so we can clarify.

      4. I do believe scientists are interested in getting closer to the ultimate truth, even when they cheat! For example, I can only assume that disgraced primatologist Mark Hauser believed (and, probably, still believes) that monkeys can do whatever it is that he claimed they could do--even if those damn videotapes didn't show it. My guess is that he has a somewhat inflexible view of reality, and when he sees some data that contradict his hypothesis, he'll reinterpret and even cheat but as part of a larger truth-seeking goal. I think that, to scientists like him, rules of evidence (statistical and otherwise) are mere technicalities, as if he's filling out some silly form and it's no big deal if he fudges on some of the answers to get to the end a little quicker. Similarly for the researchers whose work is subject to multiple comparisons. I'm guessing they truly believe that the time in the menstrual cycle has huge effects on political attitudes, etc., they think they've made big big discoveries, and if maybe they bent the rules here and there (for example, using flexible data-inclusion rules that allow their results to look a bit stronger), well, those decisions are just in service to the big picture of science. Don't get me wrong, _I_ don't feel that way--but I'm a statistician and have been trained to focus on variability and uncertainty. My impression is that lots of people, scientists included, don't like to think about variability and uncertainty; rather, they think of science as comprising eternal truths that are easily discoverable via small-N surveys on Mechanical Turk.
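      Andrew's point about multiple-comparison problems arising without anyone directly performing the comparisons can be made concrete with a toy simulation (an illustrative sketch of my own; all numbers are invented). Every candidate outcome measure is pure noise; the analyst looks at the data, picks the measure with the largest apparent effect, and formally tests only that one:

```python
import math
import random
import statistics

def p_value(xs):
    """Two-sided one-sample test of mean = 0 (normal approximation)."""
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    z = statistics.mean(xs) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def forked_study(n=30, k=5, alpha=0.05):
    """Simulate k candidate outcome measures, all pure noise (null true).
    The analyst eyeballs the data, keeps the measure with the largest
    apparent effect, and runs exactly ONE formal test on it."""
    measures = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
    best = max(measures, key=lambda xs: abs(statistics.mean(xs)))
    return p_value(best) < alpha

random.seed(2)
trials = 2000
rate = sum(forked_study() for _ in range(trials)) / trials
print(f"False-positive rate after picking the best of 5 measures: {rate:.2f}")
# Far above 0.05, although only one p-value was ever computed.
```

      Only one comparison is "run", yet the selection step does the work of many; the forking paths need no conscious fishing.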

    2. Chris, your comments are about the main point of the Gelman/Loken article, which I was planning to discuss in my next post. No doubt the post will be influenced by the interesting and astute comments you and Andrew are making here.

    3. Hi Andrew,

      1. I agree with the thrust of your argument that researchers don't need to consciously p-hack in order to exploit researcher dfs and elevate the false discovery rate. It just seemed to me that your supposition that most researchers don't do this isn't supported by the evidence (where evidence is available). I do like your rephrasing, because whether researchers consciously exploit researcher dfs seems irrelevant to your central point: such paths can be followed in seemingly reasonable steps when there is sufficient ambiguity among legal options and a plausible range of hypotheses.

      (btw there is another interesting point here: even data-dredging can happen "unconsciously" due to confirmation bias and other cognitive failings that encourage us to believe that whatever outcome "worked best" for getting published in Journal of Flashy Results is in fact the most scientifically accurate).

      2. There were two sections where you seemed to advocate HARKing (and to be clear, by HARKing I mean "hypothesizing after results are known" in order to "predict" the outcome that was obtained. HARKing is a form of hindsight bias, as introduced by Kerr 1998, linked to in my comment above. It also seems to be very similar to the kind of approach taken in the example cases you raise, where analytic choices made after data inspection can be made to fit a range of prior expectations).

      The first instance where you seemed to advocate HARKing is in section 4.3, paragraph 2:

      "For most of our own research projects this strategy [pre-registration] hardly seems possible: in our many applied research projects, we have learned so much by looking at the data. **Our most important hypotheses could never have been formulated ahead of time.**"

      Now, naturally it is perfectly fine and consistent with the hypothetico-deductive (H-D) model of the scientific method to use exploratory analyses to generate a hypothesis for the next study, but unless I've misunderstood what you've written here, you seem to be advocating a position in which it is ok to decide/change your *a priori* hypotheses after inspecting data (i.e. HARKing). This reminds me of previous arguments in favour of HARKing, e.g. Bem (1987) wrote once that if data are “strong enough” then a researcher is justified in “subordinating or even ignoring [their] original hypotheses”. But doing so of course violates the H-D model.

      The second point which I interpreted as an argument for HARKing was in section 4.4, paragraph 2: "In our experience, it is good scientific practice to refine one's research hypotheses in light of the data." Again, did you mean here refining it for the next study, which would be H-D consistent, or for the *current* study (i.e. HARKing)?


    4. (cont)

      3. I got the strong impression that you were saying that pre-registration is useful but not in your field because, if I understand correctly, you rely on the kind of exploratory analyses that you believe pre-registration hinders. You go on to say that you don't want the "demands of statistical purity to strait-jacket our science."

      In response, I would argue that pre-registration has no bearing whatsoever on exploration in science and it hinders nothing. There is an argument to be made that whenever a researcher (a) has an a priori hypothesis, and (b) wishes to employ NHST, then they should pre-register those hypotheses and the analysis plan so that readers know how to interpret the resulting p values. This, of course, leaves the option open for additional exploratory analyses and future hypothesis generation, but it allows readers to trust the outcome of NHST (insofar as that is possible) and to distinguish exploration from confirmation.

      4. I think your argument here is very interesting; I admit I hadn't considered the possibility that some scientists could believe that cheating etc. is an exercise in approaching truth. That's a very interesting psychological question, actually.

      Hope this clarifies. I enjoyed your paper - very thought provoking.

    5. >I admit I hadn't considered the possibility
      >that some scientists could believe that cheating etc.
      >is an exercise in approaching truth. That's a very
      >interesting psychological question, actually.
      In Diederik Stapel's autobiography, he claims* his first data fraud was as a professor at Groningen, changing a single number in a dataset, to make the results fit the theory better. How far removed is that, objectively, from trimming a couple of outliers? Yet one is "obviously" fraud while the other happens "legitimately" all the time, in order to get "neat" (in all senses of that word) results.

      * Although there are suggestions that there may have been fraud as far back as the articles that went into his PhD thesis.

    6. Chris:

      Thanks for clarifying. In brief response:

      1 & 4: We now seem to be in agreement.

      2: I'm making two points, one descriptive and one normative. The descriptive point came when I wrote: "For most of our own research projects this strategy [pre-registration] hardly seems possible: in our many applied research projects, we have learned so much by looking at the data. Our most important hypotheses could never have been formulated ahead of time." This is an accurate description of much of my research--if you want you can take this as evidence of widespread harking, but in any case I think it's only fair of me to make the point in my article.

      But, as you point out, I also make a normative statement, when I wrote: "In our experience, it is good scientific practice to refine one's research hypotheses in light of the data." In light of your comments, let me revise this statement by changing it to "it _can be_ good scientific practice…".

      I can tell you about examples of my own research where harking worked very well. These are three of my most successful political science papers:

      [1993] Why are American Presidential election campaign polls so variable when votes are so predictable? British Journal of Political Science 23, 409-451. (Andrew Gelman and Gary King)

      [1994] Enhancing democracy through legislative redistricting. American Political Science Review 88, 541-559. (Andrew Gelman and Gary King)

      [2007] Rich state, poor state, red state, blue state: What's the matter with Connecticut? Quarterly Journal of Political Science 2, 345-367. (Andrew Gelman, Boris Shor, Joseph Bafumi, and David Park)

      In each of these highly successful papers (and I mean "successful" in being influential and in uncovering real patterns), the analysis was done only after seeing the data. Indeed, these projects would have been far less important had the analysis been anticipated ahead of time. One aspect of the most exciting research is that it goes beyond what was anticipated. So, while I can respect that you don't like this sort of research, and I certainly agree that harking can lead to problems, in other cases harking is useful, indeed essential to making scientific progress.

      3. Preregistered replication makes a lot of sense in experimental psychology where it's typically easy to just go and gather more data (as Nosek et al. did in their wonderful "50 shades of gray" paper). It can be more difficult in observational fields such as (most of) political science and economics.

    7. "The sooner we admit this and dispel the illusion of scientists as objective truth-seekers, the sooner we can re-design incentives to align the needs of the scientists with the goal of revealing truth."

      Chris -- I'm not sure whether to take your point #4 as cynical (scientists' motives are nefarious), realistic (scientists' motives are practical), naïve (comfortably successful scientists are more interested in truth than less successful scientists, who in fact might be less successful because they're honest), or optimistic (incentive structures can change to favor the pure pursuit of truth).

      Overall, I'm feeling very pessimistic about the current state of the field and its prospects for the future... Other than that, you made some good points!!

    8. Hi Andrew,
      So to make sure we are on the same page, when you say "in other cases harking is useful, indeed essential to making scientific progress", are you saying that it can be acceptable (even essential) for researchers to change or invent their hypothesis after inspecting the data and then report that hypothesis as though it was a priori? That is what I mean by HARKing (as defined by Kerr 1998).

      If this isn't what you mean, then I think we are talking about different behaviours.

      But if this is what you mean, don't you think HARKing misleads by conflating hypothesis generation with hypothesis testing? Also, don't you think that it distorts the scientific record by producing hypotheses that, by definition, cannot be falsified by the data from which the hypothesis is generated?

    9. Hi Neurocritic - probably all of the above! I am sure I am guilty of a certain level of naivety (and comfortable with it). I'm not so pessimistic though. Lots of positive changes are being made or are in the wind, and we're not just standing around waiting for reform; many of us are at the coal face. There are lots of positive steps we can all make to improve practices, but I won't start preaching. There are many career obstacles to reform, particularly for early-career researchers.

    10. Chris:

      You write: "are you saying that it can be acceptable (even essential) for researchers to change or invent their hypothesis after inspecting the data and then report that hypothesis as though it was a priori?"

      No. I'm saying that it can be acceptable (even essential) for researchers to change or invent their hypothesis after inspecting the data. But, no, that hypothesis should not be reported as though it was a priori. It should be clearly stated that the hypothesis came from the data. That's what I was talking about when I wrote, "it is good scientific practice to refine one's research hypotheses in light of the data."

    11. Hi Andrew - thanks for clarifying. I'm glad because now I think we are in full agreement. As long as that happens transparently, what you're describing isn't HARKing as defined by Kerr -- the key defining feature of HARKing is that the post-hoc hypothesis is presented as a priori. This is what I thought you were advocating in your article.

      Rather than HARKing, the process you describe seems to be one of hypothesis-generation through exploration, which can then lead to a priori hypothesis-testing in future studies (it is also consistent with the H-D scientific method).

      Thanks for the discussion (and thanks Rolf for allowing me to..err... hijack this thread). In answer to the central question of Rolf's post, I think we should continue with the label "questionable research practices" because whether or not the practices are conscious, they are still questionable because the outcome is harmful to science. And what is particularly questionable is that researchers allow themselves to be thwarted by unconscious bias when there are known solutions to the problem (such as pre-registration) that can prevent exploitation of researcher dfs, regardless of our intentions.

    12. Chris:

      OK, so in that language, I'd say that lots of researchers are harking without realizing it. They think they've prespecified their research hypotheses, but actually their hypotheses are quite vague, and the specific hypotheses in their published papers are harked.

  6. I'd say "results that are open to doubt".

    1. That sounds quite neutral but is quite subjective at the same time. It places the qualification in the eye of the beholder rather than in terms of probability or plausibility.

  7. It is of course very noble and politically correct to want to avoid explicitly or implicitly accusing scientists of things like organizing fishing expeditions. The problem, as others have pointed out, including in this blog post, is that we also know that fishy procedures are used a lot in actual research practice.

    Two points: a) not knowing that one should not use p-values when one has generated the hypothesis from the data is of course never a valid excuse for doing so. Ignorance does not protect from the law, so to speak. b) I think editors and reviewers especially should keep in mind that questionable practices in data analysis are widespread, and therefore they should explicitly address these issues during the review process. For instance, if the hypothesis looks like it may have come from the data, one could ask the authors how they conceived of that particular hypothesis (i.e., where it comes from). If that turns out not to be from a certain known theory, the editor could ask the author to state explicitly the (presumed) fact that this hypothesis was, in fact, *not* generated from the data, but based on spontaneous intuition or something like that. Making these issues explicit fosters awareness in all involved that they are important, and that it is the author's responsibility to address them openly. So there is no accusation here, only an urge to be maximally transparent.

    Finally: the problem reminds me of the problem of sick leave in large organizations. If 90% of all employees of a department call in sick for the maximum time one can do so legally without negative consequences for their salaries (stuff like that actually happens), one strongly suspects that some of those employees are cheating. The problem is that one does not know who, and one does not want to falsely accuse the individual employees who are sick. The only option then probably is to send a doctor to their house when they are often sick.

    1. Your sick leave analogy strikes me as very apt. I can think of some people that require a visit by the doctor.

  8. I agree with Rolf that post publication review critiques should focus on the properties of the data and only tangentially (if at all) on the behaviour of the researchers. In my own critiques I have always tried to emphasize how easy it is to introduce bias into a set of experiments. I always saw this as a way to mitigate possible negative criticism of the original authors, but some readers ignore my clarifying statements and interpret my critiques as "attacks" on the authors.

    In some sense, I understand such an interpretation from readers. If a set of experimental results appears "too good to be true" (or whatever term you prefer), there are no nice ways to point out this flaw. When such experimental results are published, there are two broad interpretations: malfeasance or ignorance. The former suggests fraud by the authors. The latter suggests misunderstandings about scientific practice for the authors, reviewers, and the editor. An interpretation of ignorance is arguably worse for the field than an interpretation of malfeasance, because ignorance implies problems with scientific training and misunderstanding among large groups of people. If the problems were just due to fraud, we could punish the evildoers and return to standard practice.

    Regarding a term, I would be happy with "implausible pattern of results" (IPR), but I disagree with Rolf that we need a term without a negative connotation. Even constructive criticism has a negative connotation (at least for the person receiving the criticism), and I think we cannot escape that aspect of post-publication review. "Too good to be true" focuses on the data, so I think it is appropriate. Schimmack (2012) suggested the term "incredible", which sounds positive at first but in this context really means "not credible".

    1. Greg, you're right that we cannot avoid terms with negative connotations. (I fear my Donald Trump joke got the better of me.) We both agree that the statement should be about the results and not the researcher.

  9. Mr. Zwaan, it seems to me that attempts to whitewash bad practices or tiptoe around them in order to preserve the feeling of integrity among the academic community (instead of earning it) are precisely what needs to be left behind. Basic politeness should suffice; no need to whine about the specific words used. Psychologists should learn to give and accept criticism without taking it personally; in other words, they should grow up and learn from their mistakes, be they intentional or not.

    I'm not saying it's easy, given that in academia the need for status and ego gratification, as well as peer pressure, pushes toward half-baked reasoning, conceptual obfuscation and butt-hurt, quantitative methods and peer review notwithstanding. As always, the good example is overshadowed by mediocrity unless its merit is recognized, and there's no system that can assure this; it is just a matter of individual human perceptivity and integrity (a matter for psychologists to study!)

    Also, it would help that psychologists were less try-hard in promoting their "science".
