These are troubled times in our little frontier town called
Psytown. The priest keeps telling us that deep down we’re all p-hackers and
that we must atone for our sins.
If you go out on the streets, you face arrest by any number
of unregulated police forces and vigilantes.
If you venture out with a p-value of .065, you should count
yourself lucky if you run into Deputy Matt Analysis. He’s a kind man and will
let you off with a warning if you promise to run a few more studies, conduct a
meta-analysis, and remember never to use the phrase “approaching significance”
ever again.
It could be worse.
You could be pulled over by a Bayes Trooper. “Please step
out of the vehicle, sir.” You comply. “But I haven’t done anything wrong,
officer, my p equals .04.” He lets out a derisive snort. “You reckon that’s doin’ nothin’ wrong? Well, let me tell
you somethin’, son. Around these parts we don’t care about p. We care about Bayes factors. And yours is way below the legal
limit. Your evidence is only anecdotal, so I’m gonna have to book you.”
Or you could run into the Replication Watch. “Can we see
your self-replication?” “Sorry, I don’t have one on me but I do have a p<.01.”
“That’s nice but without a self-replication we cannot allow you on the
streets.” “But I have to go to work.” “Sorry, no can do, buddy. Just sit tight
while we try to replicate you.”
Or you could be at a party when suddenly two sinister people
in black show up and grab you by the arms. Agents from the Federal Bureau of
Pre-registration. “Sir, you need to come with us. We have no information in our
system that you’ve pre-registered with us.” “But I have p<.01 and I replicated
it,” you exclaim while they put you in a black van and drive off.
Is it any wonder that the citizens of Psytown stay in most
of the day, fretting about their evil tendency to p-hack, obsessively stepping
on the scale worried about excess significance, and standing in front of the
mirror checking their p-curves?
And then when they are finally about to fall asleep, there
is a loud noise. The village idiot has gotten his hands on the bullhorn again.
“SHAMELESS LITTLE BULLIES” he shouts into the night. “SHAMELESS LITTLE
BULLIES.”
Something needs to change in Psytown. The people need to know
what’s right and what’s wrong. Maybe they need to get together to devise a
system of rules. Or maybe a new sheriff needs to ride into town and lay down
the law.
Or maybe we need to make a stronger distinction between the scientific investigation of truth and the forensic determination of fraud, a distinction all these metaphors are doing their best to blur.
I only had the former in mind when writing this post. None of these methods are suitable for the determination of fraud, in my view.
Fantastic ;-) I am going to read this parable to my children (when they are older, maybe in their twenties ...).
Thanks! I'm thinking of turning it into a TV series. ;)
Nice post, which sums up what seems (to me as an outsider, anyway) to be one of the principal problems in social science generally, namely that there is very little agreement on what any of the statistical systems actually *mean* --- as illustrated by the number of law enforcement organisations, each enforcing their own laws, some of them mutually incompatible. As a result, everyone has their own statistical system, with their own preferred interpretations. Not only is that inherently bad science, but it also creates lots of convenient cracks in which to hide QRPs.
Andrew Gelman had a blog post about a month ago on the astonishingly basic question of whether (and if so, when) it's justifiable to use one-tailed tests. It turns out there is surprisingly little consensus. So some authors will continue to double-dip on p<.05 by doing one-tailed comparisons "because I stated a directional hypothesis", and dare the reviewers to call them out on it (a sketch of this double-dipping follows below).
To still be arguing over basic questions like this, 70 or more years after Fisher, Neyman, and Pearson, is ridiculous. Of course, stats need interpretation, but without some kind of standards (which, I suggest, can only be imposed by the journals), it's going to continue to be possible, indeed almost mandatory, for (A) and (not A) to be true --- not undetermined, but actually true --- simultaneously. How about the next edition of the APA Publication Manual taking a position on some of these questions, instead of finding more obsessive rules for how to punctuate references?
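To make the double-dipping concrete, here is a minimal sketch in Python (hypothetical data; NumPy and SciPy are assumed, neither of which is mentioned in the thread). On the same simulated samples, a post hoc "directional hypothesis" simply halves the two-tailed p-value, which can be enough to drag a marginal result under .05.
    # Hypothetical two-group comparison; NumPy and SciPy are assumed.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    control = rng.normal(loc=0.0, scale=1.0, size=40)
    treatment = rng.normal(loc=0.35, scale=1.0, size=40)  # small, noisy effect

    t_two, p_two = stats.ttest_ind(treatment, control, alternative="two-sided")
    t_one, p_one = stats.ttest_ind(treatment, control, alternative="greater")

    # For a positive t, the one-tailed p is exactly half the two-tailed p,
    # so a "directional hypothesis" declared after the fact buys a factor of two.
    print(f"two-tailed p = {p_two:.3f}")
    print(f"one-tailed p = {p_one:.3f}")
Whether the directional hypothesis was genuinely fixed in advance is, of course, exactly what a reviewer cannot see from the p-value alone.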
A counterpoint against "more rules" and "more standards": truth and objectivity are only approached by invariance. (Note: "approached." They are probably never reached, and if reached, we can never know whether we have reached them.)
What does invariance mean? Gerhard Vollmer wrote an illuminating paper about it; here's the key sentence:
"A proposition about the world is objective if and only if its meaning and its truth is invariant against a change in the conditions under which it was formulated, that is, if it is independent of its author, observer, reference system, test method, and conventions."
That means that when different researchers, using different tests and different conventions, come to the same conclusion, it has a good chance of being closer to the truth than otherwise.
So, if Bayes factors, p values, likelihoods, and posteriors all agree about the presence or the absence of an effect (and other conditions, such as validity, hold), *then* we can make a claim about the world.
If they disagree, we have to take a step back and start thinking again: why do they differ? (A sketch of such a cross-check appears below, after the excerpts.)
Some more excerpts (I really like the paper):
"But when is a description of nature objective? Evidently it is desirable to have a criterion of objectivity. How about intersubjectivity? Very often people are satisfied with such a criterion or even define objectivity as intersubjectivity. [...]
But this is not enough: When all men were convinced that the earth was a disk, this conviction was completely intersubjective, but it was wrong and it was by no means objective. [...]
Hence, intersubjectivity is not enough. It is necessary but not sufficient. [...]
There is, indeed, another common property. It is the independence of the structure in question of certain changes, its stability against pertinent alterations, its invariance under some specified transformations. Thus, we say: A proposition is objective if and only if its meaning and its truth is invariant against a change in the conditions under which it was formulated, that is, if it is independent of its author, observer, reference system, test method, and conventions."
Vollmer, G. (2010). Invariance and Objectivity. Foundations of Physics, 40(9-10), 1651–1667. doi:10.1007/s10701-010-9471-x
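To make that cross-check concrete, here is a minimal sketch (hypothetical data; NumPy and SciPy are assumed, neither of which appears in the comment): the same two samples are run through a parametric t-test, a rank-based Mann-Whitney test, and a simple permutation test, and the only question is whether the three p-values tell the same story.
    # Hypothetical data; do three different test methods point the same way?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    a = rng.normal(loc=0.0, scale=1.0, size=50)
    b = rng.normal(loc=0.5, scale=1.0, size=50)

    p_t = stats.ttest_ind(a, b).pvalue                              # parametric
    p_u = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue  # rank-based

    # Simple permutation test on the absolute difference in means.
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    n_perm = 10_000
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        exceed += abs(perm[:len(a)].mean() - perm[len(a):].mean()) >= observed
    p_perm = (exceed + 1) / (n_perm + 1)

    print(f"t-test: {p_t:.3f}  Mann-Whitney: {p_u:.3f}  permutation: {p_perm:.3f}")
    # Agreement across methods is one (fallible) sign of invariance;
    # disagreement is the cue to step back and ask why.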
Good points! I'm reminded of an experience early in my career. One of my co-authors wanted to include one-tailed tests but the editor told us "to get rid of that one-tailed nonsense." A few years later, I encountered one-tailed tests in the editor's own papers.
Ha, great allegory! Reminded me of this bit, which fits well with your portrayal of Psytown.
"O Zarathustra, here is the great city: here have you nothing to seek and
everything to lose."
Thus spoke Zarathustra, p. 140
I had Clint Eastwood in mind, but nice that it makes you think of Nietzsche. ;)
There are whispers about an ancient fella called Omniscient Jones, who is a master in the lost way of the Theory. He is wanted by the Correlational Intelligence Agency because his apprentices disappear without any trace of scientific output, only to emerge at least five years later, arguments blazing, shooting holes in the very fabric of reality.
I think it's a bit too early to set up rules. For one thing, I still haven't seen an ANOVA where the effect size (e.g., eta squared) is not only reported but also interpreted. So I'm not really sure what the effect-size troops on the ground are up to... (a worked sketch of such an interpretation appears after this comment).
BeantwoordenVerwijderenAlso I think, many researcher's hold the misguided idea that once you get tenure you will use the same methodology until you retire. Then they are surprised when a new methodology is asked of them. Methodology and statistics like other branches of science evolve and researchers should keep an eye on the new developments.
Right now the problem is that nobody seems to agree on what to do, so whatever you do, there will always be people ready to jump on you. This is making people skittish.
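Picking up the effect-size point above, here is a minimal sketch (hypothetical three-group data; NumPy and SciPy are assumed) that computes eta squared for a one-way ANOVA as SS_between / SS_total and then adds the interpretation step the comment is asking for.
    # Hypothetical three-group design; NumPy and SciPy are assumed.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (0.0, 0.2, 0.6)]

    f_val, p_val = stats.f_oneway(*groups)

    # Eta squared = SS_between / SS_total: the proportion of total variance
    # in the outcome that is attributable to group membership.
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = sum(((g - grand_mean) ** 2).sum() for g in groups)
    eta_sq = ss_between / ss_total

    print(f"F = {f_val:.2f}, p = {p_val:.3f}, eta squared = {eta_sq:.2f}")
    # Interpretation, not just reporting: an eta squared of, say, .08 means the
    # grouping accounts for roughly 8% of the variance, leaving 92% unexplained.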
The problem with statistics--old approaches and new--is our tendency to regard them as probative in themselves. At best, the new bureaus and agencies of your fable promise, with their updated decision-making criteria, to free us (as "p=.05" once did) from all the bother and trouble of exercising our scientific judgement. At worst, they deny us the opportunity to do the same. (Apologies to Robert Abelson.)
Well put.
Perhaps one of the problems in Psytown is that there is black-market demand for novel results that are "blessed" with some type of statistical significance. With the high demand (and reward) that come with making this product, some of the folks in Psytown are not interested in implementing any quality-control procedures or in changing their products. The product still sells.
True. Don't get me started on the black market...
That was a fun read, but I think a more appropriate analogy is that scientists are like people trying to build a house. It's a complicated process, and they may not always know what they are doing. Various inspectors find fault with issues that may seem superfluous to the homebuilder, but it is usually in everyone's best interest to comply with the regulations (even if you don't care about the electrical wiring being up to code, your neighbors care, and you might sell the house to someone else).
Where this analogy breaks down is that I am not sure the regulations being applied make sense, and this agrees with your final paragraph. Although they are relevant, I don't see any of the proposed methods (meta-analysis, Bayes factors, replication, or pre-registration) as really solving the fundamental issues. I am also not sure what the fundamental issues are, but I think they involve theory development from statistical data.
What this implies to me is that scientists need to be careful about their claims. I think we should stop the press releases and gushing enthusiasm about new (and old) findings until we more fully understand how to generate and interpret our data.
I completely agree with your last paragraph in particular. I also think the homebuilder metaphor is apt; I've used it in previous posts. It takes multiple metaphors to describe the target domain. In this post I was trying to convey "the angst of the experimenter on the ground," which sounds awfully pretentious, of course. ;)