Too Good to be False: Nonsignificant Results Revisited

Chris H J Hartgerink, Jelte M. Wicherts, M. A. L. M. Van Assen

Research output: Contribution to journal › Article › Academic › peer-review


Due to its probabilistic nature, Null Hypothesis Significance Testing (NHST) is subject to decision errors. The concern for false positives has overshadowed the concern for false negatives in the recent debates in psychology. This might be unwarranted, since reported statistically nonsignificant findings may just be ‘too good to be false’. We examined evidence for false negatives in nonsignificant results in three different ways. We adapted the Fisher test to detect the presence of at least one false negative in a set of statistically nonsignificant results. Simulations show that the adapted Fisher method generally is a powerful method to detect false negatives. We examined evidence for false negatives in the psychology literature in three applications of the adapted Fisher method. These applications indicate that (i) the observed effect size distribution of nonsignificant effects exceeds the expected distribution assuming a null-effect, and approximately two out of three (66.7%) psychology articles reporting nonsignificant results contain evidence for at least one false negative, (ii) nonsignificant results on gender effects contain evidence of true nonzero effects, and (iii) the statistically nonsignificant replications from the Reproducibility Project Psychology (RPP) do not warrant strong conclusions about the absence or presence of true zero effects underlying these nonsignificant results. We conclude that false negatives deserve more attention in the current debate on statistical practices in psychology. Potentially neglecting effects due to a lack of statistical power can lead to a waste of research resources and stifle the scientific discovery process.
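The abstract does not spell out how the Fisher test was adapted, so the following is only a minimal sketch under stated assumptions. It assumes that each nonsignificant p-value is rescaled to the unit interval as (p - alpha) / (1 - alpha), that the rescaled values are combined with the usual Fisher statistic (minus two times the sum of their natural logarithms), and that the statistic is referred to a chi-square distribution with 2k degrees of freedom, where k is the number of nonsignificant results. The function name `adapted_fisher` and the default `alpha=0.05` are illustrative choices, not taken from the article.

```python
import math


def adapted_fisher(p_values, alpha=0.05):
    """Sketch of a Fisher test restricted to nonsignificant p-values.

    Assumption (not stated in the abstract): nonsignificant p-values are
    rescaled as (p - alpha) / (1 - alpha) before being combined.
    Returns (chi2, degrees_of_freedom, p_fisher).
    """
    # Keep only the nonsignificant results.
    nonsig = [p for p in p_values if p > alpha]
    if not nonsig:
        raise ValueError("no nonsignificant p-values to combine")

    # Under a true null effect, nonsignificant p-values are uniform on
    # (alpha, 1]; rescaling maps them to (0, 1].
    rescaled = [(p - alpha) / (1 - alpha) for p in nonsig]

    # Fisher's combined statistic on the rescaled p-values.
    chi2 = -2.0 * sum(math.log(p) for p in rescaled)
    k = len(rescaled)

    # Chi-square survival function for even degrees of freedom (2k) has
    # the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    half = chi2 / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    p_fisher = math.exp(-half) * total

    return chi2, 2 * k, p_fisher
```

A small Fisher p-value would be read as evidence that at least one of the combined nonsignificant results is a false negative. With a single nonsignificant result, the Fisher p-value in this sketch reduces to the rescaled p-value itself.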

Original language: English
Article number: 9
Journal: Collabra: Psychology
Issue number: 1
Publication status: Published - 2017


  • NHST
  • reproducibility project
  • nonsignificant
  • power
  • underpowered
  • effect size
  • Fisher test
  • gender

