Manipulating the alpha level cannot cure significance testing

David Trafimow, Valentin Amrhein, Corson N. Areshenkoff, Carlos J. Barrera-Causil, Eric J. Beh, Yusuf K. Bilgiç, Roser Bono, Michael T. Bradley, William M. Briggs, Héctor A. Cepeda-Freyre, Sergio E. Chaigneau, Daniel R. Ciocca, Juan C. Correa, Denis Cousineau, Michiel R. de Boer, Subhra S. Dhar, Igor Dolgov, Juana Gómez-Benito, Marian Grendar, James W. Grice, Martin E. Guerrero-Gimenez, Andrés Gutiérrez, Tania B. Huedo-Medina, Klaus Jaffe, Armina Janyan, Ali Karimnezhad, Fränzi Korner-Nievergelt, Koji Kosugi, Martin Lachmair, Rubén D. Ledesma, Roberto Limongi, Marco T. Liuzza, Rosaria Lombardo, Michael J. Marks, Gunther Meinlschmidt, Ladislas Nalborczyk, Hung T. Nguyen, Raydonal Ospina, Jose D. Perezgonzalez, Roland Pfister, Juan J. Rahona, David A. Rodríguez-Medina, Xavier Romão, Susana Ruiz-Fernández, Isabel Suarez, Marion Tegethoff, Mauricio Tejo, Rens van de Schoot, Ivan I. Vankov, Santiago Velasco-Forero

Research output: Contribution to journal, Article, Academic, peer-reviewed

Abstract

We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new discoveries and the progress of science. Given that blanket and variable alpha levels both are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does; but none of the statistical tools should be taken as the new magic method giving clear-cut mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else, is not acceptable.
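The abstract's closing point, that a binary decision depends on which threshold happens to be chosen, can be illustrated with a minimal sketch. The p-values below are hypothetical and not taken from the article, and the function name `binary_decisions` is introduced here only for illustration; the thresholds are the three named in the abstract.

```python
# Hypothetical p-values from three imagined studies (illustrative only,
# not data from the article).
P_VALUES = [0.049, 0.012, 0.003]

# The three thresholds named in the abstract.
THRESHOLDS = [0.05, 0.01, 0.005]


def binary_decisions(p_value, thresholds):
    """Return the reject/retain label a fixed p-value receives under each threshold."""
    return {
        alpha: ("reject" if p_value < alpha else "retain")
        for alpha in thresholds
    }


if __name__ == "__main__":
    for p in P_VALUES:
        # The evidence (p) is unchanged; only the cut-off moves.
        print(p, binary_decisions(p, THRESHOLDS))
```

In this sketch the same result is labelled differently depending solely on the cut-off, which is the kind of mechanical dichotomy the authors argue against; it is not a method proposed in the article.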

Original language: English
Article number: 699
Journal: Frontiers in Psychology
Volume: 9
Issue number: MAY
DOI: https://doi.org/10.3389/fpsyg.2018.00699
Publication status: Published - 15 May 2018

Keywords

  • Decision making
  • Null hypothesis testing
  • P-value
  • Significance testing
  • Statistical significance

Cite this

Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C. J., Beh, E. J., Bilgiç, Y. K., ... Velasco-Forero, S. (2018). Manipulating the alpha level cannot cure significance testing. Frontiers in Psychology, 9(MAY), [699]. https://doi.org/10.3389/fpsyg.2018.00699
@article{82339b6993ad4f26b577c459c3609611,
title = "Manipulating the alpha level cannot cure significance testing",
keywords = "Decision making, Null hypothesis testing, P-value, Significance testing, Statistical significance",
author = "David Trafimow and Valentin Amrhein and Areshenkoff, {Corson N.} and Barrera-Causil, {Carlos J.} and Beh, {Eric J.} and Bilgi{\cc}, {Yusuf K.} and Roser Bono and Bradley, {Michael T.} and Briggs, {William M.} and Cepeda-Freyre, {H{\'e}ctor A.} and Chaigneau, {Sergio E.} and Ciocca, {Daniel R.} and Correa, {Juan C.} and Denis Cousineau and {de Boer}, {Michiel R.} and Dhar, {Subhra S.} and Igor Dolgov and Juana G{\'o}mez-Benito and Marian Grendar and Grice, {James W.} and Guerrero-Gimenez, {Martin E.} and Andr{\'e}s Guti{\'e}rrez and Huedo-Medina, {Tania B.} and Klaus Jaffe and Armina Janyan and Ali Karimnezhad and Fr{\"a}nzi Korner-Nievergelt and Koji Kosugi and Martin Lachmair and Ledesma, {Rub{\'e}n D.} and Roberto Limongi and Liuzza, {Marco T.} and Rosaria Lombardo and Marks, {Michael J.} and Gunther Meinlschmidt and Ladislas Nalborczyk and Nguyen, {Hung T.} and Raydonal Ospina and Perezgonzalez, {Jose D.} and Roland Pfister and Rahona, {Juan J.} and Rodr{\'i}guez-Medina, {David A.} and Xavier Rom{\~a}o and Susana Ruiz-Fern{\'a}ndez and Isabel Suarez and Marion Tegethoff and Mauricio Tejo and {van de Schoot}, Rens and Vankov, {Ivan I.} and Santiago Velasco-Forero",
year = "2018",
month = "5",
day = "15",
doi = "10.3389/fpsyg.2018.00699",
language = "English",
volume = "9",
journal = "Frontiers in Psychology",
issn = "1664-1078",
publisher = "Frontiers Media",
number = "MAY",

}
