What is a p-value?

When reviewing a scientific paper or attending a research presentation at a meeting, we often see p-values reported as outcomes of a research project. Although commonly used in the life sciences, p-values are often misunderstood and, in some cases, can be misleading. Understanding what a p-value describes can help the veterinary practitioner interpret and critically evaluate research results and lower the risk of misinterpretation.

A p-value is a conditional probability: the probability of observing a difference in an outcome as great as or greater than the one actually observed, given that the null hypothesis is true. The null hypothesis is usually formulated to state that there is no difference in risk, no effect of treatment, etc. For example, in a study of average daily gain (ADG) in feedlot steers that compares a new ration to an existing ration, the p-value from a comparison of mean ADG tells us the probability of observing a difference in mean ADG between the two groups as great as or greater than the one observed, if the ration truly has no impact on ADG. It is not the probability that the null hypothesis is true.

By convention, many life science disciplines use an alpha (α) value of 0.05 as the cutoff for statistical significance: a p-value below α is declared a statistically significant difference. The “significance level”, as alpha is also called, does not have to be 0.05, however, and may be raised or lowered in different circumstances or disciplines. Alpha is the risk we are willing to accept of concluding that a difference exists when there actually is none. This mistake, rejecting the null hypothesis when the null is true, is called a Type I error. Conversely, we may fail to reject the null hypothesis when the null is not true. This is called a Type II error, and the probability of committing a Type II error is denoted by beta (β). Therefore 1-β, the probability of detecting a difference (rejecting the null hypothesis) when there really is a difference, is known as the “power” of a test.
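
To make the definition concrete, the ADG comparison can be sketched as a permutation test in Python. This is a minimal illustration, not the method of any particular study: the ADG values are invented, and the function name `permutation_p_value`, the number of shuffles, and the seed are assumptions chosen for the example. If the ration has no effect, the group labels are interchangeable, so the p-value is estimated as the share of random relabelings that produce a difference in mean ADG at least as large as the one observed.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by permutation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information,
        # so reassign animals to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1

    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return (as_extreme + 1) / (n_permutations + 1)


# Hypothetical ADG values in kg/day, invented purely for illustration.
existing_ration = [1.42, 1.51, 1.38, 1.47, 1.55, 1.40, 1.49, 1.44]
new_ration = [1.53, 1.60, 1.48, 1.58, 1.62, 1.50, 1.57, 1.55]

alpha = 0.05  # conventional cutoff; other values may be appropriate
p_value = permutation_p_value(existing_ration, new_ration)
reject_null = p_value < alpha
print(f"estimated p-value: {p_value:.4f}")
print(f"reject the null hypothesis at alpha = {alpha}: {reject_null}")
```

A t-test would be the more common choice for comparing two means; the permutation approach is used here only because it mirrors the definition of the p-value directly.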

P-values are useful but, by themselves, do not tell the whole story. Often the magnitude of the difference observed in the outcome is important regardless of the p-value. For example, a study may find that treatment 1 produces a 0.01% decrease in morbidity compared to treatment 2 with a p-value of <0.0001. Although this would be considered a “statistically significant difference”, the difference may not be biologically relevant or may be too small to justify a change in management or treatment. Measures of association (e.g., risk ratio and odds ratio) are also important inferential statistics because they inform us about the strength and direction of the relationship between a potential risk factor (e.g., body condition score less than 5) and the outcome of interest (e.g., bovine pregnancy). A large p-value indicates that the observed difference is compatible with chance alone, but if the measure of association is strong, further exploration of the relationship may still be warranted. Other useful statistical outputs include confidence intervals and parameter estimates; these describe the results of the study and provide information not contained within the p-value.
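
The gap between statistical and practical significance can be sketched in Python by reporting a p-value alongside the risk difference, the risk ratio, and a confidence interval. The counts below are invented for illustration and are not taken from any study; the function name `compare_risks` is hypothetical, and the choices of a pooled two-proportion z-test and a Wald interval on the log risk ratio are assumptions made for the example rather than the only valid methods.

```python
import math


def compare_risks(cases_1, n_1, cases_2, n_2, z_crit=1.96):
    """Compare morbidity risk in two groups (illustrative sketch)."""
    risk_1 = cases_1 / n_1
    risk_2 = cases_2 / n_2
    risk_difference = risk_1 - risk_2
    risk_ratio = risk_1 / risk_2

    # Approximate 95% confidence interval for the risk ratio (log scale).
    se_log_rr = math.sqrt(1 / cases_1 - 1 / n_1 + 1 / cases_2 - 1 / n_2)
    ci_low = math.exp(math.log(risk_ratio) - z_crit * se_log_rr)
    ci_high = math.exp(math.log(risk_ratio) + z_crit * se_log_rr)

    # Two-sided z-test for a difference in proportions (pooled variance).
    pooled = (cases_1 + cases_2) / (n_1 + n_2)
    se_diff = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = risk_difference / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

    return {
        "risk_difference": risk_difference,
        "risk_ratio": risk_ratio,
        "ci_low": ci_low,
        "ci_high": ci_high,
        "p_value": p_value,
    }


# Hypothetical, deliberately enormous trial: 9.9% vs 10.1% morbidity.
result = compare_risks(cases_1=99_000, n_1=1_000_000,
                       cases_2=101_000, n_2=1_000_000)

print(f"risk difference: {result['risk_difference']:.4f}")
print(f"risk ratio: {result['risk_ratio']:.4f}")
print(f"95% CI for risk ratio: {result['ci_low']:.4f} to {result['ci_high']:.4f}")
print(f"p-value: {result['p_value']:.2e}")
```

With groups this large, a difference of a fraction of a percentage point yields a very small p-value, while the risk ratio stays close to 1. Whether such a difference justifies a change in management is a biological and economic judgment that the p-value cannot make.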