Understanding Type I and Type II Errors
I recently got an inquiry that asked me to clarify the difference between type I and type II errors when doing statistical testing. Let me use this blog to clarify the difference as well as discuss the potential cost ramifications of type I and type II errors. I have also provided some examples at the end of the blog[1].
In statistical test theory, the notion of statistical error is an integral part of hypothesis testing. A statistical test requires an unambiguous statement of a null hypothesis (H_{0}), for example, “this person is healthy”, “this accused person is not guilty” or “this product is not broken”. The result of the test may be negative, meaning the null hypothesis is not rejected (healthy, not guilty, not broken), or positive, meaning the null hypothesis is rejected (not healthy, guilty, broken).
If the result of the test corresponds with reality, then a correct decision has been made (e.g., person is healthy and is tested as healthy, or the person is not healthy and is tested as not healthy). However, if the result of the test does not correspond with reality, then two types of error are distinguished: type I error and type II error.
Type I Error (False Positive Error)
A type I error occurs when the null hypothesis is true, but is rejected. Let me say this again, a type I error occurs when the null hypothesis is actually true, but was rejected as false by the testing.
A type I error, or false positive, is asserting that something is present when it actually is not. This false positive error is basically a “false alarm” – a result that indicates a given condition has been fulfilled when it actually has not been fulfilled (i.e., a positive result has been erroneously assumed).
Let’s use a shepherd and wolf example. Let’s say that our null hypothesis is that there is “no wolf present.” A type I error (or false positive) would be “crying wolf” when there is no wolf present. That is, the actual condition was that there was no wolf present; however, the shepherd wrongly indicated there was a wolf present by calling “Wolf! Wolf!” This is a type I error or false positive error.
Type II Error (False Negative)
A type II error occurs when the null hypothesis is false, but erroneously fails to be rejected. Let me say this again, a type II error occurs when the null hypothesis is actually false, but the testing failed to reject it.
A type II error, or false negative, is where a test result indicates that a condition is absent when it is actually present. A type II error is committed when we fail to detect a condition that is really there.
Let’s continue our shepherd and wolf example. Again, our null hypothesis is that there is “no wolf present.” A type II error (or false negative) would be doing nothing (not “crying wolf”) when there is actually a wolf present. That is, the actual situation was that there was a wolf present; however, the shepherd wrongly indicated there was no wolf present and continued to play Candy Crush on his iPhone. This is a type II error or false negative error.
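To see both kinds of error show up in practice, here is a minimal simulation sketch in Python. It is not part of the original explanation: the choice of test (a two-sided z-test with a known standard deviation of 1), the sample size of 30, the "true" mean of 0.5 used when the null is false, and the function names rejects_null and rejection_rate are all assumptions picked purely for illustration.

```python
import random
from statistics import NormalDist, mean

def rejects_null(sample, sigma=1.0, alpha=0.05):
    """Two-sided z-test of H0: the population mean is 0 (sigma assumed known)."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return abs(z) > critical

def rejection_rate(true_mean, n=30, trials=5000, seed=0):
    """Fraction of simulated samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials

# H0 is true (the mean really is 0): every rejection is a type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (the mean is really 0.5): every failure to reject is a type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"type I error rate:  {type_1_rate:.3f}")
print(f"type II error rate: {type_2_rate:.3f}")
```

In a run like this the type I rate should land close to the chosen alpha, while the type II rate depends on how far the truth is from the null and on the sample size.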
A tabular relationship between truthfulness/falseness of the null hypothesis and outcomes of the test can be seen in the table below:
 | Null hypothesis is true | Null hypothesis is false |
Reject null hypothesis | Type I Error (False Positive) | Correct Outcome (True Positive) |
Fail to reject null hypothesis | Correct Outcome (True Negative) | Type II Error (False Negative) |
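The same table can be written as a tiny lookup function. This is just a sketch; the function name classify_outcome and its two boolean parameters are made up for this illustration.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Map (reality, test decision) to the matching cell of the table."""
    if null_rejected:
        # Top row of the table: the test rejected H0.
        if null_is_true:
            return "Type I error (false positive)"
        return "Correct outcome (true positive)"
    # Bottom row of the table: the test failed to reject H0.
    if null_is_true:
        return "Correct outcome (true negative)"
    return "Type II error (false negative)"

# Shepherd example, with H0 = "no wolf present".
# No wolf, but the shepherd cries wolf:
print(classify_outcome(null_is_true=True, null_rejected=True))
# A wolf is there, but the shepherd does nothing:
print(classify_outcome(null_is_true=False, null_rejected=False))
```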
Examples
Let’s walk through a few examples and use a simple form to help us to understand the potential cost ramifications of type I and type II errors. Let’s start with our shepherd/wolf example.
Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
Wolf is not present | Shepherd thinks wolf is present (shepherd cries wolf) when no wolf is actually present | Shepherd thinks wolf is NOT present (shepherd does nothing) when a wolf is actually present |
Cost Assessment | Costs (actual costs plus shepherd credibility) associated with scrambling the townsfolk to kill the non-existing wolf | Replacement cost for the sheep eaten by the wolf, and replacement cost for hiring a new shepherd |
Note: I added a row called “Cost Assessment.” Since it cannot be universally stated that a type I or type II error is worse (as it is highly dependent upon the statement of the null hypothesis), I’ve added this cost assessment to help me understand which error is more “costly” and for which I might want to do more testing.
Let’s look at the classic criminal dilemma next. In colloquial usage, a type I error can be thought of as “convicting an innocent person” and type II error “letting a guilty person go free”.
Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
Person is not guilty of the crime | Person is judged as guilty when the person actually did not commit the crime (convicting an innocent person) | Person is judged not guilty when they actually did commit the crime (letting a guilty person go free) |
Cost Assessment | Social costs of sending an innocent person to prison and denying them their personal freedoms (which in our society, is considered an almost unbearable cost) | Risks of letting a guilty criminal roam the streets and committing future crimes |
Let’s look at some business-related examples. In these examples I have reworded the null hypothesis, so be careful with the cost assessment.
Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
Medicine A cures Disease B | (H_{0} true, but rejected as false) Medicine A cures Disease B, but is rejected as false | (H_{0} false, but accepted as true) Medicine A does not cure Disease B, but is accepted as true |
Cost Assessment | Lost opportunity cost for rejecting an effective drug that could cure Disease B | Unexpected side effects (maybe even death) for using a drug that is not effective |
Let’s try one more.
Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
Display Ad A is effective in driving conversions | (H_{0} true, but rejected as false) Display Ad A is effective in driving conversions, but is rejected as false | (H_{0} false, but accepted as true) Display Ad A is not effective in driving conversions, but is accepted as true |
Cost Assessment | Lost opportunity cost for rejecting an effective Display Ad A | Lost sales for promoting an ineffective Display Ad A to your target visitors |
The cost ramifications in the medicine example are quite substantial, so additional testing would likely be justified in order to minimize the impact of the type II error (using an ineffective drug) in our example. However, the cost ramifications in the Display Ad example are quite small, for both the type I and type II errors, so additional investment in addressing the type I and type II errors is probably not worthwhile.
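If you want to put rough numbers on a “Cost Assessment” row, one simple way is to weight each error's cost by how often you expect it to happen. The sketch below is only an illustration: the function name expected_error_costs, the idea of supplying a prior guess that the null hypothesis is true, and every number in the shepherd example are assumptions, not figures from the examples above.

```python
def expected_error_costs(p_null_true, alpha, beta, cost_type_1, cost_type_2):
    """Return (expected type I cost, expected type II cost) per test.

    alpha is the chance of rejecting H0 when it is true;
    beta is the chance of failing to reject H0 when it is false.
    """
    # A type I error needs H0 to be true AND the test to reject it.
    p_type_1 = p_null_true * alpha
    # A type II error needs H0 to be false AND the test to miss it.
    p_type_2 = (1 - p_null_true) * beta
    return p_type_1 * cost_type_1, p_type_2 * cost_type_2

# Hypothetical shepherd numbers: no wolf on 90% of nights, a false alarm
# costs 100, and a missed wolf costs 5000 (sheep plus a new shepherd).
cost_1, cost_2 = expected_error_costs(
    p_null_true=0.9, alpha=0.05, beta=0.20, cost_type_1=100, cost_type_2=5000
)
more_costly = "type I" if cost_1 > cost_2 else "type II"
print(f"expected type I cost:  {cost_1:.2f}")
print(f"expected type II cost: {cost_2:.2f}")
print(f"more costly error:     {more_costly}")
```

With made-up inputs like these, the error with the larger expected cost is the one where additional testing is most likely to pay off.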
Summary
Type I and type II errors are highly dependent upon the language or positioning of the null hypothesis. Changing the positioning of the null hypothesis can cause type I and type II errors to switch roles.
It’s hard to create a blanket statement that a type I error is worse than a type II error, or vice versa. The severity of the type I and type II errors can only be judged in the context of the null hypothesis, which should be thoughtfully worded to ensure that we’re running the right test.
I highly recommend adding the “Cost Assessment” analysis like we did in the examples above. This will help identify which type of error is more “costly” and identify areas where additional testing might be justified.
[1] More information about type I and type II errors can be found at: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
Very thorough. Thanks for the explanation!
Bill,
Great article – keep up the great work and being as nerdy as you can…
😉
Great explanation!!
Thanks so much! I am teaching an undergraduate Stats in Psychology course and have tried dozens of ways/examples but have not been thrilled with any. However I think that these will work! Thanks again!
So this is great and I am sharing it to get people calibrated before group decisions. I’m very much a “lay person”, but I see the Type I&II thing as key before considering a Bayesian approach as well…where the outcomes need to sum to 100 %.
I am taking statistics right now and this article clarified something that I needed to know for my exam that is in a couple of hours.
Thank you 🙂
TJ
You should explain that H0 should always be the common stand and against change, e.g. medicine x does not cure disease y, or display ad A does not increase conversions. So that in most cases failing to reject H0 normally implies maintaining status quo, and rejecting it means new investment, new policies, which generally means that type 1 error is normally more expensive and so we set very low alpha 0.1, 0.05, 0.01 ….
Shem, excellent point! Failing to reject H0 means staying with the status quo; it is up to the test to prove that the current processes or hypotheses are not correct. Sort of like innocent until proven guilty; the hypothesis is correct until proven wrong. Thanks for clarifying!
Wonderful, simple and easy to understand
Very thorough… Thanx..
Great explanation !!!
Thank you very much.
Per Dr. Diego Kuonen (@DiegoKuonen), use “Fail to Reject” the null hypothesis instead of “Accepting” the null hypothesis. “Fail to Reject” or “Reject” the null hypothesis (H0) are the 2 decisions. Statistical tests are used to assess the evidence against the null hypothesis.
Thanks a million, your explanation is easily understood.
This was awesome! loved it and I understand more now.
Great explanation. How can it be prevented?
Well explained, with pakka examples….
Not bad…there’s a subtle but real problem with the “False Positive” and “False Negative” language, though. These terms are commonly used when discussing hypothesis testing, and the two types of errors–probably because they are used a lot in medical testing.
It’s probably more accurate to characterize a type I error as a “false signal” and a type II error as a “missed signal.” When your p-value is low, or your test statistic falls inside the rejection region, it is a signal that the data you have are very unlikely to represent the status quo. In that case, you reject the null as being, well, very unlikely (and we usually state the 1-p confidence, as well). For example, say our alpha is 0.05 and our p-value is 0.02, we would reject the null and conclude the alternative “with 98% confidence.” If there was some methodological error that led us to conclude this, it would be a type I error…the signal we had would be a false signal. Since it’s convenient to call that rejection signal a “positive” result, it is similar to saying it’s a false positive.
When we don’t have enough evidence to reject, though, we don’t conclude the null. We never “accept” a null hypothesis. If there is an error, and we should have been able to reject the null, then we have missed the rejection signal. It’s not really a false negative, because the failure to reject is not a “true negative,” just an indication we don’t have enough evidence to reject. For example, “no evidence of disease” is not equivalent to “evidence of no disease.”
Rip, thank you very much for the detailed explanations. I think your information helps clarify these two “confusing” terms. Plus I like your examples. Thanks for sharing!
excellent description of the subject.
Very good explanation! Easy to understand!
Thanks Liliana!
Very comprehensive and detailed discussion about statistical errors……..
We used this today for illustration during Lean Six Sigma Green Belt training – good stuff!
Thanks Rich. This is still my most popular blog. Glad I got one of them right!!
WELL ARTICULATED. THANKS A LOT