
Understanding Type I and Type II Errors

By Bill Schmarzo, CTO, Dell EMC Services (aka “Dean of Big Data”) | September 16, 2013

I recently received an inquiry asking me to clarify the difference between type I and type II errors in statistical testing. Let me use this blog to clarify the difference, as well as to discuss the potential cost ramifications of type I and type II errors. I have also provided some examples at the end of the blog[1].

In statistical test theory, the notion of statistical error is an integral part of hypothesis testing. The statistical test requires an unambiguous statement of a null hypothesis (H0), for example, “this person is healthy”, “this accused person is not guilty” or “this product is not broken”. The result of the test of the null hypothesis may be positive (healthy, not guilty, not broken) or may be negative (not healthy, guilty, broken).

If the result of the test corresponds with reality, then a correct decision has been made (e.g., person is healthy and is tested as healthy, or the person is not healthy and is tested as not healthy).  However, if the result of the test does not correspond with reality, then two types of error are distinguished: type I error and type II error.
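
To make the mechanics concrete in code, here is a minimal sketch of stating and testing a null hypothesis in Python with scipy. The sample data, the null value of 5.0, and the 0.05 significance level are all illustrative assumptions, not anything prescribed by the discussion above:

```python
# Minimal hypothesis-test sketch: H0 says the population mean is 5.0.
# The measurements, null value, and alpha below are illustrative only.
from scipy import stats

measurements = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]  # hypothetical sample
null_mean = 5.0                                 # H0: "the true mean is 5.0"
alpha = 0.05                                    # significance level

t_stat, p_value = stats.ttest_1samp(measurements, popmean=null_mean)
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")
```

Whether that printed decision is correct, or one of the two errors described below, depends on the underlying reality that the test cannot see directly.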

Type I Error (False Positive Error)

A type I error occurs when the null hypothesis is true but is rejected. Let me say this again: a type I error occurs when the null hypothesis is actually true, but was rejected as false by the testing.

A type I error, or false positive, is asserting something as true when it is actually false.  This false positive error is basically a “false alarm” – a result that indicates a given condition has been fulfilled when it actually has not been fulfilled (i.e., erroneously a positive result has been assumed).

Let’s use a shepherd and wolf example. Let’s say that our null hypothesis is that there is “no wolf present.” A type I error (or false positive) would be “crying wolf” when there is no wolf present. That is, the actual condition was that there was no wolf present; however, the shepherd wrongly indicated there was a wolf present by calling “Wolf! Wolf!” This is a type I error or false positive error.
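
A quick way to see the false alarm rate is to simulate it. In this hedged sketch (sample size, seed, and alpha are arbitrary illustrative choices), the null hypothesis is true by construction, yet a test at significance level alpha still “cries wolf” in roughly alpha of the trials:

```python
# Simulating the type I (false positive) rate: H0 is true by construction,
# yet the test rejects it in roughly alpha of repeated experiments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)          # seeded for reproducibility
alpha, trials, false_positives = 0.05, 10_000, 0

for _ in range(trials):
    sample = rng.normal(loc=0.0, scale=1.0, size=30)   # true mean really is 0
    _, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value < alpha:                 # "crying wolf" with no wolf present
        false_positives += 1

print(f"Observed false positive rate: {false_positives / trials:.3f}")  # ~0.05
```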

Type II Error (False Negative Error)

A type II error occurs when the null hypothesis is false but erroneously fails to be rejected. Let me say this again: a type II error occurs when the null hypothesis is actually false, but was accepted as true by the testing.

A type II error, or false negative, occurs when a test result indicates that a condition failed when it actually succeeded. A type II error is committed when we fail to believe a true condition.

Continuing our shepherd and wolf example: again, our null hypothesis is that there is “no wolf present.” A type II error (or false negative) would be doing nothing (not “crying wolf”) when there is actually a wolf present. That is, the actual situation was that there was a wolf present; however, the shepherd wrongly indicated there was no wolf present and continued to play Candy Crush on his iPhone. This is a type II error or false negative error.
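
The mirror-image simulation sketches the type II side: here the null hypothesis is false by construction (the “wolf” is a small true effect), and we count how often the test still fails to reject it. The effect size, sample size, and alpha are again illustrative assumptions:

```python
# Simulating the type II (false negative) rate: H0 ("the mean is 0") is
# false by construction, and we count how often we still fail to reject.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, trials, misses = 0.05, 10_000, 0

for _ in range(trials):
    sample = rng.normal(loc=0.3, scale=1.0, size=30)   # true mean is 0.3, not 0
    _, p_value = stats.ttest_1samp(sample, popmean=0.0)
    if p_value >= alpha:                # doing nothing while the wolf is there
        misses += 1

print(f"Observed false negative rate (beta): {misses / trials:.3f}")
```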

A tabular relationship between truthfulness/falseness of the null hypothesis and outcomes of the test can be seen in the table below:

| | Null hypothesis is true | Null hypothesis is false |
| --- | --- | --- |
| Reject null hypothesis | Type I Error (False Positive) | Correct outcome (True Positive) |
| Fail to reject null hypothesis | Correct outcome (True Negative) | Type II Error (False Negative) |
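
For readers who prefer code to tables, here is a small hypothetical helper (an illustration, not part of any standard library) that maps a ground truth and a test decision onto the four cells above:

```python
# Sorting (ground truth, decision) pairs into the four cells of the table.
# The helper and the example pairs are hypothetical illustrations.
def classify(h0_is_true: bool, rejected_h0: bool) -> str:
    if rejected_h0:
        return "type I error (false positive)" if h0_is_true else "correct (true positive)"
    return "correct (true negative)" if h0_is_true else "type II error (false negative)"

for truth, decision in [(True, True), (True, False), (False, True), (False, False)]:
    print(f"H0 true={truth!s:5}  rejected={decision!s:5}  ->  {classify(truth, decision)}")
```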

Examples

Let’s walk through a few examples and use a simple form to help us understand the potential cost ramifications of type I and type II errors. Let’s start with our shepherd/wolf example.

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Wolf is not present | Shepherd thinks a wolf is present (shepherd cries wolf) when no wolf is actually present | Shepherd thinks a wolf is NOT present (shepherd does nothing) when a wolf is actually present |
| Cost Assessment | Costs (actual costs plus shepherd credibility) associated with scrambling the townsfolk to kill the non-existent wolf | Replacement cost for the sheep eaten by the wolf, plus the cost of hiring a new shepherd |

Note: I added a row called “Cost Assessment.” Since it cannot be universally stated that a type I or type II error is worse (it is highly dependent upon the statement of the null hypothesis), I’ve added this cost assessment to help me understand which error is more “costly” and where I might want to do more testing.
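
One way to make that cost assessment concrete is to weight each error type by a rough probability and a rough cost, then compare expected costs. Every number in this back-of-the-envelope sketch is a made-up placeholder to be replaced with your own estimates:

```python
# Back-of-the-envelope cost assessment: expected cost = P(error) * cost(error).
# All probabilities and costs are made-up placeholders.
alpha = 0.05                 # assumed probability of a type I error
beta = 0.20                  # assumed probability of a type II error
cost_false_positive = 500    # e.g., scrambling the townsfolk for no wolf
cost_false_negative = 5_000  # e.g., replacing eaten sheep and the shepherd

expected_type_I = alpha * cost_false_positive
expected_type_II = beta * cost_false_negative
print(f"Expected type I cost:  {expected_type_I:,.0f}")
print(f"Expected type II cost: {expected_type_II:,.0f}")
worse = "type II" if expected_type_II > expected_type_I else "type I"
print(f"Additional testing should target the {worse} error first.")
```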

Let’s look at the classic criminal dilemma next. In colloquial usage, a type I error can be thought of as “convicting an innocent person,” and a type II error as “letting a guilty person go free.”

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Person is not guilty of the crime | Person is judged guilty when they actually did not commit the crime (convicting an innocent person) | Person is judged not guilty when they actually did commit the crime (letting a guilty person go free) |
| Cost Assessment | Social costs of sending an innocent person to prison and denying them their personal freedoms (which, in our society, is considered an almost unbearable cost) | Risks of letting a guilty criminal roam the streets and commit future crimes |

Let’s look at some business-related examples. In these examples I have reworded the null hypothesis, so be careful with the cost assessment.

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Medicine A cures Disease B | Medicine A cures Disease B, but is rejected as false (H0 is true, but is rejected) | Medicine A does not cure Disease B, but is accepted as true (H0 is false, but is not rejected) |
| Cost Assessment | Lost opportunity cost of rejecting an effective drug that could cure Disease B | Unexpected side effects (maybe even death) from using a drug that is not effective |

Let’s try one more.

| Null Hypothesis | Type I Error / False Positive | Type II Error / False Negative |
| --- | --- | --- |
| Display Ad A is effective in driving conversions | Display Ad A is effective in driving conversions, but is rejected as false (H0 is true, but is rejected) | Display Ad A is not effective in driving conversions, but is accepted as true (H0 is false, but is not rejected) |
| Cost Assessment | Lost opportunity cost of rejecting an effective Display Ad A | Lost sales from promoting an ineffective Display Ad A to your target visitors |

The cost ramifications in the medicine example are quite substantial, so additional testing would likely be justified in order to minimize the impact of the type II error (using an ineffective drug). However, the cost ramifications in the Display Ad example are quite small for both error types, so additional investment in addressing them is probably not worthwhile.
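
To see why “additional testing” helps with the type II error, here is an illustrative simulation: holding the effect size and alpha fixed, a larger sample shrinks the estimated type II error rate (that is, it raises the power of the test). All numbers are assumptions chosen for illustration:

```python
# Larger samples shrink the type II error rate for a fixed effect and alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, true_effect, trials = 0.05, 0.3, 5_000

for n in (20, 50, 100, 200):
    misses = sum(
        stats.ttest_1samp(rng.normal(loc=true_effect, scale=1.0, size=n),
                          popmean=0.0).pvalue >= alpha
        for _ in range(trials)
    )
    print(f"n = {n:3d}   estimated type II error rate: {misses / trials:.3f}")
```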

Summary

Type I and type II errors are highly dependent upon the language and positioning of the null hypothesis. Changing the positioning of the null hypothesis can cause type I and type II errors to switch roles.

It’s hard to make a blanket statement that a type I error is worse than a type II error, or vice versa. The severity of the type I and type II errors can only be judged in the context of the null hypothesis, which should be thoughtfully worded to ensure that we’re running the right test.

I highly recommend adding a “Cost Assessment” analysis like the one in the examples above. It will help identify which type of error is more “costly” and where additional testing might be justified.


[1] More information about type I and type II errors can be found at: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors

About Bill Schmarzo


CTO, Dell EMC Services (aka “Dean of Big Data”)

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice. As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as the #4 Big Data influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.

30 thoughts on “Understanding Type I and Type II Errors”

  1. Thanks so much! I am teaching an undergraduate Stats in Psychology course and have tried dozens of ways/examples, but have not been thrilled with any. However, I think these will work! Thanks again!

  2. So this is great, and I am sharing it to get people calibrated before group decisions. I’m very much a “lay person,” but I see the Type I & II distinction as key before considering a Bayesian approach as well, where the outcomes need to sum to 100%.

  3. I am taking statistics right now and this article clarified something that I needed to know for my exam that is in a couple of hours.
    Thank you 🙂

    TJ

  4. You should explain that H0 should always be the common stance and against change, e.g., “medicine X does not cure disease Y” or “display ad A does not increase conversions.” In most cases failing to reject H0 implies maintaining the status quo, while rejecting it means new investment and new policies, which generally means that a type I error is normally more expensive, so we set a very low alpha: 0.1, 0.05, 0.01, and so on.

  5. Shem, excellent point! Failing to reject H0 means staying with the status quo; it is up to the test to prove that the current processes or hypotheses are not correct. Sort of like innocent until proven guilty; the hypothesis is correct until proven wrong. Thanks for clarifying!

  6. Per Dr. Diego Kuonen (@DiegoKuonen), use “Fail to Reject” the null hypothesis instead of “Accepting” the null hypothesis. “Fail to Reject” or “Reject” the null hypothesis (H0) are the two decisions. Statistical tests are used to assess the evidence against the null hypothesis.

  7. Not bad… there’s a subtle but real problem with the “False Positive” and “False Negative” language, though. These terms are commonly used when discussing hypothesis testing and the two types of errors, probably because they are used a lot in medical testing.
    It’s probably more accurate to characterize a type I error as a “false signal” and a type II error as a “missed signal.” When your p-value is low, or your test statistic falls in the rejection region, it is a signal that the data you have are very unlikely to represent the status quo. In that case, you reject the null as being, well, very unlikely (and we usually state the 1-p confidence as well). For example, if our alpha is 0.05 and our p-value is 0.02, we would reject the null and conclude the alternative “with 98% confidence.” If there was some methodological error that led us to conclude this, it would be a type I error: the signal we had would be a false signal. Since it’s convenient to call that rejection signal a “positive” result, it is similar to saying it’s a false positive.
    When we don’t have enough evidence to reject, though, we don’t conclude the null. We never “accept” a null hypothesis. If there is an error, and we should have been able to reject the null, then we have missed the rejection signal. It’s not really a false negative, because the failure to reject is not a “true negative,” just an indication that we don’t have enough evidence to reject. For example, “no evidence of disease” is not equivalent to “evidence of no disease.”

  8. Rip, thank you very much for the detailed explanations. I think your information helps clarify these two “confusing” terms. Plus I like your examples. Thanks for sharing!

      • Excuse me, how do I know whether the null hypothesis is true or false?
        And is the value of the type I error zero when accepting H0?
        Waiting for your answer; please reply as fast as you can. Thanks.

  9. In the case of the medicine example, the null hypothesis would not be written as “medicine A cures disease B”. The null hypothesis (H0) would be that “there is no difference in cure rate between medicine A and a placebo”. The null hypothesis is just that: null meaning “nothing” or “no difference”, and the routine statistical approach is to try to disprove H0, not prove that the opposite must be true. But a great article! Thanks.

    • Thanks Tony for the clarification. Your clarification makes the article more useful, so thanks for your contribution.

      I’ve always struggled with how to communicate the null hypothesis – the hypothesis of proving “there is no difference.” To me the challenge is that proving “there is no difference” doesn’t necessarily tell me that “there is a difference.” It seems incomplete that while I can prove “there is no difference,” I can’t likewise prove that there is a difference (because I can’t assume that proving there is no difference consequently means that there is a difference). Does that make sense?

      That’s why I instead focused on articulating the costs of Type I and Type II errors – which is more concrete and actionable for my clients in understanding the ramifications of the decisions that they are trying to make.

      Thanks again for your contribution.