False Positives and False Negatives

Test Says "Yes" ... or does it?

When you have a test that can say "Yes" or "No" (such as a medical test), you have to think: it could be wrong when it says "Yes", and it could be wrong when it says "No".



It is like being told you did something when you didn't!

Or you didn't do it when you really did.

They each have a special name: "False Positive" and "False Negative":

                     They say you did     They say you didn't
You really did       They are right!      "False Negative"
You really didn't    "False Positive"     They are right!
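The four cells of the table can also be written as a small lookup. This is a minimal sketch. The function name `outcome_name` is hypothetical. It compares what really happened with what the test says and returns the matching name.

```python
def outcome_name(really_did: bool, they_say_did: bool) -> str:
    """Name a table cell: what really happened vs. what they say (hypothetical helper)."""
    if really_did == they_say_did:
        return "They are right!"
    # They say you did, but you didn't: a false positive
    if they_say_did:
        return "False Positive"
    # They say you didn't, but you did: a false negative
    return "False Negative"


# Print all four cells of the table
for really_did in (True, False):
    for they_say_did in (True, False):
        print(really_did, they_say_did, outcome_name(really_did, they_say_did))
```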

Here are some examples of "false positives" and "false negatives":

But many people don't understand the true numbers behind "Yes" or "No", like in this example:


Example: Allergy or Not?

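Hunter says she is itchy. There is a test for Allergy to Cats, but this test is not always right. For people who really have the allergy it says "Yes" 80% of the time, and for people who don't have it, it says "Yes" 10% of the time (a "false positive").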

Here it is in a table:

                 Test says "Yes"         Test says "No"
Have allergy     80%                     20% "False Negative"
Don't have it    10% "False Positive"    90%


Question: If 1% of the population have the allergy, and Hunter's test says "Yes", what are the chances that Hunter really has the allergy?

Do you think 75%? Or maybe 50%?

A similar test was given to doctors and most guessed around 75% ...
... but they were very wrong!

(Source: "Probabilistic reasoning in clinical medicine: Problems and opportunities" by David M. Eddy 1982, which this example is based on)

There are three different ways to solve this: imagine a thousand people, draw a tree, or use Bayes' Theorem. Use any you prefer. Let's look at them now:


Try Imagining A Thousand People

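When trying to understand questions like this, just imagine a large group (say 1000 people) and play with the numbers. 1% of 1000 is 10 people with the allergy, and the test says "Yes" to 80% of them (8 people). The other 990 people don't have it, and the test still says "Yes" to 10% of them (99 people).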

As a table:

                 People (1% have it)   Test says "Yes"   Test says "No"
Have allergy     10                    8                 2
Don't have it    990                   99                891
Total            1000                  107               893

So 107 people get a "Yes" but only 8 of those really have the allergy:

8 / 107 = about 7%

So, even though Hunter's test said "Yes", it is still only 7% likely that Hunter has a Cat Allergy.

Why so small? Well, the allergy is so rare that those who actually have it are greatly outnumbered by those with a false positive.
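The same counting can be done in a few lines of code. This is a sketch that assumes the rates from the allergy table: 1% have the allergy, the test says "Yes" to 80% of people who have it, and it says "Yes" to 10% of people who don't. The variable names are illustrative only.

```python
# Imagine a group of 1000 people and count each kind of result
GROUP = 1000
have = round(GROUP * 0.01)           # 10 people have the allergy
dont_have = GROUP - have             # 990 people don't

true_yes = round(have * 0.80)        # 8 correct "Yes" answers
false_neg = have - true_yes          # 2 "False Negatives"
false_yes = round(dont_have * 0.10)  # 99 "False Positives"
true_no = dont_have - false_yes      # 891 correct "No" answers

all_yes = true_yes + false_yes       # 107 people are told "Yes"
chance = true_yes / all_yes          # share of "Yes" answers that are right

print(f"{all_yes} people get a Yes, {true_yes} really have it: {chance:.1%}")
```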


As A Tree

Drawing a tree diagram can really help:

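[Tree diagram of test results: Have allergy (1%) splits into Yes (80%) and No (20%); Don't have it (99%) splits into Yes (10%) and No (90%).]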

First of all, let's check that all the percentages add up:

0.8% + 0.2% + 9.9% + 89.1% = 100% (good!)

And the two "Yes" answers add up to 0.8% + 9.9% = 10.7%, but only 0.8% are correct.

0.8% / 10.7% = about 7% (same answer as above)
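The tree can be checked in code as well. This is a minimal sketch that uses the same rates as the allergy table. Each leaf's value is the product of the probabilities along its path. The leaves should add up to 100%, and the answer is the "Yes, has it" leaf divided by both "Yes" leaves together.

```python
p_have = 0.01  # 1% of the population have the allergy

# Each leaf = probability of the first branch x probability of the second branch
leaves = {
    ("have", "Yes"): p_have * 0.80,              # 0.8%
    ("have", "No"): p_have * 0.20,               # 0.2%
    ("don't have", "Yes"): (1 - p_have) * 0.10,  # 9.9%
    ("don't have", "No"): (1 - p_have) * 0.90,   # 89.1%
}

total = sum(leaves.values())  # should be 1 (that is, 100%)
p_yes = leaves[("have", "Yes")] + leaves[("don't have", "Yes")]
answer = leaves[("have", "Yes")] / p_yes

print(f"total {total:.0%}, Yes {p_yes:.1%}, correct share of Yes {answer:.1%}")
```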


Bayes' Theorem

Bayes' Theorem has a special formula for this kind of thing:

$$P(A|B) = \frac{P(A)\,P(B|A)}{P(A)\,P(B|A) + P(\text{not } A)\,P(B|\text{not } A)}$$
P(A|B) means "The probability that Hunter actually has the allergy given that the test says Yes"

P(B|A) means "The probability that the test says Yes given that Hunter actually has the allergy"

To be clearer, let's change A to has (actually has allergy) and B to Yes (test says yes):

$$P(\text{has}|\text{Yes}) = \frac{P(\text{has})\,P(\text{Yes}|\text{has})}{P(\text{has})\,P(\text{Yes}|\text{has}) + P(\text{not has})\,P(\text{Yes}|\text{not has})}$$

And put in the numbers:

$$P(\text{has}|\text{Yes}) = \frac{0.01 \times 0.8}{0.01 \times 0.8 + 0.99 \times 0.1} = 0.0748\ldots$$

Which is about 7%
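The formula translates directly into a small function. This is a sketch, and the name `bayes` and its parameter names are hypothetical. The bottom of the fraction adds up the two ways of getting a "Yes": from people who have the allergy and from people who don't.

```python
def bayes(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) by Bayes' Theorem, with P(B) split over A and not A."""
    numerator = p_a * p_b_given_a
    denominator = numerator + (1 - p_a) * p_b_given_not_a
    return numerator / denominator


# A = has the allergy, B = test says Yes
p = bayes(p_a=0.01, p_b_given_a=0.8, p_b_given_not_a=0.1)
print(f"P(has|Yes) = {p:.4f}")
```

Changing `p_a` shows why rarity matters so much. If half the population had the allergy, the same test would give `bayes(0.5, 0.8, 0.1)`, which is about 0.89.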

Learn more about this at Bayes' Theorem.

One Last Example

Extreme Example: Computer Virus


A computer virus spreads around the world, with every infected computer reporting to a master computer.

The good guys capture the master computer and find that a million computers have been infected (but don't know which ones).

Governments decide to take action!

No one can use the internet until their computer passes the "virus-free" test. The test is 99% accurate (pretty good, right?), but 1% of the time it says you have the virus when you don't (a "false positive").

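Now let's say there are 1000 million internet users. Nearly all of the 1 million infected computers fail the test. But 1% of the other 999 million users also get a false positive, which is about 10 million people.

So a total of about 11 million get banned, but only 1 in 11 of those actually has the virus.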

So if you get banned, there is only about a 9% chance you actually have the virus!
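Here is the same arithmetic as a short sketch, with all counts in millions of users. One assumption is not stated in the text: the test catches every infected computer. The text only gives the 1% false positive rate. With 99% of infected computers caught instead, the answer is still about 9%.

```python
# Counts in millions of users
users = 1000
infected = 1                            # assumed: every infected computer is caught
clean = users - infected                # 999 million virus-free users
false_positives = clean * 0.01          # 1% of them are wrongly banned, about 10 million
banned = infected + false_positives     # about 11 million banned in total
chance_infected = infected / banned     # share of banned users who really have the virus

print(f"banned: {banned:.2f} million, chance infected: {chance_infected:.1%}")
```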


When dealing with false positives and false negatives (or other tricky probability questions), we can use these methods: imagine a large group (say 1000 people), draw a tree diagram, or use Bayes' Theorem.