Bayes' Theorem

Bayes can do magic!

Ever wondered how computers learn about people?

shoe laces


An internet search for "movie automatic shoe laces" brings up "Back to the future"

Has the search engine watched the movie? No, but it knows from lots of other searches what people are probably looking for.

And it calculates that probability using Bayes' Theorem.

Bayes' Theorem is a way of finding a probability when we know certain other probabilities.

The formula is:

P(A|B) = P(A) P(B|A)P(B)

Which tells us:   how often A happens given that B happens, written P(A|B),
When we know:   how often B happens given that A happens, written P(B|A)
    and how likely A is on its own, written P(A)
    and how likely B is on its own, written P(B)


Let us say P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke, then:

P(Fire|Smoke) means how often there is fire when we can see smoke
P(Smoke|Fire) means how often we can see smoke when there is fire

So the formula kind of tells us "forwards" P(Fire|Smoke) when we know "backwards" P(Smoke|Fire)

  • dangerous fires are rare (1%)
  • but smoke is fairly common (10%) due to barbecues,
  • and 90% of dangerous fires make smoke

We can then discover the probability of dangerous Fire when there is Smoke:

P(Fire|Smoke) =P(Fire) P(Smoke|Fire)P(Smoke)
=1% x 90%10%

So it is still worth checking out any smoke to be sure.


Example: Picnic Day

You are planning a picnic today, but the morning is cloudy

What is the chance of rain during the day?

We will use Rain to mean rain during the day, and Cloud to mean cloudy morning.

The chance of Rain given Cloud is written P(Rain|Cloud)

So let's put that in the formula:

P(Rain|Cloud) = P(Rain) P(Cloud|Rain)P(Cloud)

P(Rain|Cloud) = 0.1 x 0.50.4  = .125

Or a 12.5% chance of rain. Not too bad, let's have a picnic!

Just 4 Numbers

Imagine 100 people at a party, and you tally how many wear pink or not, and if a man or not, and get these numbers:

bayes table

Bayes' Theorem is based off just those 4 numbers!

Let us do some totals:

bayes table totals

And calculate some probabilities:

puppy rips


And then the puppy arrives! Such a cute puppy.


But all your data is ripped up! Only 3 values survive:

Can you discover P(Man|Pink) ?

Imagine a pink-wearing guest leaves money behind ... was it a man? We can answer this question using Bayes' Theorem:

P(Man|Pink) = P(Man) P(Pink|Man)P(Pink)

P(Man|Pink) = 0.4 × 0.1250.25 = 0.2

Note: if we still had the raw data we could calculate directly 525 = 0.2

Being General

Why does it work?

Let us replace the numbers with letters:

bayes table

Now let us look at probabilities. So we take some ratios:

And then multiply them together like this:

bayes table math P(A)


Now let us do that again but use P(B) and P(A|B):

bayes table math P(B)


Both ways get the same result of ss+t+u+v

So we can see that:

P(B) P(A|B) = P(A) P(B|A)

Nice and symmetrical isn't it?

It actually has to be symmetrical as we can swap rows and columns and get the same top-left corner.

And it is also Bayes Formula ... just divide both sides by P(B):

P(A|B) = P(A) P(B|A)P(B)


First think "AB AB AB" then remember to group it like: "AB = A BA / B"

P(A|B) = P(A) P(B|A)P(B)

Cat Allergy?

One of the famous uses for Bayes Theorem is False Positives and False Negatives.

For those we have two possible cases for "A", such as Pass/Fail (or Yes/No etc)

Example: Allergy or Not?


Hunter says she is itchy. There is a test for Allergy to Cats, but this test is not always right:

If 1% of the population have the allergy, and Hunter's test says "Yes", what are the chances that Hunter really has the allergy?

We want to know the chance of having the allergy when test says "Yes", written P(Allergy|Yes)

Let's get our formula:

P(Allergy|Yes) = P(Allergy) P(Yes|Allergy)P(Yes)

Oh no! We don't know what the general chance of the test saying "Yes" is ...

... but we can calculate it by adding up those with, and those without the allergy:

Let's add that up:

P(Yes) = 1% × 80% + 99% × 10% = 10.7%

Which means that about 10.7% of the population will get a "Yes" result.

So now we can complete our formula:

P(Allergy|Yes) = 1% × 80%10.7%  = 7.48%

P(Allergy|Yes) = about 7%

This is the same result we got on False Positives and False Negatives.

In fact we can write a special version of the Bayes' formula just for things like this:

P(A|B) = P(A)P(B|A) P(A)P(B|A) + P(not A)P(B|not A)

"A" With Three (or more) Cases

We just saw "A" with two cases (A and not A), which we took care of in the bottom line.

When "A" has 3 or more cases we include them all in the bottom line:

P(A1|B) = P(A1)P(B|A1) P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3) + ...etc

art show

Example: The Art Competition has entries from three painters: Pam, Pia and Pablo

What is the chance that Pam will win First Prize?

P(Pam|First) = P(Pam)P(First|Pam) P(Pam)P(First|Pam) + P(Pia)P(First|Pia) + P(Pablo)P(First|Pablo)

Put in the values:

P(Pam|First) =(15/30) × 4% (15/30) × 4% + (5/30) × 6% + (10/30) × 3%

Multiply all by 30 (makes calculation easier):

P(Pam|First) =15 × 4% 15 × 4% + 5 × 6% + 10 × 3%
=0.60.6 + 0.3 + 0.3

A good chance!

Pam isn't the most successful artist, but she did put in lots of entries.

Now, back to Search Engines.

Search Engines take this idea and scale it up a lot (plus some other tricks).

It makes them look like they can read your mind!

It can also be used for mail filters, music recommendation services and more.