
# Bayes' theorem

Bayes' Theorem is a result of the probability calculus that allows you to calculate conditional probabilities on the basis of the "reverse" conditional probability. That is, Bayes' Theorem lets you calculate $\Pr(A \mid B)$ if you know $\Pr(B \mid A)$ and some other information.

Bayes' Theorem is incredibly useful in problems where you learn some new information (gain some new evidence) and have to update probabilities to account for this new information.

The theorem is named after Reverend Thomas Bayes (1702 - 1761), a Presbyterian minister and Fellow of the Royal Society of London.

## Statement of the Theorem

The simplest version of Bayes' Theorem is the following (for any sentences $A,B$):

$\Pr(A \mid B) = \frac{\Pr(B \mid A) \cdot \Pr(A)}{\Pr(B)}$

This is a simple consequence of Rule 4 of the probability calculus. Rule 4 says that:

$\Pr(A \wedge B) = \Pr(A) \cdot \Pr(B \mid A)$

and

$\Pr(B \wedge A) = \Pr(B) \cdot \Pr(A \mid B)$

Since $\Pr(A \wedge B) = \Pr(B \wedge A)$ (by Rule 7), we have:

$\Pr(A) \cdot \Pr(B \mid A) = \Pr(B) \cdot \Pr(A \mid B)$

Dividing each side by $\Pr(B)$ gives:

$\Pr(A \mid B) = \frac{\Pr(A) \cdot \Pr(B \mid A)}{\Pr(B)}$

which is precisely Bayes' Theorem.
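The theorem is also easy to express in code. Here is a minimal sketch in Python (the function name `bayes` is our own, not anything standard):

```python
def bayes(pr_b_given_a, pr_a, pr_b):
    """Return Pr(A | B) via Bayes' Theorem, given Pr(B | A), Pr(A), and Pr(B)."""
    return pr_b_given_a * pr_a / pr_b

# With Pr(B | A) = 0.5, Pr(A) = 0.4, Pr(B) = 0.25:
print(bayes(0.5, 0.4, 0.25))  # 0.8
```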

## Example

Here is a typical case for Bayes' Theorem. I suspect that my mechanic is a crook; in fact, I think it is 25% likely that he is. If he is a crook, there is a 90% chance that he will install a broken muffler in my car. The unconditional probability that my muffler will be broken is 0.3. Suppose that it is in fact broken; how likely is it that my mechanic actually is a crook?

In this problem, we gain some new evidence (we observe a broken muffler). This should change the probability assigned to "the mechanic is a crook." But how much should it change? That is where Bayes' Theorem comes in.

We know:

$\Pr(\text{mechanic is a crook}) = 0.25$

$\Pr(\text{muffler is broken} \mid \text{mechanic is a crook}) = 0.9$

$\Pr(\text{muffler is broken}) = 0.3$

We want:

$\Pr(\text{mechanic is a crook} \mid \text{muffler is broken})$.

By Bayes' Theorem, this is:

$\frac{\Pr(\text{mechanic is a crook}) \cdot \Pr(\text{muffler is broken} \mid \text{mechanic is a crook})}{\Pr(\text{muffler is broken})}$

$= \frac{0.25 \cdot 0.9}{0.3} = 0.75$

Hence,

$\Pr(\text{mechanic is a crook} \mid \text{muffler is broken}) = 0.75$.
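Plugging the example's numbers into Python confirms the calculation:

```python
pr_crook = 0.25               # Pr(mechanic is a crook)
pr_broken_given_crook = 0.9   # Pr(muffler is broken | crook)
pr_broken = 0.3               # Pr(muffler is broken)

# Bayes' Theorem: Pr(crook | broken) = Pr(broken | crook) * Pr(crook) / Pr(broken)
pr_crook_given_broken = pr_broken_given_crook * pr_crook / pr_broken
print(round(pr_crook_given_broken, 2))  # 0.75
```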

## Bigger Versions

Often, in Bayes' Theorem problems, the value of the denominator ($\Pr(B)$ in our statement above) is not given directly. In such cases, the total probability theorem is used to compute it. This two-step process can be combined into one, yielding bigger versions of Bayes' Theorem: one corresponding to the simple version of the total probability theorem, and the other to the general version.

$\Pr(A \mid B) = \frac{\Pr(B \mid A) \cdot \Pr(A)}{\Pr(A) \cdot \Pr(B \mid A) + \Pr(\neg A) \cdot \Pr(B \mid \neg A)}$

And, if $A_1, A_2, \dots, A_n$ is a partition, then for each cell $A_i$:

$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i) \cdot \Pr(A_i)}{\Pr(A_1) \cdot \Pr(B \mid A_1) + \Pr(A_2) \cdot \Pr(B \mid A_2) + \dots + \Pr(A_n) \cdot \Pr(B \mid A_n)}$

These versions result from substituting the appropriate version of the total probability theorem into the first version of Bayes' Theorem.
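The partitioned version is just as easy to compute. A sketch in Python (`bayes_partition` is our own name; the lists hold the priors $\Pr(A_k)$ and the likelihoods $\Pr(B \mid A_k)$):

```python
def bayes_partition(i, priors, likelihoods):
    """Pr(A_i | B) when A_1, ..., A_n form a partition.

    priors[k] = Pr(A_k); likelihoods[k] = Pr(B | A_k).
    """
    # Denominator: Pr(B), by the total probability theorem.
    pr_b = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[i] * likelihoods[i] / pr_b

# Two-cell partition {crook, not a crook}. If we assume
# Pr(broken | not a crook) = 0.1, the denominator works out to the 0.3
# used in the mechanic example, and we recover the same answer:
print(round(bayes_partition(0, [0.25, 0.75], [0.9, 0.1]), 2))  # 0.75
```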

## Example

Here's a typical Bayes' Theorem problem. I present you with two urns, labelled 1 and 2. Urn 1 contains 1 red ball and 9 black balls, while urn 2 contains 9 red balls and 1 black ball. I'm going to choose an urn by rolling a six-sided die: if it comes up 3, I will draw a ball at random from urn 1; if it comes up anything else, I will draw from urn 2. Without showing you which urn I drew from, I present you with a black ball. How likely is it that I drew from urn 1?

Initially it seems likely that I drew from urn 1, since urn 1 contains mainly black balls, but the die roll pulls the other way: urn 2 was far more likely to be chosen. How do these considerations balance out? Let's use Bayes' Theorem.

We are given:

$\Pr(\text{drew from 1}) = 1/6$

$\Pr(\text{drew from 2}) = 5/6$

$\Pr(\text{drew black} \mid \text{drew from 1}) = 0.9$

$\Pr(\text{drew black} \mid \text{drew from 2}) = 0.1$

We want:

$\Pr(\text{drew from 1} \mid \text{drew black})$

(Probabilities like this are sometimes called "inverse probabilities" because they calculate "backwards": they tell you how likely a past event was, given that a present event happened. By contrast, something like "what is the probability that a black ball is drawn from urn 2?" calculates "forwards," from picking the urn to drawing the ball.)

According to Bayes' Theorem:

$\Pr(\text{drew from 1} \mid \text{drew black}) = \frac{\Pr(\text{drew black} \mid \text{drew from 1}) \cdot \Pr(\text{drew from 1})}{\Pr(\text{drew from 1}) \cdot \Pr(\text{drew black} \mid \text{drew from 1}) + \Pr(\text{drew from 2}) \cdot \Pr(\text{drew black} \mid \text{drew from 2})}$

$= \frac{0.9 \cdot 1/6}{0.9 \cdot 1/6 + 0.1 \cdot 5/6} = \frac{9}{14} \approx 0.643$

So the probability that I drew from urn 1 is about 0.643 (exactly $9/14$): seeing the black ball raises it well above the prior of $1/6$, even though the die made urn 2 much more likely to be chosen. If you think about it, this also tells you that, given that I showed you a black ball, the probability that I rolled a 3 on the six-sided die is about 0.643, which is kind of interesting.
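Exact arithmetic makes this easy to check. A sketch using Python's `fractions` module:

```python
from fractions import Fraction

pr_urn1 = Fraction(1, 6)            # Pr(drew from 1): die came up 3
pr_urn2 = Fraction(5, 6)            # Pr(drew from 2): anything else
pr_black_given_1 = Fraction(9, 10)  # Pr(drew black | drew from 1)
pr_black_given_2 = Fraction(1, 10)  # Pr(drew black | drew from 2)

# Denominator via the total probability theorem.
pr_black = pr_urn1 * pr_black_given_1 + pr_urn2 * pr_black_given_2

posterior = pr_urn1 * pr_black_given_1 / pr_black
print(posterior)         # 9/14
print(float(posterior))  # 0.6428571428571429
```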

Another classic example along these lines is the Monty Hall problem.

## Prior and Posterior Probabilities

Sometimes the result of Bayes' Theorem, $\Pr(A \mid B)$, is called the posterior probability of $A$, while $\Pr(A)$ is called the prior probability of $A$. This is usually done when the problem involves updating a probability on the basis of some new evidence: $\Pr(A)$ is the probability of $A$ before learning evidence $B$ (hence "prior"), while $\Pr(A \mid B)$ is the probability of $A$ after learning evidence $B$ (hence "posterior").

This is particularly relevant when thinking of probabilities as degrees of belief. Bayes' Theorem tells you the right way to update your degrees of belief when you learn new evidence. A prime example of why this is important is the medical test.
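For instance, here is a sketch with invented numbers for a hypothetical medical test (the 1% prevalence, 99% true-positive rate, and 5% false-positive rate are our assumptions, not figures for any real test):

```python
prior = 0.01  # assumed Pr(disease): prior degree of belief
tp = 0.99     # assumed Pr(positive | disease), the true-positive rate
fp = 0.05     # assumed Pr(positive | no disease), the false-positive rate

# Pr(positive) by the total probability theorem.
pr_positive = prior * tp + (1 - prior) * fp

# Posterior degree of belief in the disease after a positive result.
posterior = prior * tp / pr_positive
print(round(posterior, 3))  # 0.167
```

Even with an accurate-sounding test, the posterior is only about 0.167: the low prior dominates, which is exactly the distinction the prior/posterior vocabulary is meant to highlight.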