# Probability space

A probability space is a useful way of visualizing and understanding probabilities and the rules that govern them. The probability of a sentence is drawn as a shape whose size is proportional to the probability of the sentence, and any probabilistic relations between sentences are shown as spatial relations between the shapes.

## Basic Probabilities

A probability space begins with a large rectangle that represents all the possible ways the world could be: each point in that rectangle is a specific description of the world. Think of this as a sandbox: the rectangle is the entire sandbox, and every grain of sand is a specific way the world could be. When drawing the shape of some particular sentence, you gather up all of those points that make that sentence true and include them in the shape just as if you were carving a shape in the sandbox.

The way the world actually is (or will be, if we're talking about the probability of a future event) is a single point within the space of all possibilities. Think of this as tossing a pebble (the pebble of fate) into the sandbox. If the pebble has an equal chance of landing anywhere in the box, then the area that a shape takes up in the sand is equal to the probability of the pebble landing there. That is, the area that a shape representing a sentence takes up in the sand is equal to the probability of that sentence being true.

 Pr(A) = the area of the shape that represents A in a probability space

Suppose we want to represent a simple event A with Pr(A) = 0.25. We do this by carving out a shape that takes up 25% of the space. The actual shape that we carve out doesn't matter, only its area, so we could represent A as either of the following two pictures:

<- Two ways of representing Pr(A) = 0.25 ->
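The pebble-tossing picture can be tried out numerically. The following is a minimal sketch, not part of the original text: it assumes the sandbox is the unit square, and the two shapes (`in_square`, `in_strip`) and the helper `estimate_area` are hypothetical names chosen for illustration. Both shapes have area 0.25, so the fraction of pebbles landing in each should come out close to 0.25.

```python
import random

def in_square(x, y):
    # A drawn as a square of side 0.5 in one corner: area 0.5 * 0.5 = 0.25
    return x < 0.5 and y < 0.5

def in_strip(x, y):
    # A drawn as a full-height strip of width 0.25: area 0.25 * 1 = 0.25
    return x < 0.25

def estimate_area(shape, tosses=100_000, seed=0):
    """Toss pebbles uniformly into the unit square and return the
    fraction that land inside the given shape."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(shape(rng.random(), rng.random()) for _ in range(tosses))
    return hits / tosses

square_estimate = estimate_area(in_square)
strip_estimate = estimate_area(in_strip)
print(square_estimate, strip_estimate)  # two estimates of the same area
```

The estimates differ slightly from 0.25 and from each other because they come from a finite number of tosses; only the area of the shape matters, not its outline.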

This already lets us picture two rules of the probability calculus:

Rule 1: If A is a tautology, Pr(A) = 1

If A is a tautology, then it must be true no matter what the world is like. That is, every point in the probability space makes A true, so A is represented by the entire space.

Rule 2: If A is a contradiction, Pr(A) = 0

If A is a contradiction, then it is always false, no matter what the world is like. So no point in the probability space makes A true, hence A is represented by, well, nothing.

If A is a contingency, then it will be represented by a shape somewhere in between nothing and the entire space, and the area of that shape is equal to Pr(A).

Now think about logically equivalent sentences. If A and B are logically equivalent, then they are true in precisely the same situations. Any point in the probability space that makes A true will also make B true, and vice versa. Thus, logically equivalent sentences will be represented by the exact same shape and hence they will have the same area. This explains:

Rule 3: If A, B are logically equivalent, then Pr(A) = Pr(B)

You should also be able to see why the converse of Rule 3 is false (the converse of Rule 3 says that if Pr(A) = Pr(B) then A and B are logically equivalent): two shapes of the same area need not be the same shape:

<- Pr(A) = Pr(B) but A and B are not logically equivalent.
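Rule 3 and the failure of its converse can be checked on a small discrete model. This is a sketch under assumptions of my own: the space is a 10 by 10 grid of equally likely points, a sentence is the set of points that make it true, and `pr` is a hypothetical helper that measures area as a fraction of the grid.

```python
from fractions import Fraction

N = 10
SPACE = {(x, y) for x in range(N) for y in range(N)}  # 100 equally likely points

def pr(shape):
    """Area of a shape as an exact fraction of the whole space."""
    return Fraction(len(shape), len(SPACE))

# Two logically equivalent sentences: "x is below 3" and
# "it is not the case that x is at least 3".
A = {(x, y) for (x, y) in SPACE if x < 3}
B = {(x, y) for (x, y) in SPACE if not x >= 3}

# A different sentence, "y is below 3", with the same area as A.
C = {(x, y) for (x, y) in SPACE if y < 3}

same_shape = (A == B)                           # equivalent sentences, identical shape
same_area_only = (pr(A) == pr(C)) and (A != C)  # equal area, different shape
print(pr(A), pr(B), pr(C), same_shape, same_area_only)
```

A and B pick out exactly the same points, so their areas must agree. C has the same area as A but covers different points, so equal probability does not imply logical equivalence.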

We can also represent mutually exclusive sentences with a probability space. If A and B are mutually exclusive, then there are no situations in which both A and B are true. Such a situation would be a point that lies in the shape of A and the shape of B, so A and B being mutually exclusive entails that their shapes do not overlap:

<- A and B are mutually exclusive, hence there is no overlap.

## Disjunction, Conjunction and Negation

We can represent the logical connectives of disjunction ("or") and conjunction ("and") using probability spaces. First, disjunction.

### Disjunction

For two sentences A, B, their disjunction $A\vee B$ is true whenever A is true or B is true (or both). What this suggests is that the shape carved out by $A\vee B$ should be the shapes carved out by A and B taken together:

<- $A\vee B$ for the (mutually exclusive) sentences A, B from above.

Disjunction, which corresponds to addition of probabilities, is represented by taking the union of the two shapes; it's easy to see that the area taken up by $A\vee B$ is the same as the area of A plus the area of B since that's just what $A\vee B$ is (as long as there is no overlap). This explains:

Rule 4: If A, B are mutually exclusive, then $\Pr(A\vee B)=\Pr(A)+\Pr(B)$

But what if there is overlap? If A, B are not mutually exclusive, then there is a nonempty overlap between them - this overlap is their conjunction, those points that make both A and B true. To correctly calculate the probability of $A\vee B$, we must subtract the area of this overlap as it gets counted twice (once for A, once for B). This is what's happening in:

Rule 6: $\Pr(A\vee B)=\Pr(A)+\Pr(B)-\Pr(A\wedge B)$

Consider two sentences A, B represented as:

<- A and B are not mutually exclusive

To figure out the probability of their disjunction, we cannot simply add the area of A to the area of B, because the overlap between them would then be counted twice (once as part of A and once as part of B).

To correctly determine $\Pr(A\vee B)$ we must subtract the overlap that got counted twice:
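The double counting can be made concrete with sets. This is an illustrative sketch, assuming a 10 by 10 grid of equally likely points as the probability space; the shapes `A` and `B` and the helper `pr` are hypothetical and not taken from the original figures.

```python
from fractions import Fraction

N = 10
SPACE = {(x, y) for x in range(N) for y in range(N)}  # 100 equally likely points

def pr(shape):
    """Area of a shape as an exact fraction of the whole space."""
    return Fraction(len(shape), len(SPACE))

A = {(x, y) for (x, y) in SPACE if x < 5}       # columns 0 to 4
B = {(x, y) for (x, y) in SPACE if 3 <= x < 8}  # columns 3 to 7, overlapping A in columns 3 and 4

overlap = A & B                # the conjunction: points in both shapes
union = A | B                  # the disjunction: points in either shape

naive = pr(A) + pr(B)          # counts the overlap twice
corrected = naive - pr(overlap)  # Rule 6
print(naive, corrected, pr(union))
```

Here the naive sum is larger than the true area of the union by exactly the area of the overlap, which is the amount Rule 6 subtracts.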

### Conjunction

As we remarked above, the overlap between two shapes represents their conjunction. $A\wedge B$ is true whenever A is true and B is true, and these points are precisely the overlap of the shapes of A and B:

<- The conjunction of A and B is represented by their overlap.

This provides the probability space version of the rules for conjunction:

Rule 7: For any sentences A, B, $\Pr(A\wedge B)=\Pr(A)\cdot \Pr(B\mid A)$

Rule 8: If A and B are independent, $\Pr(A\wedge B)=\Pr(A)\cdot \Pr(B)$

We'll talk about independence below. For now, just note that conjunction, which corresponds to multiplication of probabilities, is represented by taking the overlap of the two shapes, whether the events A and B are independent or relevant to one another.

### Negation

Finally, negation. $\neg A$ is true precisely when A is false, so it should be represented in a probability space by the collection of all those points that aren't included in A's shape. The negation of A is therefore represented by "everything else":

Since the rectangle has area 100%, the area of $\neg A$ is just the whole space minus the area of A. This immediately suggests:

Rule 5: $\Pr(\neg A)=1-\Pr(A)$
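As a quick check of Rule 5, the negation can be built as a set difference. This sketch assumes a 10 by 10 grid of equally likely points; the shape `A` and the helper `pr` are hypothetical choices for illustration.

```python
from fractions import Fraction

N = 10
SPACE = {(x, y) for x in range(N) for y in range(N)}  # 100 equally likely points

def pr(shape):
    """Area of a shape as an exact fraction of the whole space."""
    return Fraction(len(shape), len(SPACE))

A = {(x, y) for (x, y) in SPACE if x < 3 and y < 5}  # a 3-by-5 block of points
not_A = SPACE - A                                    # "everything else"

print(pr(A), pr(not_A), pr(A) + pr(not_A))
```

Because A and its negation never overlap and together cover the whole space, their areas always add up to 1.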

## Conditional Probabilities

Recall that the conditional probability Pr(A | B) is the probability that A is true given that B is true. The "given that" means we should confine ourselves to the shape carved out by B. Since we are given that B is true, we know that the world can't be a point outside of B, so we are concerned only with those points in B. Of these points, which ones make A true? Precisely those that live in $A\wedge B$:

So what's the difference between $\Pr(A\wedge B)$ and $\Pr(A\mid B)$? $\Pr(A\wedge B)$ is the area that $A\wedge B$ takes up in the total space, while $\Pr(A\mid B)$ is the area that $A\wedge B$ takes up in $B$. This is an important difference: while both $\Pr(A\wedge B)$ and $\Pr(A\mid B)$ are represented by the same shape in probability space, the way we calculate their areas (and hence the probabilities!) is fundamentally different. This shows itself in the following consequence of Rule 7:

$\Pr(A\mid B)=\frac{\Pr(A\wedge B)}{\Pr(B)}$

 $\Pr(A\mid B)$ is the ratio of the area of $A\wedge B$ to the area of $B$
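The difference between the two ways of measuring the same shape can be seen in a small computation. This is a sketch under assumed names: the space is a 10 by 10 grid of equally likely points, and `pr` and `pr_given` are hypothetical helpers that measure a shape against the whole space and against another shape, respectively.

```python
from fractions import Fraction

N = 10
SPACE = {(x, y) for x in range(N) for y in range(N)}  # 100 equally likely points

def pr(shape):
    """Area of a shape relative to the whole space."""
    return Fraction(len(shape), len(SPACE))

def pr_given(a, b):
    """Area of the overlap of a and b relative to b alone
    (b must be non-empty)."""
    return Fraction(len(a & b), len(b))

A = {(x, y) for (x, y) in SPACE if x < 5}       # columns 0 to 4
B = {(x, y) for (x, y) in SPACE if 3 <= x < 8}  # columns 3 to 7

joint = pr(A & B)              # overlap measured against the total space
conditional = pr_given(A, B)   # the same overlap measured against B
print(joint, conditional, joint / pr(B))
```

The overlap is the same set of points in both calculations; only the denominator changes, from the size of the whole space to the size of B.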

If A and B are independent, that means Pr(A | B) = Pr(A) and Pr(B | A) = Pr(B). To represent independent events, then, we need two shapes such that the area $A\wedge B$ takes up in B is the same as the area that A takes up in the total space, and the area that $A\wedge B$ takes up in A is the same as the area that B takes up in the total space. It turns out that there's a neat way to do this: represent A and B with rectangles that each stretch all the way across the space, one from top to bottom and the other from side to side, so that they cross at a right angle (their edges form a 90 degree angle).

(Independent events need not be drawn as rectangles; it is just easier to guarantee independence if we draw them as perpendicular rectangles.)
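The rectangle trick can be verified directly. The sketch below rests on an assumption of mine about the drawing: A is a strip running the full height of the space and B is a strip running the full width, so that they cross at a right angle. The grid, the shapes, and the helpers `pr` and `pr_given` are hypothetical. A third shape, `C`, is a small rectangle that does not span the space, included to show that right-angled edges alone are not enough.

```python
from fractions import Fraction

N = 10
SPACE = {(x, y) for x in range(N) for y in range(N)}  # 100 equally likely points

def pr(shape):
    """Area of a shape relative to the whole space."""
    return Fraction(len(shape), len(SPACE))

def pr_given(a, b):
    """Area of the overlap of a and b relative to b alone."""
    return Fraction(len(a & b), len(b))

A = {(x, y) for (x, y) in SPACE if x < 4}  # full-height vertical strip
B = {(x, y) for (x, y) in SPACE if y < 5}  # full-width horizontal strip

independent = (
    pr_given(A, B) == pr(A)
    and pr_given(B, A) == pr(B)
    and pr(A & B) == pr(A) * pr(B)  # Rule 8
)

# A small rectangle sitting entirely inside B: knowing B makes C more likely.
C = {(x, y) for (x, y) in SPACE if x < 4 and y < 5}
c_independent_of_b = (pr_given(C, B) == pr(C))

print(independent, c_independent_of_b)
```

The strips work because the overlap takes up the same proportion of B as A takes up of the whole space. The small rectangle C fails that test, since all of C lies inside B.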

## Total Probability

Probability spaces make it easy to see just what exactly the total probability theorem does, and why it got its name. First, consider how partitions are represented in probability spaces.

### Partitions

Recall that a partition is a collection of sentences that are mutually exclusive and exhaustive. At least one of the sentences must always be true, and no more than one can be true at any given time.

Consider a partition of four sentences labelled A1, A2, A3 and A4. These sentences are mutually exclusive, so we know that there can be no overlap between any of them. They are also exhaustive, meaning that at least one of them must always be true. That is, there is no point in the probability space that does not make any of the four sentences true. The four shapes must take up the entire space, they must exhaust the entire space:

<- A representation of the partition A1, A2, A3 and A4.

### Total Probability

With the idea of a partition in hand, we can picture the total probability theorem. This theorem lets us determine the probability of a sentence if we know the probabilities of that sentence conditioned on each element of a partition, together with the probabilities of those elements.

So let's take some completely random sentence B:

<- We will determine Pr(B) using total probability

The conditional probabilities Pr(B | A1), Pr(B | A2), Pr(B | A3), and Pr(B | A4) are represented as follows (compare the previous two figures):

<- Pr(B) distributed over the partition A1, A2, A3, A4

(These four colored shapes actually represent $\Pr(A1\wedge B)$, $\Pr(A2\wedge B)$, $\Pr(A3\wedge B)$ and $\Pr(A4\wedge B)$, but remember from the above discussion of conditional probabilities that $\Pr(A3\wedge B)$ shows up as the same shape as $\Pr(B\mid A3)$. The only difference, and it's an important one, is how we calculate the area, and hence the probability, of this shape. If it's compared to the entire space, that gives us $\Pr(A3\wedge B)$; if it's compared to just $A3$, that gives us $\Pr(B\mid A3)$.)

If we knew the area of these four colored shapes, how would we determine the area of B? Simply by adding them together:

This is precisely what's happening in the total probability theorem:

$\Pr(B)=\Pr(A1)\cdot \Pr(B\mid A1)+\Pr(A2)\cdot \Pr(B\mid A2)+\Pr(A3)\cdot \Pr(B\mid A3)+\Pr(A4)\cdot \Pr(B\mid A4)$

The total probability of B is the sum of all of these parts, hence the name of the theorem.
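The theorem can be checked on a small example. This is a sketch under my own assumptions: a 10 by 10 grid of equally likely points stands in for the probability space, the four-element partition `parts` splits it into groups of columns, and `B` is an arbitrary blob. The helpers `pr` and `pr_given` are hypothetical names.

```python
from fractions import Fraction
from itertools import combinations

N = 10
SPACE = {(x, y) for x in range(N) for y in range(N)}  # 100 equally likely points

def pr(shape):
    """Area of a shape relative to the whole space."""
    return Fraction(len(shape), len(SPACE))

def pr_given(a, b):
    """Area of the overlap of a and b relative to b alone."""
    return Fraction(len(a & b), len(b))

# A partition A1..A4: four groups of columns of unequal width.
parts = [
    {p for p in SPACE if p[0] < 2},
    {p for p in SPACE if 2 <= p[0] < 5},
    {p for p in SPACE if 5 <= p[0] < 9},
    {p for p in SPACE if p[0] == 9},
]
is_exclusive = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
is_exhaustive = set().union(*parts) == SPACE

# Some sentence B: a roughly circular blob near the middle of the space.
B = {(x, y) for (x, y) in SPACE if (x - 5) ** 2 + (y - 5) ** 2 <= 9}

# Each term Pr(Ai) * Pr(B | Ai) is the area of one piece of B, namely Ai and B.
pieces = [pr(a) * pr_given(B, a) for a in parts]
total = sum(pieces)
print(pieces, total, pr(B))
```

Each product in `pieces` equals the area of the part of B lying inside one element of the partition, so adding them reassembles the whole of B. A piece can be zero, as it is for the last column here, which the blob does not reach.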