# Interpretations of probability

We use the word "probability" in many different ways. We say things like "the probability of rain this afternoon is 10%," "the probability of the top card of the deck being an Ace is 0.0769," and "the probability that this patient has cancer is 50/50."

The probability of a sentence is simply a number between 0 and 1 that is assigned to that sentence. There are many ways to understand what this number actually means: these are the different interpretations of probability. We will consider four such interpretations; four ways of thinking about what probabilities are.

## The Classical Interpretation

The oldest interpretation of probability is the Classical Interpretation, an interpretation that grew out of the gambling-related origins of probability theory. Consider the simple example of rolling a six-sided die. How do we calculate the probability of rolling a 3, for example? Well, there is only one 3 on the die, and there are six possible faces that the die could show, so the probability of rolling a 3 is 1/6.

This simple calculation embodies everything there is to the classical interpretation. It tells us to understand the probability of some sentence as the number of outcomes that would make that sentence true divided by the number of total outcomes possible. So the probability of rolling an even number on a six-sided die is 3/6, because there are three outcomes that make this sentence true (rolling a 2, 4, or 6) and there are six total outcomes.

Putting this a little more formally, the classical interpretation tells us that (for some sentence $A$):

$Pr(A) = \frac{\text{number of outcomes that make } A \text{ true}}{\text{number of total outcomes}}$

The classical interpretation also offers a way to understand conditional probabilities:

$Pr(A|B) = \frac{\text{number of outcomes that make } A \text{ and } B \text{ true}}{\text{number of outcomes that make } B \text{ true}}$

This looks a little different because, with conditional probabilities, we want to confine our attention to only those cases that make $B$ true. So, for example, if we want to determine the probability of rolling a 2 on a six-sided die given that the outcome of that roll is even, we take the number of outcomes that make "rolled a 2" and "rolled even" true (there is only one such outcome) and divide it by the number of outcomes that make "rolled even" true (there are three of these), not the number of total outcomes (six). This gives the correct answer: 1/3.
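The classical recipe is simple enough to mechanize: enumerate the outcomes, count the favorable ones, and divide. Here is a minimal sketch for the die example above; the function names `pr` and `pr_given` are illustrative choices, not standard library calls.

```python
# Classical interpretation: count outcomes that make a sentence true,
# divide by total outcomes (or, for conditionals, by outcomes making B true).
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # faces of a fair six-sided die

def pr(event):
    """Pr(A) = outcomes making A true / total outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

def pr_given(event, condition):
    """Pr(A|B) = outcomes making A and B true / outcomes making B true."""
    b_outcomes = [o for o in outcomes if condition(o)]
    both = [o for o in b_outcomes if event(o)]
    return Fraction(len(both), len(b_outcomes))

print(pr(lambda o: o == 3))                              # 1/6
print(pr(lambda o: o % 2 == 0))                          # 1/2 (3/6 reduced)
print(pr_given(lambda o: o == 2, lambda o: o % 2 == 0))  # 1/3
```

Using exact fractions rather than floats keeps the answers in the same form the article uses (1/6, 3/6, 1/3).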

It can be shown that this interpretation of probability satisfies the Rules of the Probability Calculus. In fact, this interpretation is frequently used to show why the Rules are correct, but (as is shown below) there are many different ways to understand probability that also follow the Rules.

The classical interpretation suffers from a very important limitation: it only works for events that are equally likely to happen. For example, if the six-sided die is fair, then each of the six outcomes is equally likely to occur. This is common in the gambling-related problems that inspired probability theory: drawing one card is as likely as drawing any other, getting heads on a fair coin is as likely as getting tails, and so on.

But consider what the classical interpretation says about the probability that it will rain this afternoon. Since there are two outcomes (rain and no rain), and one of them makes "it will rain this afternoon" true, the probability that it will rain on any given afternoon is 1/2. This is obviously not right. What went wrong is that the two outcomes (rain and no rain) are not equally likely to happen: in Southern California, for example, it is far more likely that it won't rain this afternoon than it will.

The classical interpretation cannot account for this; it cannot tell us that the probability of rain this afternoon is significantly less than 1/2. Nevertheless, in the cases in which it does apply, the classical interpretation is a simple and helpful way to understand probability.

## Relative Frequency

Another way to understand probability is as a statement of relative frequency. Relative frequency measures how many times a certain event has happened (or how many times a certain sentence has been made true) in a given number of trials. This is best illustrated by an example: if I flip a coin one hundred times and it lands "heads" seventy-three of those times, then the relative frequency of "heads" is 73/100 or 0.73.

To think of the probability of a sentence as a relative frequency means to think of it as a measure of how often that sentence has been made true across the relevant trials. More formally,

$Pr(A) = \frac{\text{number of trials that make } A \text{ true}}{\text{number of total trials}}$

and

$Pr(A|B) = \frac{\text{number of trials that make } A \text{ and } B \text{ true}}{\text{number of trials that make } B \text{ true}}$

This looks similar to the classical interpretation, but there is an important difference. The classical interpretation counts all possible outcomes, while relative frequency counts all of the outcomes that we have observed (this is why they are called "trials" instead of "outcomes").

This difference allows one to make sense of "the probability that it will rain this afternoon is 0.1" as a relative frequency. If we have observed 1,000 afternoons, and it has rained in 100 of them, then the relative frequency of afternoon rain is 100/1,000 or 0.1.
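The rain example above can be computed directly from a record of observed trials. This is a minimal sketch; the helper `relative_frequency` and the hard-coded record of 1,000 afternoons are illustrative, with the counts taken from the article's example.

```python
# Relative frequency: trials that make a sentence true / total trials.
def relative_frequency(trials, makes_true):
    """Pr(A) as the fraction of observed trials that make A true."""
    return sum(1 for t in trials if makes_true(t)) / len(trials)

# 1,000 observed afternoons: 100 rainy, 900 dry (the article's numbers)
afternoons = ["rain"] * 100 + ["no rain"] * 900

print(relative_frequency(afternoons, lambda t: t == "rain"))  # 0.1
```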

Just like the classical interpretation, the definitions of probability given above follow the Rules of the Probability Calculus.

Understanding probabilities as relative frequencies is not without its problems. How many trials are necessary to determine the probability of some sentence? If I flip a coin 10 times and it comes up "heads" 7 of those times, can I conclude that the probability of "heads" is 0.7, and hence that the coin is biased? Ten trials seems too few to justify this conclusion, but is there a "right" number?

The way around this problem is to think of probabilities as relative frequencies in the limit. All this means is: imagine what would happen if you performed an infinite number of trials. In the example of the coin flip, the relative frequency of "heads" in the limit tells you what proportion of "heads" you would see if you continued flipping the coin indefinitely. This limiting relative frequency can be approximated by performing more and more trials. If I get 7 "heads" on 10 flips, then 68 "heads" on 100 flips, then 703 "heads" on 1,000 flips, then 7,102 "heads" on 10,000 flips, I can be reasonably confident that the probability of "heads" is around 0.7.
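The convergence described above is easy to see in simulation. This sketch flips a simulated coin whose true bias is 0.7 (an illustrative assumption, chosen to match the example) and reports the relative frequency of "heads" at increasing trial counts; the observed frequencies settle near the true value as the trials pile up.

```python
# Approximating a limiting relative frequency by performing more trials.
import random

random.seed(0)  # fixed seed so the run is reproducible

def flip_biased(p_heads):
    """One flip of a coin with the given chance of heads."""
    return random.random() < p_heads

for n in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(flip_biased(0.7) for _ in range(n))
    print(f"{n:>7} flips: relative frequency of heads = {heads / n:.3f}")
```

Small samples bounce around, but by 100,000 flips the observed frequency sits very close to 0.7, which is the point of thinking in terms of the limit.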

Another problem with relative frequencies is that they don't make sense for unique statements. What is the probability that Otto, a 73 year old retired German arc-welder who smokes a pack a day, has lung cancer? There is only one Otto, so we only have one "trial"; nor can we think of this probability as a limiting relative frequency, as there will only ever be one Otto. One thing we could do is figure out the relative frequency of smokers in their 70s who have lung cancer and apply that to Otto, but how do we know that this is the right comparison to make? Should we count the number of 73 year old smokers who have lung cancer, or the number of such 73 year old smokers who are also German, or the number of such 73 year old German smokers who are also retired arc-welders? The more specific we get, the fewer "trials" we have to work with; but if we make the comparison too general, we get less accurate information (surely the probability of a 73 year old smoker having lung cancer is different from the probability of any smoker having lung cancer).

Despite these problems, relative frequency is still a helpful way of thinking about probabilities, especially in science, where we gain knowledge of the world by repeatedly performing experiments to test hypotheses.

## Degree of Belief

Sometimes we use probabilities to express our confidence in a sentence: "It's only 15% likely that I will bowl a perfect game tonight." This is a use of probability as degree of belief, a measure of how strongly someone believes something.

Probabilities are numbers between 0 and 1; this reflects the fact that belief is not an all-or-nothing affair, but admits of degrees. I believe both that I am going to bowl a perfect game tonight and that I am presently alive, but I believe the second sentence much more strongly than the first. It would get a higher degree of belief.

Unlike the other interpretations mentioned here, degrees of belief are subjective (different for different people) rather than objective (the same for everyone). I may only think I have a 15% chance of bowling a perfect game, but a friend of mine who hasn't seen my recent slump may think I have a 70% chance. Degrees of belief depend upon the information you have, and thus will be different for different people.

There are two important questions raised by this understanding of probability: How do we assign numbers to beliefs? And why should this assignment follow the probability calculus? The first question is answered by looking at a person's betting behavior, and the second is answered by the Dutch Book arguments.
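The betting-behavior idea can be sketched numerically: if the most you would pay for a ticket that returns one unit when $A$ is true is $p$, then $p$ measures your degree of belief in $A$. The function below is an illustrative toy, not a standard formula from any library; the numbers echo the bowling example above.

```python
# Degree of belief read off from betting behavior: the stake you are
# willing to risk, as a fraction of the payout if the sentence is true.
def degree_of_belief(stake, payout):
    """Betting quotient: stake risked / payout received if A is true."""
    return stake / payout

# Willing to risk 15 units to win 100 if I bowl a perfect game tonight:
print(degree_of_belief(15, 100))  # 0.15

# My friend, who hasn't seen my slump, would risk 70 for the same payout:
print(degree_of_belief(70, 100))  # 0.7
```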

## Objective Chance

Objective Chance is one of the more subtle interpretations of probability. Thinking of probability in terms of objective chance means thinking about the actual, physical features of the world that determine that probability.

Let's look at an example. When flipping a coin there is a certain probability that the coin will come up "heads." This probability is determined by the physical makeup of the coin and other physical factors: Is the coin heavier on one side than the other? How was it flipped? What medium is it travelling through? What is the physical makeup of the surface it lands on? And so on. These are the real features of the world that determine the objective chance that the coin comes up heads.

For this reason, chance is sometimes thought of as a physical concept like "force" or "gravity." These physical concepts determine how our world works, and while we cannot see them directly in action, we can see their effects. It is the same with objective chance: the features of an event (the physical makeup of the coin, for example) determine how the coin behaves, and we learn about objective chance by looking at the resulting behavior of the coin.

Because of this, there is an important connection between objective chance and relative frequency: the objective chance of an event determines the relative frequencies we observe. The physical makeup of the coin (along with how it's flipped, what it lands on, and so on) determines how many "heads" we observe in our sequence of trials, and hence chance determines relative frequency.

Another interesting connection between relative frequency, objective chance, and degrees of belief is this: if you are coherent, and if the trials of an experiment are independent of each other, then as the number of trials increases, your degree of belief that the observed relative frequency approximates the true objective chance approaches 1. That is, the more trials you perform, the closer your confidence that the results you obtain reflect the objective chances behind the trials comes to complete certainty.