4.3 Model Assignment
Let us take a look at the coin-toss experiment more closely. What do we mean when we say "the
probability of Heads" or write ℙ(Heads)? Given a coin and an itchy thumb, how do we go about
finding what ℙ(Heads) should be?
4.3.1 The Measure Theory Approach
This approach states that the way to handle ℙ(Heads) is to define a mathematical function, called
a probability measure, on the sample space. Probability measures satisfy certain axioms (to be
introduced later) and have special mathematical properties, so not just any mathematical function
will do. But in any given physical circumstance there are typically all sorts of probability measures
from which to choose, and it is left to the experimenter to make a reasonable choice, usually based
on considerations of objectivity. For the tossing coin example, a valid probability measure assigns
probability p to the event {Heads}, where p is some number 0 ≤ p ≤ 1. An experimenter who
wishes to incorporate the symmetry of the coin would choose p = 1/2 to balance the likelihood of
{Heads} and {Tails}.
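To make this concrete, here is a minimal sketch in base R; the representation of the measure as a named vector is our own illustration, not part of any package:

    # a probability measure on the coin sample space {Heads, Tails},
    # assigning probability p to {Heads}; any 0 <= p <= 1 is valid
    p <- 1/2                                # the symmetry choice
    measure <- c(Heads = p, Tails = 1 - p)  # probabilities sum to one
    measure["Heads"]                        # the measure of the event {Heads}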
Once the probability measure is chosen (or determined), there is not much left to do. All
assignments of probability are made by the probability function, and the experimenter needs only to
plug the event {Heads} into the probability function to find ℙ(Heads). In this way, the probability
of an event is simply a calculated value, nothing more, nothing less. Of course this is not the whole
story; there are many theorems and consequences associated with this approach that will keep
us occupied for the remainder of this book. The approach is called measure theory because the
measure (probability) of a set (event) is associated with how big it is (how likely it is to occur).
The measure theory approach is well suited for situations where there is symmetry to the exper-
iment, such as flipping a balanced coin or spinning an arrow around a circle with well-defined pie
slices. It is also handy because of its mathematical simplicity, elegance, and flexibility. There are
literally volumes of information that one can prove about probability measures, and the cold rules
of mathematics allow us to analyze intricate probabilistic problems with vigor.
The large degree of flexibility is also a disadvantage, however. When symmetry fails it is
not always obvious what an "objective" choice of probability measure should be; for instance,
what probability should we assign to {Heads} if we spin the coin rather than flip it? (It is not
1/2.) Furthermore, the mathematical rules are restrictive when we wish to incorporate subjective
knowledge into the model, knowledge which changes over time and depends on the experimenter,
such as personal knowledge about the properties of the specific coin being flipped, or of the person
doing the flipping.
The mathematician who revolutionized this approach to probability theory was Andrey Kol-
mogorov, who published a landmark monograph in 1933. See
http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Kolmogorov.html
for more information.
4.3.2 Relative Frequency Approach
This approach states that the way to determine ℙ(Heads) is to flip the coin repeatedly, in exactly the same way each time. Keep a tally of the number of flips and the number of Heads observed. Then a good approximation to ℙ(Heads) will be
ℙ(Heads) ≈ (number of observed Heads) / (total number of flips).
The mathematical underpinning of this approach is the celebrated Law of Large Numbers, which may be loosely described as follows. Let E be a random experiment in which the event A either does or does not occur. Perform the experiment repeatedly, in an identical manner, in such a way that the successive experiments do not influence each other. After each experiment, keep a running tally of whether or not the event A occurred. Let S_n count the number of times that A occurred in the n experiments. Then the law of large numbers says that
S_n / n → ℙ(A) as n → ∞.
As the reasoning goes, to learn about the probability of an event A we need only repeat the random experiment to get a reasonable estimate of the probability's value, and if we are not satisfied with our estimate then we may simply repeat the experiment more times, all the while confident that with more and more experiments our estimate will stabilize to the true value.
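We can watch this stabilization happen with a few lines of base R; the sketch below simulates a fair coin with the sample function and is ours, not from any package:

    set.seed(42)                             # for reproducibility
    n <- 10000
    flips <- sample(c("H", "T"), size = n, replace = TRUE)
    heads <- cumsum(flips == "H")            # running tally, S_n
    m <- c(10, 100, 1000, 10000)
    heads[m] / m                             # sample proportions settling near 1/2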
The frequentist approach is good because it is relatively light on assumptions and does not worry about symmetry or claims of objectivity like the measure-theoretic approach does. It is perfect for the spinning coin experiment. One drawback to the method is that one can never know the exact value of a probability, only a long-run approximation. It also does not work well with experiments that cannot be repeated indefinitely, say, the probability that it will rain today, the chances that you will get an A in your Statistics class, or the probability that the world is destroyed by nuclear war.
This approach was espoused by Richard von Mises in the early twentieth century, and some of his main ideas were incorporated into the measure theory approach. See
http://www-history.mcs.st-andrews.ac.uk/Biographies/Mises.html for more.
4.3.3 The Subjective Approach
The subjective approach interprets probability as the experimenter's degree of belief that the event will occur. The estimate of the probability of an event is based on the totality of the individual's knowledge at the time. As new information becomes available, the estimate is modified accordingly to best reflect his/her current knowledge. The method by which the probabilities are updated is commonly done with Bayes' Rule, discussed in Section 4.8.
So for the coin toss example, a person may have ℙ(Heads) = 1/2 in the absence of additional information. But perhaps the observer knows additional information about the coin or the thrower that would shift the probability in a certain direction. For instance, parlor magicians may be trained to be quite skilled at tossing coins, and some are so skilled that they may toss a fair coin and get nothing but Heads, indefinitely. I have seen this. It was similarly claimed in Bringing Down the House [65] that MIT students were accomplished enough with cards to be able to cut a deck to the same location, every single time. In such cases, one clearly should use the additional information to assign ℙ(Heads) away from the symmetry value of 1/2.
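As a hypothetical sketch of how such information might be folded in with Bayes' Rule (the details are deferred to Section 4.8), suppose we believe the coin is either fair or two-headed, and we then observe a Head; the prior weights below are invented purely for illustration:

    prior <- c(fair = 0.95, twoheaded = 0.05)   # subjective beliefs before tossing
    like  <- c(fair = 0.5,  twoheaded = 1.0)    # P(Heads | coin type)
    post  <- prior * like / sum(prior * like)   # Bayes' Rule
    post                                        # updated beliefs after one Head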
This approach works well in situations that cannot be repeated indefinitely, for example, to assign your probability that you will get an A in this class, the chances of a devastating nuclear war, or the likelihood that a cure for the common cold will be discovered. The roots of subjective probability reach back a long time. See
http://en.wikipedia.org/wiki/Subjective_probability
for a short discussion and links to references about the subjective approach.
4.3.4 Equally Likely Model (ELM)
We have seen several approaches to the assignment of a probability model to a given random experiment, and they are very different in their underlying interpretation. But they all cross paths when it comes to the equally likely model, which assigns equal probability to all elementary outcomes of the experiment.
The ELM appears in the measure theory approach when the experiment boasts symmetry of some kind. If symmetry guarantees that all outcomes have equal "size", and if outcomes with equal "size" should get the same probability, then the ELM is a logical, objective choice for the experimenter. Consider the balanced 6-sided die, the fair coin, or the dart board with equal-sized wedges.
The ELM appears in the subjective approach when the experimenter resorts to indifference or ignorance with respect to his/her knowledge of the outcome of the experiment. If the experimenter has no prior knowledge to suggest that (s)he prefer Heads over Tails, then it is reasonable for him/her to assign equal subjective probability to both possible outcomes.
The ELM appears in the relative frequency approach as a fascinating fact of Nature: when we flip balanced coins over and over again, we observe that the proportion of times the coin comes up Heads tends to 1/2. Of course if we assume that measure theory applies then we can prove that the sample proportion must tend to 1/2 as expected, but that is putting the cart before the horse, in a manner of speaking.
The ELM is only available when there are finitely many elements in the sample space.
4.3.5 How to do it with R
In the prob package, a probability space is an object of outcomes S and a vector of probabilities (called probs) with entries that correspond to each outcome in S. When S is a data frame, we may simply add a column called probs to S and we will be finished; the probability space will simply be a data frame which we may call S. In the case that S is a list, we may combine the outcomes and probs into a larger list, space; it will have two components: outcomes and probs. The only requirements we need are for the entries of probs to be nonnegative and for sum(probs) to be one.
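For instance, here is a sketch of the data frame case for the coin toss, assuming the prob package is loaded (its tosscoin function returns the sample space as a data frame):

    library(prob)
    S <- tosscoin(1)          # data frame sample space with outcomes H, T
    S$probs <- c(0.5, 0.5)    # nonnegative entries that sum to one
    S                         # S is now a probability space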
To accomplish this in R, we may use the probspace function. The general syntax is probspace(x, probs), where x is a sample space of outcomes and probs is a vector (of the same length as the number of outcomes in x). The specific choice of probs depends on the context of the problem, and some examples follow to demonstrate some of the more common choices.
Example 4.4. The Equally Likely Model asserts that every outcome of the sample space has the same probability; thus, if a sample space has n outcomes, then probs would be a vector of length n with identical entries 1/n. The quickest way to generate probs is with the rep function. We will start with the experiment of rolling a die, so that n = 6. We will construct the sample space, generate the probs vector, and put them together with probspace.
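A sketch along those lines, assuming the prob package is loaded (its rolldie function builds the die's sample space):

    library(prob)
    outcomes <- rolldie(1)          # sample space: the faces 1 through 6
    p <- rep(1/6, times = 6)        # identical entries 1/n with n = 6
    probspace(outcomes, probs = p)  # the finished probability space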