Chapter 5

The Central Limit Theorem


Introduction

The text has told you about the Central Limit Theorem and how important its use is in the field of statistics. At this point you may not fully understand the theorem or may not even be convinced it is true. After all, it is saying that no matter how skewed, lopsided, or disproportionate a probability distribution might be, all you have to do is randomly select samples that are large enough, find their sample means, and a bell-shaped curve will appear when you construct a histogram of the sample means. This project will use the simple act of rolling a six-sided die to both clear up any confusion you may have regarding the statement of the theorem and further convince you of its truth.


Background

Consider the act of rolling a standard six-sided die numbered 1 through 6. We know the number that appears on top after the roll follows the probability distribution in the table below.

Roll          1     2     3     4     5     6
Probability   1/6   1/6   1/6   1/6   1/6   1/6

The graph of this distribution is anything but bell-shaped and is, in fact, quite flat: every bar has the same height, 1/6.

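The corresponding mean and standard deviation of this distribution are

Mean: $\mu = \dfrac{1+2+3+4+5+6}{6} = 3.5$

Std. deviation: $\sigma = \sqrt{\dfrac{(1-3.5)^2 + (2-3.5)^2 + \cdots + (6-3.5)^2}{6}} = \sqrt{\dfrac{35}{12}} \approx 1.71$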
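As a check on these two values, the short Python sketch below computes them exactly. It uses only the standard library's `fractions` module, so the mean and variance come out as exact fractions before the final square root is taken.

```python
from fractions import Fraction
import math

# A fair six-sided die: faces 1..6, each with probability 1/6 (from the table above).
faces = range(1, 7)
p = Fraction(1, 6)

# Mean: sum of each value times its probability.
mu = sum(x * p for x in faces)

# Variance: probability-weighted squared deviations from the mean.
var = sum((x - mu) ** 2 * p for x in faces)
sigma = math.sqrt(float(var))

print(mu, var, round(sigma, 4))  # prints: 7/2 35/12 1.7078
```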

So what does the Central Limit Theorem say about this distribution? Suppose that instead of rolling a die once and looking at its value, we roll the die a specific number of times and average the values of all the rolls. For example, say the die is rolled twice (note this is equivalent to rolling two dice at once) and the mean of the two numbers is computed and recorded. We could still record the numbers 1 through 6 (for example, a roll of (1,1) produces an average of 1, a roll of (1,3) or (2,2) gives an average of 2, and so on), but our results may now also include numbers such as 1.5 or 4.5 (from rolling (1,2) or (3,6), for instance). The Central Limit Theorem states that if we were to repeat this experiment over and over and plot the probability or frequency distribution of our results, we would see an approximately normal, or bell-shaped, distribution whose mean is approximately the same as the mean for one roll of a die, namely

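Mean of averages: $\mu_{\bar{x}} = \mu = 3.5$

and whose standard deviation is approximately

Std. deviation of averages: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{\sqrt{35/12}}{\sqrt{2}} \approx \dfrac{1.71}{1.414} \approx 1.21$

that is, the standard deviation $\sigma$ of the one-die distribution divided by the square root of the number $n$ of rolls being averaged, in this case $n = 2$.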

The more times the experiment is repeated, the closer the observed frequency distribution will come to this predicted distribution. Some visual simulations will help.
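If you would like to run such a simulation yourself, here is a minimal Python sketch (not the animation used in this project) that repeats the two-dice experiment many times with the standard `random` module and compares the observed mean and standard deviation of the averages with 3.5 and 1.21. The helper name `simulate_averages`, the seed, and the choice of 100,000 repetitions are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_averages(num_dice, repetitions, seed=0):
    """Hypothetical helper: roll `num_dice` fair dice `repetitions` times and
    return the list of averages, one per repetition."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(repetitions)
    ]

sigma = math.sqrt(35 / 12)                # standard deviation of a single die
averages = simulate_averages(2, 100_000)  # 100,000 repetitions is an arbitrary choice

print("observed mean:", round(statistics.mean(averages), 3))
print("observed sd:  ", round(statistics.pstdev(averages), 3))
print("predicted sd: ", round(sigma / math.sqrt(2), 3))  # sigma / sqrt(n) with n = 2
```

Changing the first argument of `simulate_averages` to 10 mimics the 10-dice experiment described in the next section; the observed standard deviation should then fall near 1.71/√10 ≈ 0.54.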


Mathematical Analysis Section

An animation accompanying this project simulates the experiment described in the Background section: rolling two dice and recording their average. The animation shows the distribution growing cumulatively as more rolls are recorded, until a total of 5000 rolls of the pair is reached.

The net result looks similar to the shape of a normal distribution. At the end of the 5000 experiments, the data had the following frequency distribution.

Average    1    1.5  2    2.5  3    3.5  4    4.5  5    5.5  6
Frequency  135  267  407  580  690  845  710  540  393  291  142

from which the mean and standard deviation of the data can be computed to be 3.5029 and 1.203 respectively, values very close to the theoretical values of 3.5 and 1.21 from the Central Limit Theorem.
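The arithmetic behind these two numbers is the usual grouped-data calculation: each average is weighted by its frequency. A small Python sketch, with the table typed in by hand, reproduces it. Dividing by the total count n (shown here) or by n − 1 (the sample formula) both round to 1.203 for this data.

```python
import math

# Observed frequencies of the two-dice average after 5000 repetitions (table above).
averages    = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6]
frequencies = [135, 267, 407, 580, 690, 845, 710, 540, 393, 291, 142]

n = sum(frequencies)  # total number of repetitions

# Mean of grouped data: sum of (value * frequency) divided by the total count.
mean = sum(x * f for x, f in zip(averages, frequencies)) / n

# Standard deviation of grouped data, dividing by n (population formula).
variance = sum(f * (x - mean) ** 2 for x, f in zip(averages, frequencies)) / n
sd = math.sqrt(variance)

print(n, round(mean, 4), round(sd, 3))  # prints: 5000 3.5029 1.203
```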

A second animation shows a similar experiment, except that here the values of 10 dice are averaged and recorded over 5000 repetitions.

As you can see, this distribution looks even more like a traditional normal curve. One reason is that many more distinct values are possible: the average of 10 dice can be 1, 1.1, 1.2, ..., 5.9, 6.
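The claim about possible values can be checked exactly. The Python sketch below (the helper name `total_distribution` is hypothetical) builds the exact distribution of the total of n dice by adding one die at a time, then divides each total by n to get averages. For n = 10 it confirms there are 51 possible averages, and that the mean and standard deviation of the average equal the predicted values 3.5 and 1.71/√10 ≈ 0.54.

```python
from fractions import Fraction
import math

def total_distribution(num_dice):
    """Hypothetical helper: exact distribution of the total of `num_dice`
    fair dice, returned as {total: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(num_dice):
        new = {}
        # Add one more die: each current total splits into six equally likely totals.
        for total, prob in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0) + prob * Fraction(1, 6)
        dist = new
    return dist

n = 10
dist = total_distribution(n)
avg_dist = {Fraction(t, n): p for t, p in dist.items()}  # divide each total by n

mean = sum(a * p for a, p in avg_dist.items())
sd = math.sqrt(float(sum((a - mean) ** 2 * p for a, p in avg_dist.items())))

print(len(avg_dist))      # prints: 51 (distinct possible averages)
print(mean, round(sd, 3))  # prints: 7/2 0.54
```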

These examples should convince you of the basic premise of the Central Limit Theorem, but why would you expect such a theorem to be true? An intuitive argument is similar to the one we gave in the Probability and Simulation Project. When you roll one die, there is no difference between a 1 and a 3; they are just two different sides of the die and are equally likely to turn up. Now suppose we are averaging the values of three dice. Getting a result of 1 is suddenly very special: the only way to get an average of 1 is for every die to show a 1. An average of 3, on the other hand, can occur with any of 25 different rolls, including (1,2,6), (3,3,3), and (4,2,3), and so it is 25 times as likely as an average of 1. For this reason, values in the middle occur with greater frequency than those at the outer limits, giving the distribution a bell shape.
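The counting in this argument can be verified by brute force. This Python sketch, using only `itertools.product` and `collections.Counter` from the standard library, lists all 216 ordered rolls of three dice and counts how many give each total (a total of 9 is an average of 3).

```python
from collections import Counter
from itertools import product

# All 6**3 = 216 equally likely ordered rolls of three dice.
rolls = list(product(range(1, 7), repeat=3))

# Count the rolls producing each total; an average of a corresponds to a total of 3*a.
ways = Counter(sum(roll) for roll in rolls)

print(len(rolls))  # prints: 216
print(ways[3])     # prints: 1  (average 1: only (1, 1, 1))
print(ways[9])     # prints: 25 (average 3)

# Full distribution of the average: counts rise toward the middle and fall off again.
for total in sorted(ways):
    print(f"{total / 3:.2f}  {ways[total]:>2}  {'*' * ways[total]}")
```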


Simulation

Java is a general-purpose programming language that is used extensively on the Internet. Small Java programs called applets can be embedded in a Web page and designed for specific tasks tied to the material on that page; applets are often interactive. The Department of Statistics at the University of South Carolina has a number of statistics-related Java applets posted on its Web site.

One such Java applet simulates the Central Limit Theorem with dice rolls. The applet asks the user to specify the number of dice rolled (from 1 to 5) and then the number of rolls. Clicking the indicated button displays the resulting distribution. Each click adds the specified number of rolls to the current distribution cumulatively, while changing the number of dice clears everything for a new run of dice rolls.
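The applet's controls amount to a simple accumulate-and-reset rule. The Python sketch below imitates that behavior; it is not the applet's actual code, and the class and method names (`DiceTotalsSimulator`, `roll`, `set_num_dice`) are hypothetical. Like the applet, it records the total of the dice rather than the average.

```python
import random
from collections import Counter

class DiceTotalsSimulator:
    """Hypothetical sketch of the applet's behavior: accumulate totals of `num_dice` dice."""

    def __init__(self, num_dice, seed=None):
        self.rng = random.Random(seed)
        self.set_num_dice(num_dice)

    def set_num_dice(self, num_dice):
        # Changing the number of dice starts a fresh, empty distribution.
        if not 1 <= num_dice <= 5:
            raise ValueError("the applet allows 1 to 5 dice")
        self.num_dice = num_dice
        self.counts = Counter()

    def roll(self, num_rolls):
        # Each call adds `num_rolls` new totals to the existing counts (cumulative).
        for _ in range(num_rolls):
            total = sum(self.rng.randint(1, 6) for _ in range(self.num_dice))
            self.counts[total] += 1
        return self.counts

sim = DiceTotalsSimulator(2, seed=1)
sim.roll(100)
sim.roll(100)
print(sum(sim.counts.values()))  # prints: 200 (second batch added to the first)
sim.set_num_dice(5)
print(sum(sim.counts.values()))  # prints: 0 (changing the number of dice cleared the counts)
```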

The applet can be found at http://www.stat.sc.edu/~west/javahtml/CLT.html. Spend some time interacting with it to reinforce your knowledge of the Central Limit Theorem, making sure you understand how your inputs to the applet affect the resulting graph. Note that instead of labeling the horizontal axis with the average of the dice, the author of the applet simply uses the total of the dice. There is really no difference, since the axis could be relabeled with averages by simply dividing each total by the number of dice. Once you are comfortable with the applet and have had some fun rolling dice millions of times, return to this page and proceed to the exercises.
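Relabeling the axis changes only the scale, not the shape. In this Python sketch (n = 3 is an arbitrary choice, and the helper `mean_sd` is hypothetical), the exact distribution of the total of n dice is relabeled as averages by dividing each value by n. The mean and standard deviation of the totals, n × 3.5 and √n × 1.71, both shrink by the factor n, giving 3.5 and 1.71/√n for the averages.

```python
import math
from collections import Counter
from itertools import product

n = 3  # number of dice; 3 keeps the enumeration small

# Exact distribution of the total of n dice, as counts out of 6**n equally likely rolls.
total_counts = Counter(sum(r) for r in product(range(1, 7), repeat=n))

# Relabel the horizontal axis: each total t becomes the average t / n; counts are unchanged.
average_counts = {t / n: c for t, c in total_counts.items()}

def mean_sd(counts):
    """Hypothetical helper: mean and (population) standard deviation of a {value: count} table."""
    N = sum(counts.values())
    m = sum(x * c for x, c in counts.items()) / N
    return m, math.sqrt(sum(c * (x - m) ** 2 for x, c in counts.items()) / N)

print(mean_sd(total_counts))    # mean n * 3.5, sd about sqrt(n) * 1.71
print(mean_sd(average_counts))  # mean 3.5, sd about 1.71 / sqrt(n)
```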

Exercises

When you've finished reviewing the simulation, go on to the exercises below.

1. Write down the theoretical distribution for the average of two dice. Compute its mean and standard deviation and compare them to the estimates given by the Central Limit Theorem.



2. After one experiment in which 4 dice were rolled 1,000 times, the observed distribution of averages was

Average    1     1.25  1.5   1.75  2     2.25  2.5   2.75  3     3.25  3.5
Frequency  0     5     16    16    21    35    53    95    91    111   118

Average    3.75  4     4.25  4.5   4.75  5     5.25  5.5   5.75  6
Frequency  103   92    67    71    50    29    15    8     4     0

Compute the mean and standard deviation of this distribution and compare them to the estimates given by the Central Limit Theorem.



3. Set the parameters for the applet at http://www.stat.sc.edu/~west/javahtml/CLT.html to 2 dice with the number of rolls set to 100. Click the "Roll the dice" button repeatedly, counting the number of clicks until the distribution looks approximately normal to you. Repeat this experiment with 5 dice. Which number of dice takes longer to appear normal? Can you explain this behavior?


   


© 2000 by Addison Wesley Longman
A division of Pearson Education