Chapter 5: The Central Limit Theorem
The text has told you about the Central Limit Theorem and how important its use is in the field of statistics. At this point you may not fully understand the theorem or may not even be convinced it is true. After all, it is saying that no matter how skewed, lopsided, or disproportionate a probability distribution might be, all you have to do is randomly select samples that are large enough, find their sample means, and a bell-shaped curve will appear when you construct a histogram of the sample means. This project will use the simple act of rolling a six-sided die to both clear up any confusion you may have regarding the statement of the theorem and further convince you of its truth.
Consider the act of rolling a standard six-sided die numbered 1 through 6. We know the number that appears on top after the roll follows the probability distribution in the table below.
Roll | 1 | 2 | 3 | 4 | 5 | 6 |
Probability | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
The graph of this distribution is anything but bell-shaped and is, in fact, quite flat.
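The corresponding mean and standard deviation of this distribution are
Mean: $\mu = \sum_{x=1}^{6} x \cdot \frac{1}{6} = 3.5$
Std. deviation: $\sigma = \sqrt{\sum_{x=1}^{6} (x - \mu)^2 \cdot \frac{1}{6}} = \sqrt{35/12} \approx 1.708$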
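As a quick check of these two values, the short Python sketch below computes the mean and standard deviation directly from the probability table. It is only an illustration; the variable names (`faces`, `prob`, `die_mean`, `die_variance`, `die_std`) are chosen here and do not come from the text.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of one fair six-sided die, as in the table above.
faces = range(1, 7)
prob = Fraction(1, 6)  # every face is equally likely

# Mean: sum of (value * probability).
die_mean = sum(x * prob for x in faces)

# Variance: sum of (squared deviation from the mean * probability).
die_variance = sum((x - die_mean) ** 2 * prob for x in faces)

# The standard deviation is the square root of the variance.
die_std = sqrt(die_variance)

print(f"mean = {float(die_mean)}")
print(f"variance = {die_variance}")
print(f"std. deviation = {die_std:.4f}")
```

Exact fractions are used so that the variance comes out as 35/12 rather than a rounded decimal; the standard deviation is then about 1.708.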
So what does the Central Limit Theorem say about this distribution? Suppose instead of just rolling a die once and looking at its value, we roll the die a specific number of times and average the values of all the rolls. For example, let's say the die is rolled twice (note this is equivalent to rolling two dice at once) and the mean of the two numbers computed and recorded. We could still record the numbers 1 - 6 (for example, a roll of (1,1) produces an average of 1, a roll of (1,3) or (2,2) gives an average of 2 and so on), but our result may now include numbers such as 1.5 or 4.5 (by rolling (1,2) and (3,6) for instance). The Central Limit Theorem states that if we were to repeat this experiment over and over and plot the probability or frequency distribution for our results, we would see a distribution which was approximately a normal or bell-shaped distribution having approximately the same mean as the distribution for one roll of a die, namely
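Mean of averages: $\mu_{\bar{x}} = 3.5$
and whose standard deviation is approximately
Std. deviation of averages: $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 1.708 / \sqrt{2} \approx 1.21$
that is, the standard deviation of the one-die distribution ($\sigma \approx 1.708$) divided by the square root of the number of rolls being averaged, in this case $n = 2$.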
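For two dice, these values can be checked without any simulation by listing all 36 equally likely rolls. The sketch below is one way to do that in Python; the names used (`counts`, `avg_mean`, `avg_var`, `avg_std`, `one_die_std`) are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# Count how many of the 36 ordered rolls (a, b) give each possible average.
counts = Counter(Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2))
total = sum(counts.values())  # 36 equally likely rolls

# Mean and variance of the average of two dice.
avg_mean = sum(avg * Fraction(c, total) for avg, c in counts.items())
avg_var = sum((avg - avg_mean) ** 2 * Fraction(c, total) for avg, c in counts.items())
avg_std = sqrt(avg_var)

# Standard deviation of a single die, computed the same way for comparison.
one_die_mean = Fraction(sum(range(1, 7)), 6)
one_die_var = sum((x - one_die_mean) ** 2 * Fraction(1, 6) for x in range(1, 7))
one_die_std = sqrt(one_die_var)

for avg in sorted(counts):
    print(f"average {float(avg):>3}: {counts[avg]} of {total} rolls")
print(f"mean of averages = {float(avg_mean)}")
print(f"std. deviation of averages = {avg_std:.4f}")
print(f"one-die std. deviation / sqrt(2) = {one_die_std / sqrt(2):.4f}")
```

The last two printed values agree, which is the "divide by the square root of the number of rolls" rule for the case of two rolls.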
The more repetitions of the experiment you record, the closer the observed distribution will be to this theoretical distribution. Some visual simulations will help.
An accompanying animation simulates the experiment described in the background section, that of rolling two dice and recording the average. The animation shows the distribution growing cumulatively as more results are recorded until a total of 5000 repetitions is reached.
The net result looks similar to the shape of a normal distribution. At the end of the 5000 experiments, the data had the following frequency distribution.
Average | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 | 4 | 4.5 | 5 | 5.5 | 6 |
Frequency | 135 | 267 | 407 | 580 | 690 | 845 | 710 | 540 | 393 | 291 | 142 |
from which the mean and standard deviation of the data can be computed to be 3.5029 and 1.203 respectively, values very close to the theoretical values of 3.5 and 1.21 from the Central Limit Theorem.
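The mean and standard deviation quoted above can be reproduced from the frequency table with a few lines of Python. This is a sketch only: the variable names are chosen here, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the text does not say which formula was used. With 5000 observations, both versions round to the same value.

```python
from math import sqrt

# Frequency table from the 5000 simulated two-dice experiments above.
averages = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6]
frequencies = [135, 267, 407, 580, 690, 845, 710, 540, 393, 291, 142]

n = sum(frequencies)  # total number of recorded averages

# Mean of grouped data: sum of (value * frequency) divided by n.
data_mean = sum(x * f for x, f in zip(averages, frequencies)) / n

# Sample variance of grouped data (assumption: n - 1 in the denominator).
data_var = sum(f * (x - data_mean) ** 2 for x, f in zip(averages, frequencies)) / (n - 1)
data_std = sqrt(data_var)

print(f"n = {n}")
print(f"mean = {data_mean:.4f}")
print(f"std. deviation = {data_std:.3f}")
```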
Another animation shows a similar experiment, except here the rolls of 10 dice are averaged and recorded over 5000 repetitions.
As you can see, the distribution looks more like a traditional normal curve. One reason for this is that more distinct values can be observed. For example, the average of 10 dice can be 1, 1.1, 1.2, ..., 5.9, or 6.
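If the animations are not available, a similar experiment can be run in Python. The sketch below is not the code behind the animations; the function `simulate_averages`, its parameters, and the fixed seed are all assumptions made here for illustration. It averages 10 dice over 5000 repetitions and compares the results with the theoretical mean 3.5 and standard deviation 1.708 / sqrt(10).

```python
import random
from collections import Counter
from math import sqrt
from statistics import mean, stdev

def simulate_averages(num_dice, repetitions, seed=0):
    """Roll num_dice fair dice, record their average, and repeat."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(repetitions)
    ]

averages_10 = simulate_averages(num_dice=10, repetitions=5000)

# Theoretical standard deviation of the average of 10 dice.
theoretical_std = sqrt(35 / 12) / sqrt(10)

print(f"observed mean = {mean(averages_10):.4f} (theory: 3.5)")
print(f"observed std. deviation = {stdev(averages_10):.4f} (theory: {theoretical_std:.4f})")

# Crude text histogram: one '*' for every 20 occurrences of an average.
histogram = Counter(averages_10)
for value in sorted(histogram):
    print(f"{value:4.1f} {'*' * (histogram[value] // 20)}")
```

Calling `simulate_averages(num_dice=2, repetitions=5000)` instead repeats the two-dice experiment described earlier, so the two shapes can be compared side by side.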
These examples should convince you of the basic premise in the Central Limit Theorem, but why would you expect such a theorem to be true? An intuitive argument is one similar to the one we gave in the Probability and Simulation Project. When you roll one die there is no difference between a 1 and a 3. They are just two different sides of the die and are equally likely to turn up. Now suppose we are averaging the values of three dice. Getting a result of 1 is suddenly very special. The only way to get an average of 1 would be for each die to show a value of 1. An average of 3, on the other hand, could occur with any of 25 different rolls, including (1,2,6), (3,3,3), and (4,2,3), and so is much more likely to be seen than an average of 1. For this reason, values in the middle occur with greater frequency than those at the outer limits, giving the distribution a bell shape.
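The counting argument for three dice can be checked by listing all 216 ordered rolls and grouping them by total, since an average of 1 corresponds to a total of 3 and an average of 3 to a total of 9. The Python sketch below does this; the names `rolls`, `ways_by_total`, `ways_avg_1`, and `ways_avg_3` are illustrative.

```python
from collections import Counter
from itertools import product

# Every ordered roll of three dice, e.g. (1, 2, 6); there are 6**3 = 216 of them.
rolls = list(product(range(1, 7), repeat=3))

# Number of rolls producing each total.
ways_by_total = Counter(sum(roll) for roll in rolls)

ways_avg_1 = ways_by_total[3]  # average of 1 means a total of 3
ways_avg_3 = ways_by_total[9]  # average of 3 means a total of 9

print(f"rolls with average 1: {ways_avg_1}")
print(f"rolls with average 3: {ways_avg_3}")

# Show how the number of ways rises toward the middle totals and falls again.
for total in sorted(ways_by_total):
    print(f"total {total:2d} (average {total / 3:.2f}): {ways_by_total[total]} ways")
```

The counts climb from 1 way at the extremes to 27 ways for the middle totals of 10 and 11, which is the bell-like shape described above.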
Java is a special programming language used extensively on the Internet where the programs are designed for specific tasks tied to the material on the page. The programs are referred to as applets and are often interactive. The Department of Statistics at the University of South Carolina has a number of statistics-related Java applets posted on their Web site.
One such Java applet simulates the Central Limit Theorem with dice rolls. The applet requires the user to specify the number of dice rolled (from 1 to 5) and then to specify the number of rolls. Hitting the indicated button will show the resulting distribution. Each hit of the button will add the specified number of rolls to the current distribution in a cumulative fashion, while changing the number of dice zeros everything out for a new run of dice rolls.
At the bottom of this page you will be told how to reach the applet. Spend some time at this site interacting with the applet to reinforce your knowledge of the Central Limit Theorem, making sure you understand how your inputs to the applet affect the resulting graph. Note that instead of labeling the horizontal axis with the average of the dice, the author of the applet simply uses the total of the dice. There is really no difference, as the axis could be relabeled with averages by simply dividing by the number of dice. The link below will open in a new window. Once you are comfortable with the applet and have had some fun rolling dice millions of times, return to this page by closing that browser window. Then, proceed to the exercises. The applet can be found at http://www.stat.sc.edu/~west/javahtml/CLT.html.
When you've finished reviewing the simulation, go on to the exercises below. When you've completed each exercise, click "Submit for Grade" in order to submit your answers to your professor.
© 2000 by Addison Wesley Longman, a division of Pearson Education