Chapter 2: Data on the Internet
You have learned from your own experiences with the Internet that there is a wealth of information in all sorts of areas. Everyone from the US Government and its various divisions and offices, to corporations both large and small, to the average individual launches Web sites in an effort to distribute information. This information may come in the form of raw data, allowing the reader to analyze it and draw his or her own conclusions. In this lesson you will visit three different Web sites containing data in the areas of sports, finance, and the weather. There you will navigate the sites to collect data while seeing different ways data may be represented and distributed.
Before we take you to the sites, we begin with an important point to keep in mind when computing averages.
Suppose Mr. Peabody's 9:00 statistics class had a mean grade of 85 on the first exam while the 10:30 class had a mean of 77. Is it safe to say that the mean grade among all students who take statistics from Mr. Peabody is
$$\frac{85 + 77}{2} = 81?$$
That is, the average (mean) of the two class averages (means)?
The answer is: possibly, but we need more information.
Suppose we know the 9:00 class has 20 students and the 10:30 class has 30. Then, since the test mean was 85, the sum of the students' test grades in the first class must be
$$20 \times 85 = 1700,$$
while for the second class that sum is
$$30 \times 77 = 2310.$$
Combining the classes, we have a total of 50 students whose test grades add to 1700 + 2310 = 4010, and the mean test grade for all of Mr. Peabody's students is
$$\frac{4010}{50} = 80.2.$$
This is the correct combined mean, not the 81 computed earlier.
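The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `combined_mean` and the variable names are assumptions made for this example and do not come from the course materials.

```python
def combined_mean(means, sizes):
    """Return the mean of all data points, given each group's mean and size."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    # Each group's total is (group mean) x (group size).
    total = sum(m * n for m, n in zip(means, sizes))
    count = sum(sizes)
    return total / count


# Mr. Peabody's two classes, using the figures from the text.
peabody_means = [85, 77]
peabody_sizes = [20, 30]

overall = combined_mean(peabody_means, peabody_sizes)
naive = sum(peabody_means) / len(peabody_means)  # the "average of the averages"

print(overall)  # 80.2
print(naive)    # 81.0
```

Weighting each class mean by its class size recovers the sum of all grades, which is why the result matches the combined mean rather than the simple average of 85 and 77.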
Now consider Mr. Sherman's two evening statistics classes, which contain 5 students each. On his first exam, the students in the first class scored grades of 70, 74, 81, 86, and 90 for a mean grade of 80.2. The students in the second class received grades of 63, 66, 71, 74, and 83, giving a mean of 71.4.
The mean for all of Mr. Sherman's students is
$$\frac{70 + 74 + 81 + 86 + 90 + 63 + 66 + 71 + 74 + 83}{10} = \frac{758}{10} = 75.8.$$
Note this mean does equal the mean of the two class means,
$$\frac{80.2 + 71.4}{2} = 75.8.$$
The difference between Mr. Peabody's classes and Mr. Sherman's is that Mr. Sherman's classes are the same size: both have 5 students. The "average of the averages" calculation is really
$$\frac{\dfrac{70 + 74 + 81 + 86 + 90}{5} + \dfrac{63 + 66 + 71 + 74 + 83}{5}}{2},$$
which can be seen to be the same as
$$\frac{(70 + 74 + 81 + 86 + 90) + (63 + 66 + 71 + 74 + 83)}{10}$$
by multiplying numerator and denominator by 5.
The moral of the story is that computing the mean of a set of data by "averaging the averages" (actually finding the mean of the means) of subsets of the data is guaranteed to give the right answer only if the subsets are all the same size.
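The same point can be written in general form. The notation here is an assumption introduced for this sketch and is not used elsewhere in the chapter: suppose the data are split into $k$ subsets, where subset $i$ contains $n_i$ values with mean $\bar{x}_i$. Since the sum of the values in subset $i$ is $n_i \bar{x}_i$, the mean of all the data is

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}.$$

If every subset has the same size, $n_1 = n_2 = \cdots = n_k = n$, the common factor $n$ cancels:

$$\bar{x} = \frac{n\,(\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_k)}{k\,n} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_k}{k},$$

which is exactly the mean of the means. When the sizes differ, the larger subsets carry more weight, as in Mr. Peabody's classes.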
In the Exercises associated with this Project, you will be asked to compute means from means you will find on the Internet. It is safe to assume that these means were all computed using the same number of data points.
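The Web sites used in this project are:
- The Weather Underground (www.wunderground.com)
- Yahoo! Finance (finance.yahoo.com)
- www.homerunchase.com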
The next pages will guide you in locating data at each of these sites and instruct you on which data to collect for the exercises.
The Weather Underground (www.wunderground.com) was created by a student/professor team at the University of Michigan and shortly thereafter became a public company providing current and historical weather data for the United States and other countries.
At the top of the Weather Underground home page you will find a box for entering the name of a city, state or country. Enter your home city or a favorite city to see the current weather conditions there. Note that at the bottom of the current conditions page, there is a box marked Historical Conditions in which you can enter a date to see statistics for any day from the past five years. For example, the data on June 1, 1999 for New Orleans, LA looks like:
| Statistic | Value |
| --- | --- |
| Mean Temperature | 83.2 °F |
| Max Temperature | 91.0 °F |
| Min Temperature | 75.9 °F |
| Cooling Degree Days | 18 |
| . . . | . . . |
Experiment with this feature until you are comfortable with finding historical weather data.
Before you leave this Web site, do the following to prepare for the exercises. Collect the daily Mean Temperature values for the city of San Francisco, CA for each day of January 2000. This means loading the data page 31 times, once for each day, but it doesn't take long.
The word finance covers a wide array of topics. In this project we will focus on the stock market. The Internet boom has spawned a number of stock-related Web sites where stock data are available, but one of the friendliest such sites remains the finance section of Yahoo.com, located at finance.yahoo.com. Yahoo!, which provides a well-organized catalogue of information on the Internet, was launched by two Stanford students in 1994.
At the address above, you will find current stock market information as well as finance-related headlines. Of interest to us is the box at the top of the page labeled Get Quotes. In this box you enter the "ticker symbol" for a company whose stock performance you are interested in. The ticker symbol is a short sequence of letters used to identify a company's stock on the stock market. Ticker symbols for some well-known companies include:
The site also contains a lookup tool to help you determine the symbol for a company whose name you know. Enter a symbol from the list above or track down the symbol for some other company you are interested in and hit Get Quotes.
You will see information regarding the trading of that company's stock for the most recent weekday on which the market was open. Basic data include the current price per share (in bold), the change in share price this new price represents and the number of shares (volume) that has been traded on that day. Clicking on Detailed View will expand the data to include a graph of the stock's closing price each day over a window of time.
A graph of daily stock price will usually exhibit lots of motion, as the stock price for even the most solid company can fluctuate on a daily basis. A typical graph might look like
[Figure: line graph of a stock's daily closing price plotted over time]
which shows a company's stock rising from about $25/share to $41/share, but the climb, although showing promise, is still rocky with many dips and spikes.
It is essential that stock data be plotted over time. For example, a frequency histogram representation of the above data is
[Figure: frequency histogram of the same daily stock prices, grouped into price ranges]
but this graph is missing valuable information. For example, it tells us that for about 13 days the stock price was between $20 and $25 per share, but it does not tell us when these days occurred. If they were the most recent 13 days, then we might rethink our investment. If the low prices are a thing of the past, then we can be more optimistic.
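A small Python sketch can make the loss of time information concrete. The prices below are hypothetical values invented for this illustration (they are not data from any real company), and the helper name `bin_counts` is likewise an assumption. A rising series and the same series in reverse order produce identical histograms, even though they tell opposite stories to an investor.

```python
from collections import Counter


def bin_counts(prices, width=5):
    """Count how many prices fall in each bin [k*width, (k+1)*width)."""
    # Each price is mapped to the lower edge of its bin, e.g. 22 -> 20.
    return Counter(int(p // width) * width for p in prices)


# Hypothetical daily closing prices, in dollars per share.
rising = [22, 24, 27, 31, 29, 34, 38, 36, 41]
falling = list(reversed(rising))  # same prices, opposite order in time

same_histogram = bin_counts(rising) == bin_counts(falling)

print(same_histogram)           # True
print(rising[-1], falling[-1])  # 41 22
```

The histogram depends only on how often each price range occurs, so the order of the days is discarded; only a plot against time distinguishes the stock that ends at $41 from the one that ends at $22.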
Below the graph in a Detailed View, you will find a set of links marked Tables followed by periods (days, weeks, etc.) of various lengths. Clicking on any of these periods generates a table of stock data. At the top of the data there are boxes for setting a specific range of dates for which to see tabled data.
Practice collecting tabular stock data for a company of your choice. In particular, the exercises will have you viewing data for Microsoft (Symbol: MSFT) for the first four months of the year 2000, that is, January 1, 2000 to April 30, 2000.
Sports fans love to talk statistics. Batting average in baseball, points per game in basketball, quarterback rating in football: if you can name it, there's a statistic for it. It should be no surprise, then, that sports-related data are easily found on the Internet.
The standard data used to evaluate a player's performance can typically be found at the governing sports organization's Web site, which usually has an obvious URL such as www.majorleaguebaseball.com, www.nba.com, etc. For more in-depth data you may have to seek out a site managed by a genuine sports enthusiast. You will visit one such site here. Don't worry if you're not an enthusiast yourself; the data will be easy to find and understand.
In 1998, Major League Baseball was still suffering from attendance problems as a result of an earlier players' strike. As it turned out, 1998 would be the year that baseball was revitalized in the hearts and minds of its fans, largely due to the efforts of Mark McGwire of the St. Louis Cardinals and Sammy Sosa of the Chicago Cubs. At some point in the second half of the season, fans began to realize that one of these two power hitters could break the all-time single-season record of 61 home runs held by Roger Maris. Fans and non-fans alike picked up on the energy surrounding the great home run chase. Ultimately both men would break the record, with Sosa belting 66 balls out of the park and McGwire establishing a new record, an astonishing 70 home runs.
The Web site www.homerunchase.com is dedicated to the art of the home run. The maintainers of the site monitor home runs hit during the current season and offer information and factoids regarding home runs past. In particular, links on the site will take you to tables that document each home run hit by McGwire and Sosa during this historic season. Among other information, you will find the date of each home run and the distance it traveled. The graph below shows the cumulative number of home runs by each player plotted versus time over the course of the baseball season.
[Figure: cumulative number of home runs by McGwire and Sosa plotted against time over the 1998 season]
The following exercises will have you interpret and compute with the data in these two tables.
When you've completed each exercise, click "Submit for Grade" in order to submit your answers to your professor.
© 2000 by Addison Wesley Longman, a division of Pearson Education