

If you reviewed the topics of discrete and continuous probability
distributions earlier in this course, you are already familiar with
some of the concepts that will be covered in this discussion, such as
mean, variance, and standard deviation. The previous sections of the
course showed you how these summary measures are calculated for
populations with the data presented in the form of a probability
distribution. As there are different formulas for the mean of a
probability distribution, a population, and a sample, it is crucial
that you use the correct formula for the type of data you are
measuring. This portion of the course reviews summary measures for
populations and samples. This section concludes with a demonstration of
a practical application of sample statistics through the development of
confidence intervals.
Population Data
To demonstrate the calculation of
summary measures for a population, this course will use as an example
the population of employees in the accounting department of a small
company. The random variable of interest is the tenure of the five
employees of this accounting department. The table below lists the
names of the employees along with the number of years they have worked
in that department.

Population measures of central tendency
What can you infer from this population data? In its raw form, the data
support few conclusions. Once the data are organized, you can begin to
summarize them. One common way to summarize population data is with
the measures of central tendency: mode, median, and mean.


Population mode
The mode is simply the value in the population
that occurs most frequently. A population can have more than one mode.
In the accounting department example, there are two employees who each
have tenure of two years. This is the value with the highest
frequency, so you can say that the modal value of tenure for this
population is two.
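As a minimal sketch of the same idea in Python (the variable names here are illustrative, not part of the course material), the standard library's `statistics.multimode` returns every value tied for the highest frequency, which also covers populations with more than one mode:

```python
from statistics import multimode

# Tenure (in years) of the five accounting employees from the example.
tenure = [2, 15, 7, 8, 2]

# multimode returns a list of all values that share the highest frequency.
modes = multimode(tenure)
print(modes)  # [2]
```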


Population median
The median is the halfway point of a
population, where the values of the random variable have been arranged
in numerical order. If a population has an odd number of data points,
the median is a single point. If the population has an even number of
values, you would find the median by adding the two central values and
dividing the sum by two. In the accounting department example, there
are five employees or data points. Look at the table below where the
employees have been arranged in ascending order of the years of tenure.
The midway point is the third value, which is seven.
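The odd and even cases described above can be sketched in Python as follows. The six-value list is a hypothetical variation (one extra employee with ten years of tenure) added only to illustrate the even case; it is not part of the accounting department example.

```python
from statistics import median

# Tenure values from the accounting department example.
tenure = [2, 15, 7, 8, 2]

# Odd number of values: the median is the single middle value once sorted.
ordered = sorted(tenure)
middle = ordered[len(ordered) // 2]
print(ordered, middle, median(tenure))  # [2, 2, 7, 8, 15] 7 7

# Even number of values: average the two central values.
# The extra value 10 is hypothetical, used only to show this case.
even_case = [2, 2, 7, 8, 10, 15]
print(median(even_case))  # 7.5
```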


Population mean
The mean is the most commonly used measure of
central tendency. If you reviewed "Discrete Probability Distributions"
and "Continuous Probability Distributions" earlier in the course, you
should already be familiar with the concept of mean. The formula for
calculating the mean of a population is
\mu_{x} = \frac{\sum_{i=1}^{N} x_{i}}{N}
where
\mu_{x} = population mean of the random variable x
N = number of occurrences of the random variable x in the population
i = index of an instance of the random variable x
x_{i} = the ith value of the random variable x
To see how the mean of a population is calculated, return to the
accounting department example. Using the information from the
population, the values for the formula variables are
\mu_{x} = population mean
N = 5
x_{i} = 2, 15, 7, 8 and 2 (tenure values)
To find the mean of the tenure of the accounting department, add
together all of the tenure values, and then divide that sum by the
total number of employees in the department:
\mu_{x} = \frac{2 + 15 + 7 + 8 + 2}{5} = \frac{34}{5} = 6.8
The mean tenure of the accounting department is 6.8 years. To
determine the level of variability in tenure in the department, you
will need to calculate the variance and standard deviation of this
population.
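The same calculation can be written as a short Python sketch, adding the tenure values and dividing by the number of employees. The variable names are illustrative.

```python
# Tenure values from the accounting department example.
tenure = [2, 15, 7, 8, 2]

N = len(tenure)                    # number of employees in the population
population_mean = sum(tenure) / N  # sum of the values divided by N
print(population_mean)  # 6.8
```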
Population measures of dispersion
Variance and standard
deviation are measures of dispersion. They tell you to what degree a
population's values are scattered around its mean.


Population variance
Variance is a measure of the spread of
data around the mean. Like other summary measure formulas, the formula
used to calculate variance depends on whether you are measuring a
probability distribution, a sample, or a population. The formula for a
population variance is
\sigma_{x}^{2} = \frac{\sum_{i=1}^{N} (x_{i} - \mu_{x})^{2}}{N}
where
\sigma_{x}^{2} = population variance of the random variable x
\mu_{x} = population mean of the random variable x
N = number of occurrences of the random variable x in the population
i = index of an instance of the random variable x
x_{i} = the ith instance of the random variable x
To find the variance of a population, subtract the mean of the population from each value of x.
Square each difference so that negative and positive deviations do not
cancel each other out. Then sum the squared differences and
divide that sum by N. Consider the accounting department example again.
\sigma_{x}^{2} = population variance
\mu_{x} = 6.8
N = 5
x_{i} = 2, 15, 8, 7 and 2
Using the above information, the variance of this population would be calculated in the following manner.
\sigma_{x}^{2} = \frac{(2 - 6.8)^{2} + (15 - 6.8)^{2} + (8 - 6.8)^{2} + (7 - 6.8)^{2} + (2 - 6.8)^{2}}{5} = \frac{114.8}{5} = 22.96
Population standard deviation
The standard deviation of a
population is the square root of the population variance. Therefore,
the formula for population standard deviation can be written as
\sigma_{x} = \sqrt{\frac{\sum_{i=1}^{N} (x_{i} - \mu_{x})^{2}}{N}}
Because population standard deviation is the square root of the
variance, the variables in the formula are the same. (These variables
were listed in the variance portion of this discussion.) The variance
of the accounting department tenure was determined to be 22.96. The
standard deviation of this population is the square root of this number.
\sigma_{x} = \sqrt{22.96} \approx 4.79
The standard deviation of this population is 4.79 years. Standard
deviation is a more useful measure of dispersion than variance because
it is expressed in the same units as the data from which it came. The
variance in this example is in years squared, which is hard to
interpret. Reported alongside the mean, the standard deviation conveys
the general level of dispersion of a population. In this example the
mean tenure is 6.8 years with a standard deviation of 4.79 years.
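The following sketch works through the population variance and standard deviation step by step for the tenure data. The variable names are illustrative. The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same quantities directly and can be used as a cross-check.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Tenure values from the accounting department example.
tenure = [2, 15, 7, 8, 2]
N = len(tenure)
mu = sum(tenure) / N  # population mean, 6.8

# Squared deviation of each value from the population mean.
squared_deviations = [(x - mu) ** 2 for x in tenure]

# Population variance divides the sum of squared deviations by N.
population_variance = sum(squared_deviations) / N
population_sd = sqrt(population_variance)

print(round(population_variance, 2))  # 22.96
print(round(population_sd, 2))        # 4.79
```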
Sample Data
When a population's entire data set is available, it is not necessary to sample the data set in order to infer population parameters.
Frequently, however, it is impossible to collect data from an entire
population. In these cases a sample from the population may be drawn,
its characteristics measured, and an estimation of the parameters of
the population developed.
To understand the concepts of statistical estimation,
imagine yourself as a designer of new products for a manufacturer of
lightweight, waterproof hiking jackets. Earlier designs of the jackets
were built with inexpensive zippers that tended to break. These zippers
caused a high level of customer dissatisfaction. In the end, the money
spent replacing broken zippers overshadowed the initial cost savings
projected for the inexpensive zippers.
Your challenge is to find another zipper that is relatively
inexpensive, yet durable. For the purposes of this example, assume that
zippers are subjected to a stress test in your company's lab and that
stress is quantified in terms of a "test weight." Your design
specifications call for zippers with a test weight of 10; any zipper
with a test weight below 7 is considered to be below specification.
You have chosen a new zipper and tested a sample of 100 to ensure
that they meet the required test weights. The results from the test
are presented in the frequency table below.
Your sample of 100 randomly selected zippers is large enough to be
considered representative of the population of zippers that will be
used for this season's jackets. A histogram
of this sample illustrates the shape of the sample distribution. Since
the sample size is sufficiently large, the histogram provides an
approximate estimation of the shape of the population distribution.
Sample measures of central tendency
Sample measures of
central tendency—mode, median, and mean—help business analysts and
researchers to understand the population from which the sample was
drawn.
Sample mode
Recall that the mode is simply the value in the
distribution that occurs most frequently. In the zipper sample, the
test weights 12 and 13 are the modes because they occur 9 times each,
which is more often than any other value.
Sample median
Recall that the median is the halfway point of a
distribution that is arranged in numerical order. If the distribution
has an odd number of data points, the median is a single point; if it
has an even number, the median is the average of the two central
values. In the zipper sample, there are 100 data points in all, so the
midway point lies between the fiftieth and fifty-first values, both of
which equal 12. To find the median, add 12 to 12 and divide the result
by two. The result is 12.
Sample mean
Recall that the mean, the most commonly used
measure of central tendency, is simply the average of the values of a
random variable in a sample. The formula used to calculate sample mean
is quite similar to the one used to calculate the mean of a population.
However, two symbols are different. First, \bar{x}, which denotes the sample mean, is used in place of
\mu, the symbol for the population mean. Second, n, the number of occurrences of the random variable x in the sample, is used in place of N, the number of occurrences of the random variable x in the population.
The formula for calculating the mean of a sample is
\bar{x} = \frac{\sum_{i=1}^{n} x_{i}}{n}
where
\bar{x} = sample mean of the random variable x
n = number of occurrences of the random variable x in the sample
i = index of an instance of the random variable x
x_{i} = the ith instance of the random variable x
To see how the mean of a sample is calculated, return to the example
of jacket zippers discussed earlier. Using the information from the
sample distribution (shown again in the table on the right), the values
for the formula variables are
\bar{x} = sample mean
n = 100
x_{i} = 7.5, 8.0, 8.5, . . . , 16.5 (test weights)
To find the mean of the sample of zippers, add together all of the
test weights, and then divide that sum by the total number of
observations in the sample:
\bar{x} = \frac{\sum_{i=1}^{100} x_{i}}{100} = 12.04


The mean of this sample is a test weight of 12.04. The sample data
in this example is presented as clustered data. There are formulas to
calculate sample mean, variance, and standard deviation for clustered
sample data.
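One common way to handle data presented as a frequency table is to weight each distinct value by the number of times it occurs. The sketch below assumes that approach. The frequency table is hypothetical and much smaller than the zipper table; its values were chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

# Hypothetical frequency table (test weight -> number of zippers).
# These are NOT the zipper results from the example.
frequency = {9.5: 2, 10.0: 3, 10.5: 4, 11.0: 1}

n = sum(frequency.values())  # total number of observations, 10

# Sample mean: each value is weighted by how often it occurs.
weighted_sum = sum(value * count for value, count in frequency.items())
sample_mean = weighted_sum / n

# Sample variance: weighted squared deviations divided by n - 1.
weighted_sq_dev = sum(
    count * (value - sample_mean) ** 2 for value, count in frequency.items()
)
sample_variance = weighted_sq_dev / (n - 1)
sample_sd = sqrt(sample_variance)

print(sample_mean)                 # 10.2
print(round(sample_variance, 4))   # 0.2333
```

Expanding the table into a list of ten individual observations and applying the ungrouped formulas gives the same results.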
The average test weight of your sample, 12.04, exceeds the design
specification of a test weight of 10. To determine whether most of the
individual zippers meet the required test weight, calculate the
variance and standard deviation of the sample.
Sample measures of dispersion
The measures of sample dispersion, variance and standard deviation,
tell you to what degree a distribution's values are scattered around
its mean.


Sample variance
Recall that variance is a measure of the
spread of data around the mean. The formula used to calculate sample
variance also differs slightly from that used to calculate the
population variance. The symbol denoting the variance of a sample, s^{2}, is used in place of \sigma^{2}, the symbol for population variance. Another difference in the formula for sample variance is the (n – 1) term in the denominator.
This is an adjustment used in sample
statistics to account for the fact that the sample is only a subset of
the entire population. The formula for a sample variance is
s_{x}^{2} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n - 1}
where
s_{x}^{2} = sample variance of the random variable x
\bar{x} = sample mean of the random variable x
n = number of occurrences of the random variable x in the sample
i = index of an instance of the random variable x
x_{i} = the ith instance of the random variable x
To find the variance of a sample, subtract the mean of the sample from each value of x.
Square each difference so that negative and positive deviations do not
cancel each other out. Then sum the squared differences and divide that sum by n – 1. Consider the zipper example again.
s_{x}^{2} = sample variance
x_{i} = 7.5, 8.0, . . . , 16.5
\bar{x} = 12.04
n = 100
Using the above information, the variance of this sample of zippers would be calculated in the following manner.
s_{x}^{2} = \frac{\sum_{i=1}^{100} (x_{i} - 12.04)^{2}}{100 - 1} = 4.83
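To make the effect of the (n – 1) denominator concrete, the sketch below compares the two formulas on the same numbers. It does not have the 100 zipper measurements, so it reuses the five tenure values from the population example and treats them as if they were a sample. That reinterpretation is an assumption made purely for illustration.

```python
from statistics import variance

# Tenure values, treated here as a SAMPLE purely for illustration.
data = [2, 15, 7, 8, 2]
n = len(data)
x_bar = sum(data) / n

# Sum of squared deviations from the sample mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

sample_variance = sum_sq / (n - 1)  # sample formula: divide by n - 1
divide_by_n = sum_sq / n            # population formula: divide by n

print(round(sample_variance, 2))  # 28.7
print(round(divide_by_n, 2))      # 22.96
```

Dividing by n – 1 always gives a slightly larger value than dividing by n, and the gap shrinks as the sample grows.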


Sample standard deviation
The standard deviation of a
sample is the square root of the sample variance. Therefore, the
formula for sample standard deviation can be written as
s_{x} = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n - 1}}
Because sample standard deviation is the square root of the sample
variance, the variables in the formula are the same. (These variables
were listed in the variance portion of this discussion.) The variance
of the sample of zippers was determined to be 4.83. The standard
deviation of this sample is the square root of this number.
s_{x} = \sqrt{4.83} \approx 2.20
The standard deviation of this sample is 2.20.
Confidence Intervals
You've now seen how samples can be
used to estimate the population parameters of mean, variance, and
standard deviation. In the example of selecting zippers for a hiking
jacket, recall that the design specification called for a zipper test
weight of 10; in a sample of 100 zippers, the average test weight was
determined to be 12.04. You might wonder how accurately your sample
data estimate the population of zippers. Can you be confident that the
population mean meets your design requirement of a test weight of 10
when you have taken a sample with a mean of 12.04? How close is the
sample mean to the true mean of the population?

A common statistical method used to determine the accuracy of a
sample estimate is the construction of a confidence interval. A
confidence interval identifies a range of values around the estimate,
or sample statistic, that is expected to contain the true population
parameter at a specified level of confidence.


Confidence intervals can be built around a number of parameters. The
following information focuses on constructing a confidence interval
around the mean.
The diagram below is a visual depiction of how a confidence interval
works. Here, the distribution of the population approximates a normal
distribution and the mean of the population is labeled \mu. The three
notations below the distribution represent different samples that have
been drawn from that population, each with a different sample mean \bar{x}.
The line on either side of each of the sample means represents a
confidence interval that has been constructed around the mean of each
sample. You may notice that in each case the true population mean \mu
is within the range of values contained in the confidence interval. This
means that the confidence interval has served its purpose of
identifying the range of values within which the true population
parameter lies.
You may also notice in the diagram that the confidence interval is
larger for samples 2 and 3 than it is for sample 1. For a given
confidence level, the size of a confidence interval depends on the
sample's size and standard deviation. Note that confidence intervals
are based on sample data and, as such, it is not possible to construct
an interval with 100 percent confidence of capturing the population
parameter of interest. However, it is possible to construct an interval
around a sample statistic with a high confidence level. The following
formula is used to construct a confidence interval for a population mean.
\bar{x} \pm Z \frac{s}{\sqrt{n}}
where
\bar{x} = the sample mean
s = sample standard deviation
n = sample size
Z = the standardized random variable for the desired level of confidence
You may recall encountering the standard random variable Z in
the Standard Normal Distribution topic of the Continuous Probability
Distributions section of this course. There, you standardized a
normally distributed random variable as a way of quickly determining
the probability associated with a range of values for that variable.
When you construct confidence intervals, you again use the standard
random variable, but in reverse: you now know the probability (or
level of confidence) you are interested in, and you are trying to
identify the value of Z associated with that probability. Many
business situations require you to aim for a level of confidence that
is 90 percent or above (a probability of .9 or greater). The following
table provides Z scores associated with common confidence levels. Also, notice that you can look up these values in the Z distribution table provided with this course.
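As an alternative to reading a printed table, the Z score for a two-sided confidence level can be computed from the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `z_for_confidence` is illustrative, and the three levels shown are simply common choices.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Z score that leaves (1 - level) / 2 in each tail of the standard normal."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```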
For the purpose of calculating a confidence interval around the
sample mean in the zipper example, assume that you want to determine
the interval of the true population mean with a confidence level of 95
percent.
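Using the information from the zipper example and the Z distribution table, you know that
\bar{x} = 12.04
s = 2.20
n = 100
Z = 1.960
Now, substitute this information into the formula to calculate the lower limit of the 95 percent confidence interval.
\bar{x} - Z \frac{s}{\sqrt{n}} = 12.04 - 1.960 \times \frac{2.20}{\sqrt{100}} = 12.04 - 0.4312 \approx 11.61
The lower limit of the 95 percent confidence interval is 11.61. To
find the upper limit, use the same equation, substituting an addition
sign for the subtraction sign.
\bar{x} + Z \frac{s}{\sqrt{n}} = 12.04 + 1.960 \times \frac{2.20}{\sqrt{100}} = 12.04 + 0.4312 \approx 12.47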
The 95 percent confidence interval is (11.61, 12.47). This means
that you can be 95 percent confident that the true population mean of
zipper strength is between 11.61 pounds and 12.47 pounds. In other
words, an interval constructed in this way captures the population mean
95 percent of the time. This is good news: because the lower limit of
the confidence interval is above the design specification of a zipper
test weight of 10, you can be confident that the mean test weight of
the population of zippers meets your requirement.
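The interval calculation can be sketched as a small Python function. The function name `mean_confidence_interval` is illustrative, and the inputs are the zipper sample statistics and the 95 percent Z score given above.

```python
from math import sqrt

def mean_confidence_interval(x_bar: float, s: float, n: int, z: float):
    """Return (lower, upper) limits of the interval x_bar +/- z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Zipper example: sample mean 12.04, sample standard deviation 2.20,
# sample size 100, and Z = 1.960 for 95 percent confidence.
lower, upper = mean_confidence_interval(12.04, 2.20, 100, 1.960)
print(round(lower, 2), round(upper, 2))  # 11.61 12.47
```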
If the sample were larger, the range of the 95 percent confidence
interval would be smaller. In general, the larger your sample size, the
more precise the estimate of the mean becomes because you have more
data representing the population. The calculation of the confidence
interval involves division by the square root of the sample size, n,
so, all else being equal, a bigger sample size results in a narrower
confidence interval.
(There are ways to determine how big the sample size should be to
guarantee a certain level of accuracy. This is an advanced topic that
you may study during your MBA program.)
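A brief sketch shows how the distance from the sample mean to either limit shrinks as n grows. It assumes, hypothetically, that the sample standard deviation stays at 2.20 for every sample size. In practice s would change from sample to sample.

```python
from math import sqrt

S = 2.20   # sample standard deviation, assumed constant across sample sizes
Z = 1.960  # Z score for 95 percent confidence

def half_width(n: int) -> float:
    """Distance from the sample mean to either limit of the interval."""
    return Z * S / sqrt(n)

for n in (25, 100, 400):
    print(n, round(half_width(n), 4))
```

Under this assumption the distances are roughly 0.86, 0.43, and 0.22. Because of the square root, quadrupling the sample size only halves the width of the interval.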


In statistical terms, if the sampling process were repeated many times
and a new sample mean and confidence interval were calculated each
time, about 95 percent of these confidence intervals would contain the
true population mean. That is why you can be 95 percent confident that
the interval (11.61, 12.47) contains the true mean.
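The repeated-sampling interpretation can be explored with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical true mean of 12.0 and standard deviation of 2.2 (values chosen to resemble the zipper example, not taken from it), and the random seed is fixed only so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical population parameters, assumed for this simulation only.
TRUE_MEAN, TRUE_SD = 12.0, 2.2
SAMPLE_SIZE, REPEATS, Z = 100, 1000, 1.960

hits = 0
for _ in range(REPEATS):
    # Draw a fresh sample and build a 95 percent interval around its mean.
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    x_bar = mean(sample)
    margin = Z * stdev(sample) / sqrt(SAMPLE_SIZE)
    # Count the interval as a hit if it contains the true mean.
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1

coverage = hits / REPEATS
print(f"{coverage:.3f}")
```

The printed proportion should land close to 0.95, with some run-to-run variation if the seed is changed.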
Take a second look at the diagram describing how confidence intervals
constructed around sample means help to estimate the true population
mean. This time two more samples have been added: samples 4 and 5,
with sample means of \bar{x}_{4} and \bar{x}_{5}, respectively. Notice
this time that the confidence interval constructed around the mean of
sample 4 does not include the population mean. This is a reminder that,
when working with a 95 percent confidence interval, the interval around
a sample mean will occasionally fail to include the true population
mean.
1. How is the mean of a sample calculated?
2. Why is the mean of a sample calculated differently from the mean of a probability distribution?
3. How is the variance related to the standard deviation?
4. How is the standard deviation of a sample calculated differently than the standard deviation of a probability distribution?
5. The table below provides the population of daily order volumes
for a recent week. Calculate the mean, variance, and standard deviation
of this population.
6. Assume the data set above is not the population of orders for a
recent week. Rather, these data are a random sample of daily orders
drawn from the past year's data. Calculate the mean, variance, and
standard deviation of this sample.
7. The table below identifies the number of consultants of a
financial consulting firm for each of its five largest U.S. offices.
For this data set, calculate the mode, median, mean, variance, and
standard deviation.
8. A well-known manufacturer of sugarless food products has invested
a great deal of time and money in developing the formula for a new kind
of sweetener. Although costly to develop, this sweetener is
significantly less expensive to produce than the sweeteners the
manufacturer had been using. The manufacturer would like to know if the
new sweetener is as good as the traditional product. The manufacturer
knows that when consumers are asked to indicate their level of
satisfaction with the traditional sweeteners, they respond that on
average their level of satisfaction is 5.5.
The manufacturer conducts market research to determine the level of
acceptance of this new product. Consumer taste acceptance data are
collected from 25 consumers of sugarless products. The data collected
can be seen in the table below.
For this sample, determine the mode, median, and mean values, as well as the variance and standard deviation.
9. Assume that the manufacturer of sweeteners has taken a sample of
100 consumers to determine the level of satisfaction with a new
sweetener. The results of the market research are as follows: the mean
level of satisfaction was found to be 5.75 and the standard deviation
of satisfaction was found to be 1.1.
Using this information, construct a 95 percent confidence interval for the mean level of satisfaction in the population.
