
PreMBA Analytical Methods
Statistical Sampling and Regression: t-Distribution

In Statistical Estimation, you were shown how to determine a confidence interval around the sample mean as an estimate of the true population mean, which was unknown. The formula below is used for constructing a confidence interval for a population mean using the Z score associated with your desired level of confidence.

$$\bar{x} \pm Z \frac{s}{\sqrt{n}}$$

Here x̄ is the sample mean, s is the sample standard deviation, and n is the sample size.

A Z distribution is a standard normal distribution, and it can be used to construct confidence intervals in situations where the sample size is large. Use of the Z distribution (or Z score) is appropriate when the sampling distribution of the sample mean approximates a normal distribution, which is the case in large samples.
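The large-sample calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original course material: the function name z_confidence_interval and the sample figures (mean 50, standard deviation 10, n = 100) are hypothetical, and the sample standard deviation is assumed to stand in for the population value.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Return (lower, upper) for a large-sample confidence interval."""
    # Two-sided Z score: leave (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical large sample: mean 50, standard deviation 10, n = 100.
lower, upper = z_confidence_interval(50.0, 10.0, 100)
print(f"95% interval: {lower:.2f} to {upper:.2f}")  # 48.04 to 51.96
```

The Z score is computed from the standard normal distribution rather than typed in, so the same function works for other confidence levels.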

However, what happens when the sample size is small, say fewer than 30? It is easy to see that, for a given level of confidence, the confidence interval becomes larger as n becomes smaller. Furthermore, with a sample size of less than 30, the sample standard deviation is a less reliable estimate of the population standard deviation, so the standardized sample mean no longer follows a normal distribution closely. This makes the use of a Z value in the calculation of the confidence interval inappropriate. In such cases, a t-distribution must be used.

The t-distribution has a mean of μ = 0 (again, like the Z distribution) and a standard deviation of

$$\sqrt{\frac{\nu}{\nu - 2}}$$

where ν is called the degrees of freedom and ν = n – 1 (the formula applies for ν > 2). The following picture shows the shape of the t-distribution in comparison to the standard normal (or Z) distribution. Notice that the t-distribution becomes flatter with a smaller value of ν.
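To see how the spread of the t-distribution shrinks toward that of the Z distribution, the standard deviation can be evaluated for several degrees of freedom. The sketch below assumes the standard result that a t-distribution with ν > 2 degrees of freedom has standard deviation sqrt(ν / (ν – 2)); the function name t_standard_deviation is hypothetical.

```python
from math import sqrt

def t_standard_deviation(df):
    """Standard deviation of a t-distribution; defined only for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation requires df > 2")
    return sqrt(df / (df - 2))

# The value falls toward 1, the standard deviation of the Z distribution,
# as the degrees of freedom grow.
for df in (3, 5, 11, 30, 100):
    print(f"df = {df:3d}: standard deviation = {t_standard_deviation(df):.3f}")
```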

How do t-scores compare to Z scores for a given level of confidence? From the graph, you can see that tails of the t-distribution are thicker and extend out further than those of the Z distribution. This indicates that for a given confidence level, t-scores are larger than Z scores. As n increases in size, the shape of the t-distribution begins to resemble a normal distribution and the t-scores become smaller. As n approaches 30, the t-score associated with a given confidence level approaches the Z score for that confidence level.
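The comparison between t-scores and Z scores can also be checked numerically. The sketch below is one possible approach using only the Python standard library: it integrates the t-density with Simpson's rule and finds the critical value by bisection. The helper names (t_pdf, t_cdf, t_critical) are hypothetical, and a statistics package would normally be used instead of hand-rolled integration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t with P(-t <= T <= t) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):            # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)  # 95 percent Z score, about 1.96
for df in (4, 11, 24, 30, 100):
    print(f"df = {df:3d}: t = {t_critical(0.95, df):.3f}  (Z = {z:.3f})")
```

Each printed t-score is larger than the Z score, and the gap narrows as the degrees of freedom increase.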

You can look up the t-value for a desired level of confidence in a t-distribution table, given the degrees of freedom (n – 1) associated with your sample. The table can be accessed by clicking on the item labeled "t-distribution table" in the margin. Guidelines on how to read a t-distribution table are also available in the margin.


To review an example of how the t-value changes with sample size, consider a sample of size n = 12 (ν = 11). For a 95 percent confidence interval, the t-value (or t-score) is 2.201. Recall that the Z score for the same level of confidence is 1.96.

If n increases, say to 25 (ν = 24), then the t-score becomes 2.064. Using the formula below for constructing a confidence interval using a t-score, you can see that this increase in sample size makes the confidence interval around the sample mean (x̄) smaller in two ways: the t-score is smaller, and the value of n in the denominator is larger.

$$\bar{x} \pm t \frac{s}{\sqrt{n}}$$


The contrary also holds: a very small sample size results in a confidence interval that is quite large, both because the t-value for a given confidence level is larger for a smaller sample size and because the small n in the denominator of the calculation does not reduce the interval size significantly. The larger interval for a desired level of confidence is required to account for the additional uncertainty introduced by the small sample.
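The two effects of sample size can be made concrete by computing the half-width of the interval, t × s / √n, for both sample sizes discussed above. In this sketch the sample standard deviation s = 1.0 is a hypothetical value held fixed for illustration, and the t-scores are the standard 95 percent table values for 11 and 24 degrees of freedom (2.201 and 2.064).

```python
from math import sqrt

# 95 percent t-scores from a standard t-table, keyed by sample size n
# (degrees of freedom are n - 1).
T_SCORES_95 = {12: 2.201, 25: 2.064}

s = 1.0  # hypothetical sample standard deviation, held fixed

half_widths = {}
for n, t in T_SCORES_95.items():
    # Half-width of the interval: t-score times the standard error.
    half_widths[n] = t * s / sqrt(n)
    print(f"n = {n}: t = {t}, half-width = {half_widths[n]:.3f}")
```

With these inputs the half-width drops from about 0.635 to about 0.413, and most of the reduction comes from the larger n in the denominator rather than from the smaller t-score.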

At this point, apply your knowledge and develop a confidence interval using a t-distribution. Consider a consumer safety group that performs crash tests on vehicles to determine safety ratings. Crashing cars is very expensive since the damaged cars have little or no salvage value. This consequence calls for a small sample size, since the budget for cars to crash is limited.

Suppose a particular auto manufacturer has produced a new model vehicle. The consumer safety group needs to determine a safety rating and calculate how accurate its estimate is based on the small sample size. After choosing a random sample of five of the new model cars from the manufacturer, the test is performed and the estimate of the safety rating is determined to be 8 with a sample standard deviation of 0.94. The consumer safety group wants to have a high degree of confidence in its estimate and so it chooses to calculate a 99 percent confidence interval. The t-score for 99 percent and ν = 4 is found in the t-distribution table to be 4.604. Using the formula, the confidence interval is calculated as follows.

$$\bar{x} \pm t \frac{s}{\sqrt{n}} = 8 \pm 4.604 \times \frac{0.94}{\sqrt{5}} \approx 8 \pm 1.94$$

This result means that the consumer safety group can say that it is 99 percent confident the safety rating of the new model is between 6.06 and 9.94. To get a smaller confidence interval at the 99 percent confidence level, more cars will have to be crash tested.
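The crash-test arithmetic can be reproduced with a short script. This is a minimal sketch using the figures from the example (mean 8, sample standard deviation 0.94, n = 5, t-score 4.604); the function name t_confidence_interval is hypothetical, and the t-score is passed in as read from the table rather than computed.

```python
from math import sqrt

def t_confidence_interval(sample_mean, sample_sd, n, t_score):
    """Return (lower, upper) for a small-sample confidence interval."""
    margin = t_score * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Crash-test example: five cars, 99 percent confidence, 4 degrees of freedom.
lower, upper = t_confidence_interval(8.0, 0.94, 5, 4.604)
print(f"99% interval: {lower:.2f} to {upper:.2f}")  # 6.06 to 9.94
```

Rerunning the function with a larger n, and the matching smaller t-score from the table, shows how testing more cars would narrow the interval.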

1. How does a t-distribution differ from a Z distribution?

Solution 1

2. When does a t-distribution begin to approximate a Z distribution?

Solution 2

3. What is a 95 percent confidence interval?

Solution 3

4. How is a 95 percent confidence interval for a population mean calculated?

Solution 4

5. What is the t-score for a 95 percent confidence interval for μ in a sample of 16 data points from a random variable having a standard normal distribution?

Solution 5

6. A cleaning business operates in the city of New York and works for the companies that lease office space in the city. The business contracts to clean office space in increments of 100 square feet. The business determines its margins by determining how long it takes a crew to clean 100 square feet of office space, and bases its rates on this information.

Because the company is relatively new, it has to estimate the time it takes to clean 100 square feet of office space. The company estimates that it should take 5.5 hours to clean 100 square feet (μ = 5.5). The company starts its business with this expectation and works for a week straight, collecting data as it proceeds in order to be certain that it is neither over- nor under-charging its clients. The data collected by the company can be seen in the table below.

After collecting this data, the company wants to determine if the time originally estimated to clean 100 square feet of office space was reasonable. Assuming a 95 percent confidence level, was this estimation accurate?

Solution 6

7. If the cleaning company from the previous question had a sample of seven rather than a sample of 12 upon which to base its conclusions, what would be the boundaries of the 95 percent confidence interval for the estimate of the number of hours? Assume that the sample mean and standard deviation are equal to those calculated above.

Solution 7
