PreMBA Analytical Methods
Statistical Sampling and Regression: Populations and Samples

A population is the entire set of individuals or objects of interest, and it may be finite or infinite. Examples of finite populations include the employees of a given company, the airplanes owned by an airline, or the potential consumers in a target market. Examples of populations usually treated as infinite include the widgets manufactured by a company that plans to be in business forever, or the grains of sand on the beaches of the world, which are finite in principle but far too numerous to count.

To make the idea of a population concrete, consider a market researcher for a soft drink company who wants to determine the sweetness preferences of Americans between the ages of 15 and 25. The population in this example is finite and includes every American in this age group.

Obviously, gathering data from every individual in this population would be nearly impossible and prohibitively expensive. It would be more practical to collect data from a subset, or sample, of the population. If the sample is unbiased, the sample data can be used to make inferences about the population. In order for a sample to be unbiased, it must be:

  • representative of the population
  • randomly selected
  • sufficiently large

    A representative sample contains members from the population of interest. In the case of the sweetness preferences study, the sample would need to include Americans between the ages of 15 and 25. If people outside of the target age range were included, the sample would not be representative.

    In a random sample, each member of the population has an equal chance of being selected for the sample. Imagine that the sample data for the sweetness preferences study came exclusively from students at one university in the southern United States. This sample is not random, because the rest of the population had no opportunity to be selected. Data from this sample would not be representative of the entire U.S. population between ages 15 and 25, because the students attending this university may have a higher or lower sweetness preference than other groups of young people. Drawing conclusions about the overall population from this sample could lead to expensive mistakes.
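The idea of an equal chance of selection can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the list of ID numbers is a hypothetical stand-in for a complete list of the population, and the function name `simple_random_sample` is made up for this example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, k)

# Hypothetical list of the population: ID numbers standing in for
# people between the ages of 15 and 25 (an assumption for illustration).
population = list(range(1, 10001))

sample = simple_random_sample(population, k=200, seed=42)
print(len(sample), len(set(sample)))  # 200 members, no one chosen twice
```

By contrast, taking only the first 200 IDs on the list would resemble the single-university sample: everyone else on the list would have no chance of being selected.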

    A sample must also be large enough for its data to reflect the population. Estimates based on a sample that is too small are unreliable, because a few idiosyncratic individuals can dominate the results. When larger samples are used, data collected from such individuals have less influence.

    Imagine what would happen if the sweetness preferences study collected data from a sample of three people and, based on the results from the sample, concluded that Americans between the ages of 15 and 25 favor soft drinks that are not at all sweet. Would you begin the development of a new soft drink on the basis of this sample? Of course not. A sample of three people is too small to serve as the basis for drawing conclusions about the population in general—it may be that the three individuals in the sample simply don't like sweet soft drinks.
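A small simulation can make the danger of tiny samples visible. The sketch below assumes a hypothetical population of sweetness ratings on a scale of 1 to 10 (invented data, not results from any study) and compares how much the sample mean varies from one random sample to the next when the sample contains 3 people versus 100 people.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 100,000 sweetness ratings from 1 to 10.
population = [rng.randint(1, 10) for _ in range(100_000)]

def spread_of_sample_means(population, n, trials, rng):
    """Standard deviation of the sample mean over repeated random samples."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(population, n=3, trials=2000, rng=rng)
spread_large = spread_of_sample_means(population, n=100, trials=2000, rng=rng)

print(f"spread of sample means, n=3:   {spread_small:.2f}")
print(f"spread of sample means, n=100: {spread_large:.2f}")
```

Under these assumptions, the means of three-person samples swing far more widely than the means of 100-person samples, so any single three-person sample can easily land well away from the population average.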

    How many people must be included in a sample in order for it to represent the population? The optimal sample size depends on, among other things, the desired confidence level and the precision of the confidence interval. A sample size of 30 or more is often used as a rule of thumb so that the distribution of the sample mean is approximately normal. In general, more is better. However, as sample sizes continue to increase, they yield diminishing returns, because precision improves only in proportion to the square root of the sample size. For example, consider a sample of 1,000 people versus a sample of 800. The extra precision gained in sampling 1,000 people in this example may not be worth the cost of extending the sample to another 200 people.
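The comparison between 800 and 1,000 people can be made concrete with a rough calculation. The sketch below is one possible illustration, not the method of any particular study: it assumes a simple random sample, a survey question answered as a proportion (for example, the share who prefer a sweeter drink), the worst-case proportion of 0.5, and the usual normal approximation for a 95% confidence level (z = 1.96).

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    n: sample size; p: assumed proportion (0.5 is the worst case);
    z: normal critical value (1.96 corresponds to about 95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)

moe_800 = margin_of_error(800)
moe_1000 = margin_of_error(1000)

print(f"n=800:  +/- {moe_800:.4f}")   # about 0.0346
print(f"n=1000: +/- {moe_1000:.4f}")  # about 0.0310
```

Under these assumptions, surveying 200 additional people narrows the margin of error from roughly 3.5 to roughly 3.1 percentage points, which may or may not justify the added cost.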

    1. What is the difference between information based on a sample and information based on a population?

    2. What characteristics are necessary before a sample can be considered random?

    3. What is the consequence of failing to have a random sample from a population?

    4. A particular employment services firm collects information from candidates employed at a set of companies in order to create a normalized compensation measure. This measure provides accurate information to clients about the range of salaries, compensation packages, and bonuses that they should expect to pay their employees in order to remain solvent and competitive.

    In this case, the employment services firm collects its compensation data from the 20 highest-compensating companies in the Fortune 500. Based on the data collected from these 20 companies, the employment services firm puts together its normalized measures, and it proceeds to use these measures when working out contract details with its clients. What are some sampling issues that the employment services firm is failing to consider?

    5. Two employment services firms went about collecting compensation data in an effort to establish normalized sets of information for their clients. The two firms collected compensation data from a random sample of companies that are representative of the industries within which they operate. However, the first employment services firm collected data from 55 companies, while the second firm collected data from 85 companies. Evaluate the two samples. Which would you consider to have more reliable compensation data and why?
