Gathering data from every individual in this population would be nearly impossible and prohibitively expensive. It is more practical to collect data from a subset, or sample, of the population. If the sample is unbiased, the sample data can be used to make inferences about the population. For a sample to be unbiased, it must be representative of the population, randomly selected, and sufficiently large.
A representative sample contains members from the population of interest. In the case of the sweetness preferences study, the sample would need to include Americans between the ages of 15 and 25. If people outside of the target age range were included, the sample would not be representative.
In a random sample, each member of the population has an equal chance of being selected for the sample. Imagine that the sample data for the sweetness preferences study came exclusively from students at one university in the southern United States. This sample is not random, because the rest of the population has no opportunity to be selected for the study. Data from this sample would not be representative of the entire U.S. population between ages 15 and 25, because the students attending this university may have a higher or lower sweetness preference than other groups of young people. Drawing conclusions about the overall population from this sample could lead to expensive mistakes.
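The contrast between a random sample and a single-source sample can be sketched in a few lines of Python. This is only an illustration: the population, the four region labels, and the helper names (simple_random_sample, convenience_sample, region_counts) are assumptions made up for the example, not data from the sweetness preferences study.

```python
import random

# Hypothetical population of 10,000 people, each tagged with a made-up region.
regions = ["South", "Northeast", "Midwest", "West"]
population = [(person_id, regions[person_id % 4]) for person_id in range(10_000)]

def simple_random_sample(pop, n, rng):
    """Draw n members without replacement; every member has the same chance."""
    return rng.sample(pop, n)

def convenience_sample(pop, n):
    """Take the first n members found in a single region (not random)."""
    return [person for person in pop if person[1] == "South"][:n]

def region_counts(sample):
    """Count how many sampled people come from each region."""
    counts = {}
    for _, region in sample:
        counts[region] = counts.get(region, 0) + 1
    return counts

rng = random.Random(42)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(population, 200, rng)
conv = convenience_sample(population, 200)

print("random sample:     ", region_counts(srs))
print("convenience sample:", region_counts(conv))
```

In the random sample all four regions appear, in roughly equal numbers, while the convenience sample contains only one region, which mirrors the one-university example above.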
A sample must also be large enough for its data to reflect the population. A sample that is too small can yield misleading population estimates, because data collected from idiosyncratic individuals have more influence in small samples than in large ones.
Imagine what would happen if the sweetness preferences study collected data from a sample of three people and, based on the results from the sample, concluded that Americans between the ages of 15 and 25 favor soft drinks that are not at all sweet. Would you begin the development of a new soft drink on the basis of this sample? Of course not. A sample of three people is too small to serve as the basis for drawing conclusions about the population in general; it may be that the three individuals in the sample simply don't like sweet soft drinks.
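A small simulation can make the effect of sample size concrete. The sketch below assumes a hypothetical population of sweetness ratings on a 0 to 10 scale (the ratings are randomly generated, not real study data) and measures how much the sample mean varies from one random sample to the next when the sample has 3 people versus 300 people.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical sweetness ratings on a 0-10 scale (invented for illustration).
population = [min(10.0, max(0.0, rng.gauss(6.0, 2.0))) for _ in range(50_000)]

def spread_of_sample_means(pop, n, trials, rng):
    """Standard deviation of the sample mean over repeated random samples of size n."""
    means = [statistics.mean(rng.sample(pop, n)) for _ in range(trials)]
    return statistics.stdev(means)

small = spread_of_sample_means(population, 3, 500, rng)
large = spread_of_sample_means(population, 300, 500, rng)

print(f"spread of sample means, n=3:   {small:.2f}")
print(f"spread of sample means, n=300: {large:.2f}")
```

The means of three-person samples swing widely from one draw to the next, so a single such sample can easily land far from the population average. The means of 300-person samples cluster much more tightly.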
How many people must be included in a sample for it to represent the population? The optimal sample size depends on, among other things, the desired confidence level and the desired precision of the confidence interval. A sample size of 30 or more is often recommended so that the distribution of the sample mean is approximately normal. In general, more is better. However, as sample sizes continue to increase, they yield diminishing returns. For example, consider a sample of 1,000 people versus a sample of 800. The extra precision gained by sampling 1,000 people may not be worth the cost of extending the sample by another 200 people.
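The diminishing returns can be illustrated with a rough calculation. The sketch below assumes the study estimates a proportion (for example, the share of respondents who prefer a sweeter drink), uses the standard normal-approximation margin of error, and takes the worst case p = 0.5 at a 95% confidence level. These choices, and the function name margin_of_error, are assumptions for illustration rather than details from the study.

```python
import math
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, p=0.5):
    """Normal-approximation margin of error for a proportion estimated from n people."""
    # z is the critical value for a two-sided interval at the given confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)

for n in (30, 200, 800, 1000):
    print(f"n = {n:>4}: margin of error = {margin_of_error(n):.4f}")
```

Under these assumptions, the margin of error is about 0.035 for 800 people and about 0.031 for 1,000 people. The additional 200 respondents narrow the margin by less than half a percentage point, because the margin shrinks in proportion to the square root of the sample size rather than the sample size itself.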
1. What is the difference between information based on a sample and information based on a population?
2. What characteristics are necessary before a sample can be considered random?
3. What is the consequence of failing to have a random sample from a population?
4. A particular employment services firm collects information from candidates employed at a set of companies in order to create a normalized compensation measure. This measure provides accurate information to clients about the range of salaries, compensation packages, and bonuses that they should expect to pay their employees in order to remain solvent and competitive.
In this case, the employment services firm collects its compensation data from the 20 highest-compensating companies in the Fortune 500. Based on the data collected from these 20 companies, the employment services firm puts together its normalized measures, and it proceeds to use these measures when working out contract details with its clients. What are some sampling issues that the employment services firm is failing to consider?
5. Two employment services firms went about collecting compensation data in an effort to establish normalized sets of information for their clients. The two firms collected compensation data from a random sample of companies that are representative of the industries within which they operate. However, the first employment services firm collected data from 55 companies, while the second firm collected data from 85 companies. Evaluate the two samples. Which would you consider to have more reliable compensation data and why?