PreMBA Analytical Methods
Statistical Sampling and Regression: Simple Linear Regression

When you think of regression, think prediction. A regression uses the historical relationship between an independent and a dependent variable to predict the future values of the dependent variable. Businesses use regression to predict such things as future sales, stock prices, currency exchange rates, and productivity gains resulting from a training program.

Types of Regression

A regression models the past relationship between variables to predict their future behavior. As an example, imagine that your company wants to understand how past advertising expenditures have related to sales in order to make future decisions about advertising. The dependent variable in this instance is sales and the independent variable is advertising expenditures.

Usually, more than one independent variable influences the dependent variable. You can imagine in the above example that sales are influenced by advertising as well as other factors, such as the number of sales representatives and the commission percentage paid to sales representatives. When one independent variable is used in a regression, it is called a simple regression; when two or more independent variables are used, it is called a multiple regression.

Regression models can be either linear or nonlinear. A linear model assumes the relationships between variables are straight-line relationships, while a nonlinear model assumes the relationships between variables are represented by curved lines. In business you will often see the relationship between the return of an individual stock and the returns of the market modeled as a linear relationship, while the relationship between the price of an item and the demand for it is often modeled as a nonlinear relationship.

As you can see, there are several classes of regression procedures, each with a different degree of complexity and explanatory power. The most basic is simple linear regression, which uses only one independent variable and describes the relationship between the independent variable and the dependent variable as a straight line. This review will focus on the basic case of a simple linear regression.

How does regression work to enable prediction? View the following animation for a brief explanation of the basics of simple linear regression. The subsequent text will develop ideas mentioned in the animation.

View animation

Scatter Plots

As indicated by the animation, one of the first steps in regression is to plot your data on a scatter plot. The following table lists the monthly sales and advertising expenditures for all of last year by a digital electronics company.

In this case, you would plot last year's data for monthly sales and advertising expenditures as shown on the scatter plot below. (Data for independent and dependent variables must be from the same period of time.)

Scatter plots are effective in visually identifying relationships between variables. These relationships can be expressed mathematically in terms of a correlation coefficient, which is commonly referred to as a correlation. Correlations are indices of the strength of the relationship between two variables. They can be any value from –1 to +1. (Correlations are covered in greater detail in the Covariance and Correlation topic of this section.)
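As a rough illustration of how a correlation coefficient is calculated, the following Python sketch computes it for a small data set. The figures are invented for illustration only and are not the company data referred to in the text; the function name `correlation` is likewise just a label chosen for this sketch.

```python
from math import sqrt

# Hypothetical monthly figures, invented for illustration only.
advertising = [10, 12, 15, 17, 20, 22]       # e.g. thousands of dollars
sales = [110, 118, 135, 140, 160, 165]       # e.g. thousands of dollars


def correlation(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies around its own mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(advertising, sales)
print(round(r, 3))  # always falls between -1 and +1
```

A value near +1 or -1 indicates that the points lie close to a straight line; a value near 0 indicates little straight-line relationship.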

When you use regression to predict future values of the dependent variable, you want the correlation between the independent and dependent variables to be high: in absolute value, somewhere in the range from 0.5 to 0.99. Viewing the scatter plot above, you can see that there appears to be some degree of correlation between the level of advertising expenditure and sales. When calculated, this correlation equals 0.89. This historical data lets you estimate how the two variables are likely to relate in the future, before any further expense is incurred. To make these predictions, a regression line must be fitted to the data in the scatter plot.

Regression Line

The figure below is the same as the scatter plot above, with the addition of a regression line fitted to the historical data.

The regression line is the line that lies as close as possible to the data points taken as a whole. As you can see, the regression line touches some data points, but not others. The vertical distances of the data points from the regression line are called error terms.

A regression line will always contain error terms because, in reality, independent variables are never perfect predictors of the dependent variables. There are many uncontrollable factors in the business world. The error term exists because a regression model can never include all possible variables; some predictive capacity will always be absent, particularly in simple regression.

The typical procedure for finding the line of best fit is called the least-squares method. This calculation is usually performed using computer software. In this calculation, the best fit is found by taking the difference between each data point and the line, squaring each difference, and adding the values together. The least-squares method is based upon the principle that the sum of the squared errors should be made as small as possible so the regression line has the least error.
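The least-squares calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the output of any particular software package: the data are invented, and the names `fit_line` and `sum_squared_errors` are hypothetical helpers introduced for this sketch.

```python
# Hypothetical monthly figures, invented for illustration only.
advertising = [10, 12, 15, 17, 20, 22]
sales = [110, 118, 135, 140, 160, 165]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Square each point's distance from the line and add the squares up."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


slope, intercept = fit_line(advertising, sales)
sse_best = sum_squared_errors(advertising, sales, slope, intercept)

# Any other line gives a larger sum of squared errors.
sse_steeper = sum_squared_errors(advertising, sales, slope + 0.5, intercept)
sse_shifted = sum_squared_errors(advertising, sales, slope, intercept + 5)
print(slope, intercept)
print(sse_best, sse_steeper, sse_shifted)
```

Comparing `sse_best` with the two alternatives shows the principle at work: nudging the slope or the intercept away from the fitted values increases the total squared error.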

Once this line is determined, it can be extended beyond the historical data to predict future levels of sales, given a particular level of advertising expenditure.

The extension of the line of regression requires the assumption that the underlying process causing the relationship between the two variables is valid beyond the range of the sample data. Regression is a powerful business tool due to its ability to predict future relationships between variables such as these.
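The following Python sketch shows one way to keep this assumption visible when making a prediction: it returns the predicted value together with a flag saying whether the input lies outside the range of the sample data. The slope, intercept, and sample range are hypothetical values assumed for illustration, not results from the company data discussed above.

```python
# Hypothetical fitted line and sample range, assumed for illustration only.
SLOPE = 4.75                 # change in sales per unit of advertising
INTERCEPT = 62.0             # predicted sales when advertising is zero
X_MIN, X_MAX = 10.0, 22.0    # range of advertising seen in the sample


def predict(x):
    """Return (predicted value, whether x lies outside the sample range)."""
    is_extrapolation = x < X_MIN or x > X_MAX
    return INTERCEPT + SLOPE * x, is_extrapolation


inside = predict(18.0)    # within the historical range
outside = predict(30.0)   # beyond it: relies on the relationship still holding
print(inside)   # (147.5, False)
print(outside)  # (204.5, True)
```

A prediction flagged as an extrapolation is not necessarily wrong, but it depends entirely on the straight-line relationship continuing to hold where no data were observed.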

When you run a regression in Excel or in a statistics program, the program will provide you with a report. The details of these reports, and the definition of all the terms included in the report, are beyond the scope of the course.

Equation of a Regression Line

You may recall the equation of a straight line from your review of the Linear Functions topic in the Algebra section of this course.

Variables, constants, and coefficients are represented in the equation of a line as

  f(x) = mx + b

  • x represents the independent variable
  • f(x) represents the dependent variable
  • the constant b denotes the y-intercept: the value of the dependent variable when the independent variable is equal to zero
  • the coefficient m is the slope: it describes the movement in the dependent variable as a result of a given movement in the independent variable
In finance, linear regressions are commonly used to describe the returns of an individual security (dependent variable) compared to the returns of the market in general (independent variable). The equation for the simple linear regressions used to describe security movements is also a straight line, expressed in a format that is similar but contains a couple of twists. The equation below is a regression equation for a straight line describing the relationship between the returns of security i and the market in general.

  r_i = a + b r_m + e_i

  • r_i represents the return of security i and is the dependent variable
  • r_m represents the return of the market in general and is the independent variable
  • b is the slope of the regression line, and it describes the level of movement in security i as a result of a unit of movement in the market in general
  • a is the y-intercept of the regression line
  • e_i is an error term that describes the distance between an actual data point and the corresponding point on the regression line

The graph below provides a visual depiction of this regression line. The returns of the market in general are represented in this graph by the returns of the S&P 500, a common surrogate for market returns.

You may be familiar with discussions in financial circles about the beta (b) of a security being a measure of the security's risk. The risk measure of beta is calculated using regression techniques.

Beta, the slope of the regression line, was described above as the level of movement in the returns of a given security for each unit of movement in the market in general. A security with a high beta is considered risky and will experience big swings in its returns as compared to those of the market. A security with a low beta is considered less risky and will have returns that fluctuate less than those of the market. The alpha term (a) in the regression equation of a security represents the security's propensity to move independently of the market. The alpha and beta of a security cannot be observed directly but are estimated, based on the past performance of a security, through regression analysis.
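To make the estimation step concrete, here is a minimal Python sketch that estimates alpha and beta from past returns using a simple least-squares regression. The return figures are invented for illustration and do not describe any real security or index; `estimate_alpha_beta` is a hypothetical helper name.

```python
# Hypothetical monthly returns in percent, invented for illustration only.
market_returns = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]      # independent variable
security_returns = [1.8, -2.9, 4.6, 0.9, -2.0, 3.2]    # dependent variable


def estimate_alpha_beta(market, security):
    """Least-squares estimates of the intercept (alpha) and slope (beta)."""
    n = len(market)
    mean_m = sum(market) / n
    mean_s = sum(security) / n
    sxy = sum((m - mean_m) * (s - mean_s) for m, s in zip(market, security))
    sxx = sum((m - mean_m) ** 2 for m in market)
    beta = sxy / sxx
    alpha = mean_s - beta * mean_m
    return alpha, beta


alpha, beta = estimate_alpha_beta(market_returns, security_returns)

# Error terms: actual return minus the return implied by the regression line
residuals = [s - (alpha + beta * m)
             for m, s in zip(market_returns, security_returns)]

print(round(alpha, 3), round(beta, 3))
```

With these invented figures the estimated beta is greater than 1, which corresponds to the "high beta" case in the text: the security's returns swing more than the market's.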

    1. The placement office of a graduate business school would like to predict the starting salaries of its students. The placement office administrators are highly confident (based on their collective past experience) that starting salaries depend on a combination of factors, including the number of years of previous work experience, the student's graduate school GPA, and the student's GMAT score. Is it appropriate for the placement office to use a simple linear regression to predict the starting salaries of its students? Why or why not?

    2. The following two graphs illustrate simple linear regressions. Which has a higher predictive quality and why?

    3. Consider the following scatter plot with a regression line of the advertising dollars spent (in thousands) and sales (in millions). If the company were to spend $275,000 on advertising, what would you predict the sales level to be?

    Read about an application of linear regression in this article from the Journal of Financial Planning.