Statistics for the Behavioral Sciences, 4/e
Michael Thorne, Mississippi State University
Martin Giesen, Mississippi State University

Correlation and Regression

Chapter Overview

Correlation is defined as the degree of relationship between two or more variables. Although there are many kinds of correlation, the chapter focuses on linear correlation, or the degree to which a straight line best describes the relationship between two variables.

The degree of linear relationship between two variables may assume an infinite range of values, but it is customary to speak of three different classes of correlation. Zero correlation is defined as no relationship between variables. Positive correlation means there is a direct relationship between the variables, such that as one variable increases, so does the other. An inverse relationship in which low values of one variable are associated with high values of the other is called negative correlation.

A scatterplot is often used to show the relationship between two variables. In a scatterplot, pairs of scores are plotted, with scores on one variable on the X axis and scores on the other variable on the Y axis. A pattern of points sloping upward to the right indicates positive correlation; a pattern sloping downward to the right indicates negative correlation; and a random scatter of points indicates zero correlation. Note that a high correlation between two variables doesn't necessarily mean that one variable caused the other.
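As a concrete illustration (not from the text), a scatterplot can be drawn in a few lines of Python; the paired scores below are hypothetical, and matplotlib is assumed to be available.

```python
import matplotlib.pyplot as plt

# Hypothetical paired scores (e.g., hours studied and exam grade).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 55, 61, 60, 68, 70, 75, 79]

plt.scatter(x, y)                      # one point per (X, Y) pair
plt.xlabel("X (hours studied)")
plt.ylabel("Y (exam grade)")
plt.title("Points sloping upward to the right: positive correlation")
plt.show()
```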

When the data are at least interval scale, the Pearson product-moment correlation coefficient, or Pearson r, is used to compute the degree of relationship between two variables. The Pearson r may be defined as the mean of the z-score products for X and Y pairs, where X stands for one variable and Y stands for the other. One approach to understanding the Pearson correlation is based on a close relative of variance, the covariance, which is the extent to which two variables vary together. Covariance can be used to derive a simple formula for the Pearson correlation, and we can think of the Pearson r as a standardized covariance between X and Y.
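A minimal Python sketch of both definitions follows (not from the text; numpy is assumed, and the scores are the hypothetical pairs from the scatterplot example). Note that the z-score definition uses population standard deviations (dividing by N).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 60, 68, 70, 75, 79], dtype=float)

# z-scores using population standard deviations (numpy's default, ddof=0).
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# Pearson r as the mean of the z-score products.
r = np.mean(zx * zy)

# Equivalently, r as a standardized covariance: cov(X, Y) / (s_X * s_Y).
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_alt = cov_xy / (x.std() * y.std())

print(r, r_alt)   # both formulas give the same value
```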

The range of r is from –1 to +1. Restricting the range of either the X or the Y variable typically lowers the absolute value of r. The coefficient of determination, r², gives the proportion of variability in one variable explained by variability in the other variable.
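The range-restriction effect is easy to demonstrate by simulation. The sketch below (an illustration, not from the text; the sample size, slope, and noise level are arbitrary) generates linearly related data, then recomputes r after keeping only the middle of the X distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate linearly related data with noise.
x = rng.normal(size=2000)
y = 0.8 * x + rng.normal(scale=0.6, size=2000)

r_full = np.corrcoef(x, y)[0, 1]

# Restrict the range of X to the middle of its distribution.
keep = np.abs(x) < 0.5
r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]

print(f"full-range r = {r_full:.2f},  r^2 = {r_full**2:.2f}")
print(f"restricted r = {r_restricted:.2f} (smaller in absolute value)")
```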

After computing the Pearson r, we can test it for significance. First, we assume that our sample was taken from a population in which there is no relationship between the two variables; this is just another version of the null hypothesis. Then, we consult Table E, which contains values of r for different degrees of freedom (N – 2) with probabilities of either .05 or .01. If our computed coefficient, in absolute value, is equal to or greater than the critical value at the 5% level, we reject the null hypothesis and conclude that our sample probably came from a population in which there is a relationship between the variables.
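Table E itself is not reproduced here, but an equivalent test (standard in statistics, though not this book's table-lookup procedure) transforms r into a t statistic with N – 2 degrees of freedom. A sketch, assuming scipy and the hypothetical scores from earlier:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 60, 68, 70, 75, 79], dtype=float)
n = len(x)

r = np.corrcoef(x, y)[0, 1]

# Under H0 (no relationship in the population),
# t = r * sqrt(N - 2) / sqrt(1 - r^2) follows a t distribution with N - 2 df.
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed p-value

print(f"r = {r:.3f}, t({n - 2}) = {t:.2f}, p = {p:.4f}")
# Reject H0 at the 5% level if p < .05 (equivalently, if |r| equals or
# exceeds the critical value of r for N - 2 df in Table E).
```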

From the definition of correlation as the degree of linear relationship between two variables, we can use the correlation coefficient to compute the equations for the straight lines that best describe the relationship between the variables. The equations (one to predict Y from X and one to predict X from Y) are called regression equations, and we can use them to predict a score on one variable from a score on the other. The general form of the equation is Y = bX + a, where b is the slope of the line and a is the Y intercept, the point where the line crosses the Y axis. The regression line is also called the least squares line because it minimizes the sum of the squared distances between the data points and the line.
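For the line predicting Y from X, the slope and intercept follow directly from r and the descriptive statistics: b = r(s_Y / s_X) and a = Ȳ – bX̄. A short sketch (hypothetical data, numpy assumed):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 60, 68, 70, 75, 79], dtype=float)

r = np.corrcoef(x, y)[0, 1]

# Least-squares line for predicting Y from X:  Y' = bX + a
b = r * (y.std() / x.std())      # slope: r scaled by the ratio of the SDs
a = y.mean() - b * x.mean()      # intercept: line passes through (X-bar, Y-bar)

print(f"Y' = {b:.2f}X + {a:.2f}")
print(f"predicted Y for X = 5: {b * 5 + a:.1f}")
```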

The Spearman rank order correlation coefficient, r_S, is a computationally simple alternative to r that is useful when the measurement level of one or both variables is ordinal scale. Like the Pearson r, the Spearman coefficient can be tested for significance. To test r_S for significance, we compare its value with critical values in Table F for the appropriate sample size; if our computed value is larger in absolute value than the table value at the 5% level, we reject the null hypothesis and conclude that the two variables are related.
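One way to see the connection to the Pearson r: r_S is simply the Pearson r computed on the ranks of the scores. The sketch below (hypothetical rankings; scipy assumed) shows both the direct routine and the rank-then-correlate equivalence.

```python
import numpy as np
from scipy import stats

# Hypothetical ordinal data: two judges ranking the same 8 contestants.
judge1 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
judge2 = np.array([2, 1, 4, 3, 6, 5, 8, 7])

# Direct computation (ties are handled via average ranks).
rs, p = stats.spearmanr(judge1, judge2)
print(f"r_S = {rs:.3f}, p = {p:.4f}")

# Equivalently: r_S is the Pearson r applied to the ranks.
r_check = np.corrcoef(stats.rankdata(judge1), stats.rankdata(judge2))[0, 1]
print(f"Pearson r on ranks = {r_check:.3f}")
```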

Other correlation coefficients briefly considered in the chapter are the point biserial correlation (r_pbis) and the phi coefficient (φ). The former is useful when one variable is dichotomous (has only two values) and the other variable is continuous or interval level, whereas the latter is used when both variables are dichotomous. All of the inferential statistical methods covered in the text through this chapter can be tied together under the general linear model, which is a general, relationship-oriented, multiple-predictor approach to inference.
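Both special-case coefficients reduce to the Pearson formula applied to suitably coded data. A final sketch (hypothetical 0/1 codes and scores; scipy assumed) computes each:

```python
import numpy as np
from scipy import stats

# Point biserial: one dichotomous (0/1) variable, one continuous variable.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])           # e.g., two groups
score = np.array([50., 54, 57, 52, 61, 66, 63, 68])  # continuous outcome

r_pb, p = stats.pointbiserialr(group, score)
print(f"r_pbis = {r_pb:.3f}, p = {p:.4f}")

# Phi: both variables dichotomous; phi equals the Pearson r on the 0/1 codes.
a = np.array([0, 0, 1, 1, 0, 1, 1, 0])
b = np.array([0, 1, 1, 1, 0, 1, 0, 0])
phi = np.corrcoef(a, b)[0, 1]
print(f"phi = {phi:.3f}")
```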