Excel Activity 3: Calculate a Correlation Coefficient and Create a Scatterplot

Purpose: The purpose of this activity is to learn how to use Excel to calculate a correlation coefficient and create a scatterplot.

Problem: A researcher was interested in whether there was an association between communication skills and quality of peer relationships in third grade classrooms. Teachers in each class completed a communicative skills checklist and a rating scale of peer relations for each child. The items for each scale were averaged to provide an overall score for each child.

As a general rule, the best way to start exploring data is visually (e.g., graphs, plots, etc.), then moving on to calculating some statistics to help understand how the variables are related. Using Excel, in this exercise you will create a scatterplot, calculate a correlation coefficient, and calculate a measure of effect size.

Create a Scatterplot
Calculate a Correlation Coefficient
Calculate Measure of Effect Size

 

SCATTER PLOT    (Back to Top)    (Back to Main Page)

Data Set

Select the area of data points that you want to include in the scatter plot. Place the cursor in the upper left hand cell of the first variable (B2) and click and drag across to cover cell C2 as well. Continue to hold the mouse button down and drag down to cells A201 and B201 to highlight all of the cells that contain data to be included in the plot. Select Insert from the menu bar and scroll down to Chart to open the Chart Wizard. In the box labeled Chart type select XY(Scatter) and click on the Next button to move to Step 2 of the wizard. No action is required on Step 2, so click on the Next button to move to Step 3. On the Titles panel, enter a title for the chart and label the axes. In the Chart Title field type "Scatterplot for Communicative Skills and Peer Relations." In the Value (x) axis field type "Mean Communicative Skills" and in the Value (y) axis field type "Mean Peer Relations". The preview panel on the right side of the box will show an updated version of the chart with the new labels. Click on the Legend tab and de-select the Show legend option, as a legend is not necessary for a scatterplot. Click on the Next button to move to Step 4 of the wizard. In Step 4 you must decide whether you want the chart to be inserted as its own sheet in the Excel file or as an object in the currently selected sheet. The latter is the default option and should be selected here. Then click on Finish to complete the chart wizard and have the chart inserted into the worksheet. The chart toolbar may appear in the chart (it is a horizontal panel with tools that can be used to modify the chart). If it does, click on the "X" in the upper right corner to close it. Now click on the lower right corner of the chart and drag to resize the plot and make it larger so that the points appear more spread out.

View Results

Questions

1. What information does the chart provide about the association between the two variables? What is the direction of the association? How did you determine this? Does it appear to be a small, moderate, or strong association? Why?

View Answer

2. Click anywhere on the chart to select it and then select Chart from the menu bar and scroll down to Add trendline to produce a line of best fit (also known as a regression line) for the scatterplot. The default selection linear is the most appropriate option; therefore simply click on OK to produce the trendline. How does the trendline fit with your perception of the strength and direction that you determined in Question 1? Does the presence of the trendline alter your perception at all? Why or why not?

View Answer

 

PEARSON CORRELATION    (Back to Top)    (Back to Main Page)

Data Set

Place the cursor in cell F2 to the right of the label correlation. Select Insert from the menu bar and scroll down to Function. A dialog box titled Insert Function will appear displaying an alphabetical list of functions to choose from. Under Or select a category select statistical. This will change the available functions in the list so that only functions pertaining to statistical procedures will be displayed. Scroll down to the function named PEARSON and highlight it. This is the function that calculates the Pearson product-moment correlation coefficient (also known as r). Click OK and a new dialog box titled Function Arguments will appear. In the field labeled Array1 specify the range of cells containing the values of the x variable you want to calculate the correlation for. The shorthand method for specifying a range is the label for first cell containing data, followed by a colon, then the label for the last cell containing data (e.g., A2:A23). This notation says, "I'm specifying a range of values that begins with cell A2 and ends with cell A23, and I want to include the values of all the cells in between." In the case of the current x value of interest, the cell range to specify is cell B2 through cell B201 (i.e., B2:B201). Use the same procedure to specify the y variable range in the field labeled Array2 (i.e., C2:C201). Once a valid range is specified for the second variable the results of the function will be displayed at the bottom of the dialog box, in this case .519416939. However, our goal is to have the correlation coefficient displayed on the spreadsheet. Clicking on OK will close the dialog box and insert the value in the pre-selected cell of the spreadsheet (in this case, F2).

View Results

Questions

1. Interpret the correlation coefficient. What is the strength of the association? What is the direction of the relationship? Do you believe this correlation to be small, moderate, or large? How does your interpretation fit with the one based on the scatterplot?

View Answer

2. Write a statement summarizing the results of the correlation you calculated. Be sure to include information about what the variables are as well as the direction of the association.

View Answer

 

R-SQUARED    (Back to Top)   (Back to Main Page)

Data Set

Select cell F6 to the right of the label r-squared. Select Insert from the menu bar and scroll down to Function to open the Insert Function dialog box. Be sure that the category titled Statistical is still selected and scroll down to the function called RSQ. This is the function that calculates the r-squared value. R-squared (r2) is the square of the Pearson correlation r, and is often referred to as the coefficient of determination. As a measure of effect size, r-squared provides the percentage of shared variance between two variables. This is sometimes stated as the amount of variance in one variable "accounted for" or "explained by" another variable (these terms are in quotations because they are used loosely. As correlation does not imply causation, one variable cannot actually account for or explain any variance in another when using this method. It is best to think of this measure as an index of shared variance.). Once you select the RSQ function click OK to open the Function Arguments dialog box. The fields for this function have different names then did the PEARSON function, but the data to be inputted are identical. As with PEARSON, specify the range of y values of interest in the Known y's field and the range of x values in the Known_x's field. Once a valid range is specified for the second variable the results of the function will be displayed at the bottom of the dialog box, in this case .269793956 Clicking on OK will close the dialog box and insert the value in the pre-selected cell of the spreadsheet (in this case, F6).

View Results

View All Results

Questions

1. Using a calculator, compute the square of the correlation coefficient that you calculated in the previous exercise (.519416939 in cell F2). How does this number compare to the r-squared value you computed using the RSQ function?

View Answer

2. Interpret the r-squared value. What does it indicate about the association? Write a statement about the meaning of the r-squared value in terms of the variables of interest.

View Answer

3. How does the r-squared value compare to what you saw in the scatterplot and the Pearson correlation coefficient you previously calculated? What is the difference between the information provided by the correlation coefficient and the r-squared? Which one do you feel is a more useful statistic? Why?

View Answer

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

SCATTER PLOT Question 1 Answer:

The scatterplot indicates a positive association between the two variables. This can be detected by noticing that as the values for one variable increase, so do the values for the other variable. The points are reasonably close together, so it would seem that the association is moderate to strong.    (Back to Questions)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

SCATTER PLOT Question 2 Answer:

The trendline supports the interpretation of a positive association, as it increases from left to right.    (Back to Questions)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PEARSON CORRELATION Question 1 Answer:

The correlation coefficient is .52, which is a moderate to strong positive association. However, to understand the relative strength of the association, one must always take into consideration the variables that are correlated and what different degrees of strengths mean in terms of those variables. The coefficient of .52 fits with the interpretation of a positive association based on the scatterplot.    (Back to Questions)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PEARSON CORRELATION Question 2 Answer:

Among third graders, communicative skills are positively correlated with quality of peer relations, r = .52.    (Back to Questions)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

R-SQUARED Question 1 Answer:

Using a calculator, the squared value of r is .269793956, which is equivalent to the value calculated using the RSQ function.    (Back to Questions)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

R-SQUARED Question 2 Answer:

The r-squared indicates the amount of shared variance between the two variables. So, communicative skills can explain about 27% of the variance in peer relations. Because the correlation is bi-directional, it can also be stated as peer relations can explain about 27% of the variance in communicative skills.    (Back to Questions)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

R-SQUARED Question 3 Answer:

The r-squared value is much lower than the correlation coefficient (about half as big). The correlation coefficient provides information about the strength and direction of the association, yet the r-squared provides information about the size of the effect. One is not necessarily more useful than the other, but they provide different very useful types of information. Consequently, they should be used in conjunction with one another.    (Back to Questions)