Key Terms


Exploratory data analysis (EDA)  Examining data for potentially important patterns and relationships, especially through the use of simple graphical techniques and numerical summaries.
Dummy code  In a data file, numbers used to stand for category values; for example, 0 = male, 1 = female.
Bar graph  A graph on which data from groups of subjects are represented by bars of differing heights tied to the value of the dependent variable for the group.
Line graph  A graph on which the relationship between two variables is shown by plotting data points and connecting them with lines.
Scatterplot  A plot used to display correlational data from two measures. Each point represents one subject, plotted at that subject's score on the first measure (horizontal axis) and score on the second measure (vertical axis).
Pie chart  Type of graph in which a circle is divided into segments. Each segment represents the proportion or percentage of responses falling in a given category of the dependent variable.
Frequency distribution  A graph or table displaying a set of values or a range of values of a variable, together with the frequency of each.
Histogram  A graph depicting a frequency distribution in which the frequencies of class intervals are represented by adjacent bars along the scale of measurement.
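A minimal Python sketch of the two entries above, using made-up scores. `frequency_distribution` tallies each distinct value. `grouped_frequencies` counts scores into equal-width class intervals, which is the tally a histogram draws as adjacent bars. Both function names and the interval scheme (start, width, number of intervals) are illustrative assumptions, not a standard API.

```python
from collections import Counter


def frequency_distribution(scores):
    """Return (value, frequency) pairs for each distinct score, sorted by value."""
    return sorted(Counter(scores).items())


def grouped_frequencies(scores, start, width, n_intervals):
    """Count scores in class intervals [start + k*width, start + (k+1)*width).

    Scores outside the covered range are ignored (an assumption of this sketch).
    """
    counts = [0] * n_intervals
    for s in scores:
        k = (s - start) // width  # index of the class interval containing s
        if 0 <= k < n_intervals:
            counts[int(k)] += 1
    return counts


scores = [3, 5, 5, 6, 7, 7, 7, 8, 9, 12]  # illustrative data
print(frequency_distribution(scores))

# Text "histogram": one row per class interval, one '#' per score.
# Interval labels such as "5-9" assume integer scores.
START, WIDTH, N_INTERVALS = 0, 5, 3
for k, count in enumerate(grouped_frequencies(scores, START, WIDTH, N_INTERVALS)):
    low = START + k * WIDTH
    print(f"{low:>2}-{low + WIDTH - 1:<2} {'#' * count}")
```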
Stemplot  A graphical display of a distribution of scores consisting of a column of values (the stems) representing the leftmost digit or digits of the scores and, aligned with each stem, a row of values (the leaves) representing the rightmost digit of each score having that stem value.
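Here is a short sketch of a stemplot for non-negative integer scores. It assumes the leaf is the last digit and the stem is everything to its left, so 38 has stem 3 and leaf 8. Empty stems are kept so that gaps in the distribution stay visible. The function name `stemplot` and the example scores are illustrative.

```python
from collections import defaultdict


def stemplot(scores):
    """Return stem-and-leaf rows for a non-empty list of non-negative integers.

    Stem = all digits except the last; leaf = the last digit.
    """
    leaves = defaultdict(list)
    for s in sorted(scores):
        stem, leaf = divmod(s, 10)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {row}")
    return rows


for line in stemplot([12, 15, 21, 23, 23, 38, 41, 67]):
    print(line)
```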
Skewed distribution  A frequency distribution in which most scores fall into categories above or below the middle category.
Normal distribution  A specific type of frequency distribution in which most scores fall around the middle category. Scores become less frequent as you move away from the middle category. Also referred to as a bell-shaped curve.
Outliers  Values of a variable in a set of data that lie far from the other values.
Resistant measures  Statistics that are not strongly affected by the presence of outliers or skewness in the data.
Measure of center  A single score, computed from a data set, that represents the general magnitude of the scores in the distribution.
Mode  The most frequent score in a distribution. The least informative measure of center.
Median  The middle score in an ordered distribution.
Mean  The arithmetic average of the scores in a distribution. The most frequently reported measure of central tendency.
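Python's standard `statistics` module provides all three measures of center. The sketch below uses made-up, slightly right-skewed scores to compute the mode, median, and mean. It then adds one extreme score to show what a resistant measure is: the median barely moves, while the mean is pulled toward the outlier.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 3, 4, 5, 8]  # illustrative, slightly right-skewed data

# multimode returns every most-frequent value, so ties are not hidden.
print("mode:", multimode(scores))
print("median:", median(scores))
print("mean:", mean(scores))

# Add a single outlier and compare how far each measure moves.
with_outlier = scores + [40]
print("median shift:", median(with_outlier) - median(scores))
print("mean shift:", mean(with_outlier) - mean(scores))
```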
Measure of spread  A single score, computed from a data set, that represents the amount of variability of the scores in the distribution (i.e., how spread out they are).
Range  The least informative measure of spread; the difference between the highest and lowest scores in a distribution.
Interquartile range  A measure of spread in which an ordered distribution of scores is divided into four equal-sized groups. The score separating the lowest 25 percent (the first quartile) is subtracted from the score separating the highest 25 percent (the third quartile). Dividing this difference by 2 gives the semi-interquartile range.
Variance  A measure of spread. The average squared deviation from the mean.
Standard deviation  The most frequently reported measure of spread. The square root of the variance.
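This minimal sketch computes the measures of spread above for a small made-up data set. Quartiles come from `statistics.quantiles(..., method="inclusive")`. That is only one of several quartile conventions, so hand calculations may differ slightly. Halving Q3 - Q1 gives the semi-interquartile range, so the sketch returns both values.

The variance follows the "average squared deviation" definition, dividing by N. The n - 1 version, used when estimating a population variance from a sample, is included for comparison. The function name `spread_measures` is hypothetical.

```python
from math import sqrt
from statistics import quantiles


def spread_measures(scores):
    """Return common measures of spread for a list of at least two numbers."""
    data = sorted(scores)
    n = len(data)
    m = sum(data) / n
    ss = sum((x - m) ** 2 for x in data)  # sum of squared deviations from the mean
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # one quartile convention
    return {
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "semi_iqr": (q3 - q1) / 2,
        "variance": ss / n,  # average squared deviation (divide by N)
        "sd": sqrt(ss / n),  # square root of the variance
        "sample_variance": ss / (n - 1),  # n - 1 denominator, for estimating from a sample
    }


stats = spread_measures([2, 4, 4, 4, 5, 5, 7, 9])  # illustrative data
for name, value in stats.items():
    print(f"{name:>15}: {value:.3f}")
```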
Five-number summary  A set of five numbers used to summarize the characteristics of a distribution: the minimum, first quartile, median, third quartile, and maximum.
Boxplot  A graphical display of the values of the five-number summary of a distribution.
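The five-number summary can be sketched as follows. It again uses the "inclusive" quartile convention from Python's `statistics.quantiles`, which is an assumption; other conventions give slightly different quartiles.

Many boxplots also flag as outliers any points lying more than 1.5 × IQR beyond the quartiles, a common convention due to Tukey. `flag_outliers` applies that rule. Both function names and the data are illustrative.

```python
from statistics import median, quantiles


def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers."""
    data = sorted(scores)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median(data), q3, data[-1])


def flag_outliers(scores, k=1.5):
    """Return scores lying more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(scores)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < low_fence or x > high_fence]


print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))
print(flag_outliers([1, 3, 5, 7, 9, 11, 13, 60]))
```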
Pearson product-moment correlation coefficient, or Pearson r  The most popular measure of correlation. Indicates the magnitude and direction of a linear relationship between two variables.
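Here is a from-scratch sketch of Pearson r, assuming paired lists `x` and `y`. It divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The sign gives the direction of the linear relationship, and the absolute value (0 to 1) gives its strength. `pearson_r` is a hypothetical helper name; Python 3.10+ also offers `statistics.correlation` for the same quantity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mx) ** 2 for a in x)  # sum of squares for x
    syy = sum((b - my) ** 2 for b in y)  # sum of squares for y
    # Undefined (division by zero) if either variable has no variability.
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]  # illustrative data
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 3))
```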
Point-biserial correlation  A variation of the Pearson correlation used when one variable can take on only two values.
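The point-biserial correlation is simply Pearson r computed with the two-valued variable dummy coded 0/1. The sketch below uses made-up data and hypothetical helper names. It checks this against the equivalent shortcut r_pb = (M1 - M0) / s × sqrt(p × q):

- M1 and M0 are the group means on the continuous variable.
- s is the standard deviation of all the continuous scores, computed with N in the denominator.
- p and q are the proportions of cases coded 1 and 0.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def point_biserial(group, scores):
    """Shortcut formula; `group` must be dummy coded 0/1 with both codes present."""
    ones = [s for g, s in zip(group, scores) if g == 1]
    zeros = [s for g, s in zip(group, scores) if g == 0]
    n = len(scores)
    mean_all = sum(scores) / n
    s = sqrt(sum((v - mean_all) ** 2 for v in scores) / n)  # SD with N in the denominator
    p, q = len(ones) / n, len(zeros) / n
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / s * sqrt(p * q)


group = [0, 0, 0, 1, 1, 1]  # e.g., 0 = control, 1 = treatment (illustrative)
scores = [2, 3, 4, 5, 6, 7]
print(round(pearson_r(group, scores), 4), round(point_biserial(group, scores), 4))
```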
Spearman rank order correlation (rho)  A measure of correlation used when variables are measured on at least an ordinal scale.
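Spearman's rho can be computed in two steps. First, replace each variable's scores with their ranks, giving tied scores the average of the ranks they occupy. Then compute Pearson r on the ranks. This tie handling is one standard choice. The shortcut formula 1 - 6Σd² / (n(n² - 1)) gives the same value only when there are no ties. Helper names are hypothetical, and the data are made up.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson r of two equal-length numeric lists (undefined if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotonic but far from linear (illustrative)
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```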
Phi coefficient  Measure of correlation used when both variables can take on only two values.
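This sketch assumes a 2 × 2 table of counts laid out as [[a, b], [c, d]]. Rows are the first variable coded 0/1, and columns are the second variable coded 0/1. Under that layout, phi = (ad - bc) / sqrt((a + b)(c + d)(a + c)(b + d)). Phi equals Pearson r computed on the two 0/1 dummy-coded variables, which the sketch checks by expanding the table back into raw pairs. The table layout, counts, and helper names are assumptions of this sketch.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]] of counts."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson_r(x, y):
    """Pearson r of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


def expand_table(a, b, c, d):
    """Turn 2x2 counts back into paired 0/1 observations (x, y)."""
    cells = [(0, 0, a), (0, 1, b), (1, 0, c), (1, 1, d)]
    x = [xv for xv, yv, n in cells for _ in range(n)]
    y = [yv for xv, yv, n in cells for _ in range(n)]
    return x, y


counts = (20, 10, 5, 15)  # illustrative counts
print(round(phi_coefficient(*counts), 4))
x, y = expand_table(*counts)
print(round(pearson_r(x, y), 4))
```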
Linear regression  Statistical technique used to determine the straight line that best fits a set of data.
Bivariate linear regression  A statistical technique for fitting a straight line to a set of data points representing the paired values of two variables.
Least squares regression line  Straight line, fit to data, that minimizes the sum of the squared vertical distances (the prediction errors, or residuals) between the data points and the line.
Regression weight  Value computed in a linear regression analysis that provides the slope of the least squares regression line. See also beta weight.
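For bivariate data, the least squares line y-hat = a + b·x has a closed-form solution. The slope (the regression weight) is b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and the intercept is a = ȳ - b·x̄. The sketch below computes both for made-up data; `fit_line` and `sum_squared_residuals` are hypothetical names. Shifting the slope or intercept away from the fitted values increases the sum of squared residuals, which the final check illustrates for one alternative line.

```python
def fit_line(x, y):
    """Least squares intercept a and slope b for y-hat = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx  # slope (regression weight)
    a = my - b * mx  # the line passes through (mean of x, mean of y)
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances from each point to the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


hours = [1, 2, 3, 4, 5]  # illustrative data
scores = [2, 4, 5, 4, 5]
a, b = fit_line(hours, scores)
print(f"y-hat = {a:.2f} + {b:.2f} * x")
print(sum_squared_residuals(hours, scores, a, b) < sum_squared_residuals(hours, scores, a + 0.1, b))
```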
Standard error of estimate  A measure of the accuracy of prediction in a linear regression analysis. It reflects the typical distance between the observed data points and the least squares regression line.
Coefficient of nondetermination  Statistic indicating the proportion of variance in one variable not accounted for by variation in a second variable.
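Both accuracy measures can be sketched from the same fitted line. The standard error of estimate below divides the sum of squared residuals by n - 2 before taking the square root. That is the usual convention for sample data, though some introductory treatments divide by n instead. The coefficient of nondetermination is 1 - r², where r² (the coefficient of determination) is the proportion of variance that is accounted for. The data and the function name are illustrative.

```python
from math import sqrt


def prediction_accuracy(x, y):
    """Return (standard error of estimate, coefficient of nondetermination)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx  # least squares slope
    a = my - b * mx  # least squares intercept
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = sqrt(ss_residual / (n - 2))  # n - 2 convention (an assumption, see above)
    r_squared = sxy ** 2 / (sxx * syy)  # coefficient of determination
    return see, 1 - r_squared


see, nondetermination = prediction_accuracy([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(see, 3), round(nondetermination, 3))
```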
Correlation matrix  A matrix giving the set of all possible bivariate correlations among three or more variables.
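A correlation matrix is just Pearson r computed for every pair of variables. The sketch below builds one as a nested list for three made-up variables. The diagonal is 1 (each variable correlated with itself), and the matrix is symmetric, so published tables often show only one triangle. `pearson_r` and `correlation_matrix` are illustrative helpers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return (names, matrix) where matrix[i][j] is r between variables i and j."""
    names = list(columns)
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


data = {  # illustrative data: one list of scores per variable
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 5, 4, 5],
    "errors": [9, 7, 6, 5, 2],
}
names, matrix = correlation_matrix(data)
print(" " * 7 + "".join(f"{name:>8}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:>7}" + "".join(f"{r:8.2f}" for r in row))
```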







Research Design and Methods Online Learning Center, Chapter 12