### Introduction
To understand God's thoughts we must study statistics; for these are the measure of his purpose.
(Florence Nightingale, 1820-1910)
Sociological research can have three distinct goals: *description, explanation,*
and *prediction*. Description is always an important part of research,
but most sociologists attempt to explain and predict what they observe. The
three research methods most commonly used by sociologists are observational
techniques, surveys, and experiments. In each case, measurement is involved that yields a set of numbers, which are the findings, or data, produced by the research study. Sociologists and other scientists
summarize data, find relationships between sets of data, and determine
whether experimental manipulations have had an effect on some
variable of interest.
The word *statistics* has two meanings: (1) the field that applies mathematical
techniques to the organizing, summarizing, and interpreting of data, and (2)
the actual mathematical techniques themselves. Knowledge of statistics has many
practical benefits. Even a rudimentary knowledge of statistics will make you
better able to evaluate statistical claims made by science reporters, weather
forecasters, television advertisers, political candidates, government officials,
and other persons who may use statistics in the information or arguments they
present.
### Representation of Data
Because a list of raw data may be difficult to interpret, sociologists
prefer to represent their data in an organized way. Two of the most
common ways are frequency distributions and graphs.
Frequency Distributions
Suppose that you had a set of 20 scores from a 100-point sociology
exam. You might arrange them in a frequency distribution, listing the
frequency of each score or group of scores in a set of scores. Using
the set of scores in Table B.1, you would set up a column including the
highest and lowest scores, as well as the possible scores in between. In
this case, the highest score is 94 and the lowest is 80. You would then
count the frequency of each score and list it in a separate column. The
total of the frequencies in the distribution is symbolized by the letter N.
The frequency distribution might show a pattern in the set of scores
that is not apparent when simply examining the individual scores. In this
example (presented in Table B.1), the exam scores do not bunch up
toward the lower, middle, or upper portions of the distribution. In
some cases, typically when the difference between the highest score
and the lowest score is greater than 15, you might prefer to use a
grouped frequency distribution. The scores are grouped into intervals,
and the frequency of scores in each interval is listed in a separate
column. The intervals can be of any size, but, for ease of construction,
a grouped frequency distribution should end up with no more than
about 10 groups. A grouped frequency distribution provides less
precise information than does an ungrouped one, because the
individual scores are lost. However, the benefit of a grouped frequency
distribution is that one can understand any trends in the data at a quick
glance.
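As a rough illustration, both kinds of frequency distribution can be tallied in a few lines of Python. This is only a sketch: the scores below are hypothetical (they are not the data from Table B.1), and the interval width of 5 and the helper name `grouped_frequencies` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical exam scores (assumed data, not the scores in Table B.1).
scores = [80, 82, 82, 85, 85, 85, 88, 90, 90, 94]

# Ungrouped frequency distribution: each score mapped to its frequency.
frequency = Counter(scores)
N = sum(frequency.values())  # total of the frequencies

# List every possible score from highest to lowest, including scores nobody earned.
for score in range(max(scores), min(scores) - 1, -1):
    print(score, frequency[score])

def grouped_frequencies(values, width):
    """Count whole-number values per interval of the given width.

    Intervals start at multiples of `width`; each key is (lowest, highest).
    """
    groups = Counter((v // width) * width for v in values)
    return {(start, start + width - 1): groups[start] for start in sorted(groups)}

# Grouped frequency distribution with an assumed interval width of 5.
grouped = grouped_frequencies(scores, 5)
print(grouped)  # {(80, 84): 3, (85, 89): 4, (90, 94): 3}
```

Notice that the grouped version shows the overall pattern at a glance but no longer tells you which individual scores were earned.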
Learning Check #1: Suppose we ask 23 students how many music
CDs they own. Present the following data in a frequency
distribution: 43, 15, 52, 24, 84, 36, 75, 70, 98, 44, 56, 60, 48, 41,
38, 7, 62, 49, 32, 71, 25, 46, 58.
Graphs
If a picture is worth a thousand words, then a graph is worth several paragraphs
in a research report. Because it provides a pictorial representation of the
distribution of scores, a graph can be an even more effective representation
of research data than a frequency distribution. Among the most common kinds
of graphs are pie graphs, frequency histograms, frequency polygons, and line
graphs.
Pie Graph
A simple, but visually effective, way of representing data is the pie graph.
It represents data as percentages of a pie-shaped graph. The total of the slices
of the pie must add up to 100 percent.
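The arithmetic behind a pie graph can be sketched as follows. The category labels and counts are hypothetical, chosen only to show how raw counts become percentages and slice angles.

```python
# Hypothetical counts of survey respondents by place of residence (assumed data).
counts = {"Urban": 36, "Suburban": 24, "Rural": 20}

total = sum(counts.values())

# Each slice is the category's share of the total, expressed as a percentage.
percentages = {label: 100 * n / total for label, n in counts.items()}

# The same share of a 360-degree circle gives the angle of each slice.
angles = {label: 360 * n / total for label, n in counts.items()}

print(percentages)  # {'Urban': 45.0, 'Suburban': 30.0, 'Rural': 25.0}
print(angles)       # {'Urban': 162.0, 'Suburban': 108.0, 'Rural': 90.0}
```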
Learning Check #2: Suppose in a class of 150 students there are 13 First-year
students, 68 Sophomores, 50 Juniors, and 19 Seniors. Construct a pie chart to
illustrate these data.
Frequency Polygon
A frequency polygon serves the same purpose as a frequency histogram. As shown
in Figure B.2, the frequency polygon is drawn by connecting the points, representing
frequencies, located above the scores. Note that the polygon is completed by
extending it to the abscissa one score below the lowest score and one score
above the highest score in the distribution.
An advantage of the frequency polygon over the frequency histogram is that
it permits the plotting of more than one distribution on the same set of axes.
Plotting more than one frequency histogram on a set of axes would create a confusing
graph. If more than one frequency polygon is plotted on a set of axes, they
should be distinguished from one another. This can be done by drawing a different
kind of line for each polygon (perhaps a solid line for one and a broken line
for the other), drawing the lines in different colors (perhaps red for one polygon
and blue for the other), or representing the points above the scores with geometric
shapes (perhaps a circle for one polygon and a triangle for the other).
There are a few shapes that a frequency polygon can take that are particularly
interesting to sociologists and other social researchers. A graph in which scores
bunch up toward either end of the abscissa (as shown in Figure B.3) is said
to be skewed. The skewness of a graph is in the direction of its "tail." If
the scores bunch up toward the high end, the graph has a negative skew. If the
scores bunch up toward the low end, the graph has a positive skew. A distribution
is said to be normal (or bell-shaped) if the scores bunch up in the middle and
then taper off fairly equally on each side. Finally, a distribution is called
a rectangular distribution if the scores are fairly evenly distributed throughout
the graph.
Learning Check #3: Remember the 23 students who reported the number of music
CDs that they own? Present the following data in a frequency polygon: 43, 15,
52, 24, 84, 36, 75, 70, 98, 44, 56, 60, 48, 41, 38, 7, 62, 49, 32, 71, 25, 46,
58.
Learning Check #4: What shape does the graphed distribution take?
Learning Check #5: What would have made the distribution take on a positive
skew? A negative skew? A rectangular shape?
Line Graph
Whereas pie graphs, frequency histograms, and frequency polygons are useful
for plotting frequency data, a line graph is useful for plotting data generated
by experimental social research. It uses lines to represent the relationship
between independent variables and dependent variables. If you skim through your
introductory sociology textbook, you will see several examples of line graphs.
The graph shown in Figure B.4 represents the data from an investigation of the
relationship between exercise and weight loss. Note in this figure that one
line represents a group of people who agree to exercise regularly and the other
line represents a group of people who do not engage in exercise. In all other
ways these two groups are equal. They are weighed one week after agreeing to
participate in the study and again two weeks after agreeing to participate.
Note that this graph allows the reader to note quickly the benefits of exercise
on weight loss.
### Descriptive Statistics
Suppose you gained access to the hundreds, or thousands, of high school
grade point averages of all the freshmen at your college or university.
What is the most typical score? How similar are the scores? Simply
scanning the scores would provide, at best, gross approximations of the
answers to these questions. To obtain precise answers, sociologists use
descriptive statistics, which include measures of central tendency and
measures of variability.
Measures of Central Tendency
A measure of central tendency is a single score that best represents an
entire set of scores. The measures of central tendency include the mode,
the median, and the mean.
Mode
The mode is the most frequently occurring score in a set of scores. In the
frequency distribution of exam scores discussed earlier, the mode is 90. If two
scores tie as the most frequent, the distribution is bimodal. If the data set is
made up of a counting of categories, then the category with the most
cases is considered the mode. For example, in determining the most
common academic major at your school, the mode is the major with the
most students. The winner of a presidential primary election in which there
are several candidates would represent the mode--the person selected by
more voters than any other.
The mode can be the best measure of central tendency for practical
reasons. Imagine a car dealership given the option of carrying a particular
model, but limited to selecting just one color. The dealership owner would
be wise to choose the modal color.
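A brief sketch of finding the mode, using the `multimode` function from Python's standard `statistics` module. Both data sets are hypothetical and assumed only for illustration.

```python
from statistics import multimode

# Hypothetical colors of cars sold last month (assumed data).
colors = ["silver", "red", "silver", "blue", "silver", "red"]
print(multimode(colors))  # ['silver']

# Hypothetical scores in which two values tie as the most frequent (bimodal).
scores = [70, 85, 85, 90, 90, 95]
modes = multimode(scores)
print(modes)  # [85, 90]
```

Because the mode only requires counting, it works for categories such as colors or majors as well as for numerical scores.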
Learning Check #5: A researcher is interested in the effect of family
size on self-esteem. To begin this study, 10 students are each asked
how many brothers and sisters they have. The responses are as
follows: 2, 3, 1, 0, 9, 2, 3, 2, 4, 2. What is the mode for this set of
data?
Mean
The mean is the arithmetic average, or simply the average, of a set of
scores. You are probably more familiar with it than any other measure of
central tendency. You encounter the mean in everyday life whenever you
calculate your exam average, batting average, gas mileage average, or a
host of other averages.
The mean of a sample is calculated by adding all the scores and dividing
by the number of scores.
Exam Scores: 99, 92, 93, 94, 97
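For the five exam scores above, the calculation can be sketched in Python as follows (the variable names are arbitrary):

```python
exam_scores = [99, 92, 93, 94, 97]

# Add all the scores, then divide by the number of scores.
total = sum(exam_scores)           # 475
mean = total / len(exam_scores)    # 475 / 5
print(mean)  # 95.0
```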
Learning Check #6: What is the mean number of brothers and sisters
listed in Learning Check #5?
Median
The median is the middle score in a distribution of scores that have been
ranked in numerical order. If the median is located between two scores, it
is assigned the value of the midpoint between them (for example, the
median of 23, 34, 55, and 68 would equal 44.5). The median is the best
measure of central tendency for skewed distributions, because it is
unaffected by extreme scores. Note that in the example below the median
is the same in both sets of exam scores, even though the second set
contains an extreme score. The mean is quite different, due to the one
extreme score on Exam B.
Exam A: 23, 25, 63, 64, 67
Exam B: 23, 25, 63, 64, 98
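The comparison can be checked with a short sketch using Python's standard `statistics` module. The exam scores are the ones listed above; the four-score list is the midpoint example given earlier in this section.

```python
from statistics import mean, median

exam_a = [23, 25, 63, 64, 67]
exam_b = [23, 25, 63, 64, 98]

# The middle score is the same in both sets...
print(median(exam_a), median(exam_b))  # 63 63

# ...but the single extreme score pulls the mean of Exam B upward.
print(mean(exam_a), mean(exam_b))      # 48.4 54.6

# With an even number of scores, the median is the midpoint of the middle two.
print(median([23, 34, 55, 68]))        # 44.5
```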
When Disraeli pointed out the ease of lying with statistics, he might have
been referring, in particular, to measures of central tendency. Suppose a
baseball general manager is negotiating with an agent about a salary for a
baseball catcher of average ability. Both might use a measure of central
tendency to prove their own points, perhaps based on the salaries of the
top seven catchers, as shown in Table B.2. The general manager might
claim that a salary of $340,000 (the median) would provide the player
with what he deserves, based on an average salary of the other players.
The agent might counter that a salary of $900,000 (the mean) would
provide the player with what he deserves, based on an average salary of
the other players. Note that neither would technically be lying: they would
simply be using statistics that favored their position. As Scottish writer
Andrew Lang (1844-1912) warned, beware of anyone who "uses
statistics as a drunken man uses lampposts--for support rather than for
illumination."
Learning Check #7: What is the median number of brothers and
sisters listed in Learning Check #5?
Learning Check #8: Note that the mean number of brothers and
sisters is quite a bit different from the median number of brothers
and sisters. In this case, which measure of central tendency would be
most appropriate to report? Why?
Measures of Variability
Although a measure of central tendency is certainly important, it does not
completely represent a distribution by itself. Given a measure of central
tendency, you have an idea of where scores tend to fall, but you don’t
know to what extent the scores differ from one another. A measure of the
amount of dispersion contained within a data set is called a measure of
variability. Except when all scores in a data set are identical, all sets of
scores vary to some degree. Consider the members of your sociology
class. They would vary on a host of measures, including height, weight,
and grade point average. Measures of variability include the range, the
variance, and the standard deviation.
Range
The range is the difference between the highest and lowest scores in a
distribution. The range provides limited information, because distributions
in which scores bunch up toward the beginning, middle, or end of the
distribution might have the same range. Still, the range is useful as a
rough estimate of how a score compares with the highest and lowest in a
distribution. For example, a student might find it useful to know whether
he or she did near the best or the worst on an exam. The range of scores
in the distribution of 20 grades in the earlier example in Table B.1 would
be the difference between 94 and 80, or 14.
Learning Check #9: A social researcher would like to know how
many digits people in different age categories can recall with only one presentation of a list. She creates random lists of digits and presents them to participants.
The number of digits recalled by the first 10 participants is as
follows: 5, 9, 6, 10, 9, 7, 8, 7, 9, 12. What is the range of this data
set?
Variance
A more informative measure of variability is the variance, which
represents the variability of scores around their group mean. Unlike the
range, the variance takes into account every score in the distribution.
Technically, the variance is the average of the squared deviations from the
mean.
Suppose you wanted to calculate the variance for the sets of 10-point
quiz scores in Quiz A and Quiz B (Table B.3). First, find the group mean.
Second, find the deviation of each score from the group mean. Note that
deviation scores will be negative for scores that are below the mean. As a
check on your calculations, the sum of the deviation scores should equal
zero. Third, square the deviation scores. By squaring the scores, negative
scores are made positive and extreme scores are given relatively more
weight. Fourth, find the sum of the squared deviation scores. Fifth, divide
the sum by the number of scores. This yields the variance. Note that the
variance for Quiz A is larger than that for Quiz B, indicating the students
were more varied in their performances on Quiz A.
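The five steps can be sketched in Python as shown below. The quiz scores are hypothetical (they are not the scores in Table B.3), and the function name `variance_by_steps` is an assumption made for this example.

```python
from math import isclose
from statistics import pvariance

def variance_by_steps(scores):
    """Variance as the average of the squared deviations from the mean."""
    n = len(scores)
    group_mean = sum(scores) / n                    # Step 1: find the group mean
    deviations = [x - group_mean for x in scores]   # Step 2: deviation of each score
    # Check on the calculations: the deviation scores should sum to zero.
    assert isclose(sum(deviations), 0, abs_tol=1e-9)
    squared = [d ** 2 for d in deviations]          # Step 3: square the deviations
    sum_of_squares = sum(squared)                   # Step 4: sum the squared deviations
    return sum_of_squares / n                       # Step 5: divide by the number of scores

# Hypothetical 10-point quiz scores (assumed data).
quiz_scores = [2, 4, 6, 8, 10]
print(variance_by_steps(quiz_scores))  # 8.0
print(pvariance(quiz_scores))          # same value from the standard library
```

Note that dividing by the number of scores, as described here, corresponds to `statistics.pvariance`. The similarly named `statistics.variance` divides by one less than the number of scores and would give a different result.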
Standard Deviation
The standard deviation, or *S*, is the square root of the variance. The
standard deviation of Quiz A would be
*S* = 3.19.
The standard deviation of Quiz B would be
*S* = 1.414.
Why not simply use the variance? One reason is that, unlike the variance,
the standard deviation is in the same units as the raw scores. This makes
the standard deviation more meaningful. Thus, it would make more sense
to discuss the variability of a set of IQ scores in IQ points than in squared
IQ points. The standard deviation is used in the calculation of many other
statistics.
Learning Check #10: The exam scores for two sections of
introductory sociology are listed below. Compute the standard
deviation for each section. Section #1: 42, 45, 56, 56, 60, 62, 67, 68,
70, 71. Section #2: 57, 57, 57, 70, 75, 77, 79, 83, 83, 92.
Learning Check #11: Suppose that there were two groups that
discussed issues related to abortion. Each member of each group
rated on a scale of 1 to 10 their opinion regarding abortion (1 =
Totally against abortion; 5 = Neutral; 10 = Totally in favor of
abortion). The mean for Group A was found to be 5 with a standard
deviation of .02. For Group B the mean was also 5, but the standard
deviation was 3.42. Which group would have the more lively
debates?
The Normal Curve and Percentiles
As illustrated in Figure B.5, the normal curve is a bell-shaped graph that
represents a hypothetical frequency distribution in which the frequency of
scores is greatest near the mean and progressively decreases toward the
extremes. In essence, the normal curve is a smooth frequency polygon
based on an infinite number of scores. The mean, median, and mode of a
normal curve are the same. Many human
characteristics, such as height, weight, and intelligence, approximately follow a normal
curve.
One useful characteristic of a normal curve is that certain percentages of
scores fall at certain distances (measured in standard deviation units) from
its mean. A special statistical table makes it a simple matter to determine
the percentage of scores that fall above or below a particular score or
between two scores on the curve. For example, about 68 percent of
scores fall between plus and minus one standard deviation from the mean;
about 95 percent fall between plus and minus two standard deviations
from the mean; and about 99.7 percent fall between plus and minus three
standard deviations from the mean.
For example, consider an aptitude test, with a mean of 100 and a standard
deviation of 15. What percentage of people score above 115? Because
aptitude scores fall on a normal curve, about 34 percent of the scores
fall between the mean and one standard deviation (in this case 15 points)
above the mean. We also know that for a normal distribution 50 percent
of the scores fall above the mean and 50 percent fall below the mean.
Thus, about 84 percent (50 percent below the mean and 34 percent
between the mean and a score of 115) of the scores fall below 115. If 84
percent fall below 115, then 16 percent (100 percent minus 84 percent) must
fall above a score of 115.
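Instead of a printed table, the same percentages can be approximated with `NormalDist` from Python's standard `statistics` module. This is a sketch that assumes the aptitude scores follow an exactly normal curve; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

# Aptitude test from the example: mean of 100, standard deviation of 15.
aptitude = NormalDist(mu=100, sigma=15)

# cdf(x) gives the proportion of scores that fall at or below x.
below_115 = aptitude.cdf(115)
above_115 = 1 - below_115
print(round(below_115 * 100), round(above_115 * 100))  # 84 16

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return aptitude.cdf(100 + k * 15) - aptitude.cdf(100 - k * 15)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))  # roughly 68.3, 95.4, and 99.7
```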
Learning Check #12: An introductory sociology teacher who has
taught for years has developed a comprehensive final exam that is
normally distributed with a mean of 200 points and a standard
deviation of 25 points. (a) What percentage of the students score
above 200 points? (b) What percentage of the students score below
175 points? (c) What percentage of the students score more than 250
points?
Scores along the abscissa of the normal curve also represent
percentiles--the scores at or below which particular percentages of
scores fall. Percentiles are frequently used, as they give us a quick idea of
how a score compares with the rest of the data set. If a score is equal to the
10th percentile, then you know that 10 percent of the scores fell at or
below that value and 90 percent of the scores were above that value.
For a test such as the aptitude test above (mean of 100, standard deviation of 15),
a score of 115 would have a percentile rank of about 84.
Learning Check #13: What are the percentile ranks for the three
scores listed in Learning Check #12: 200, 175, and 250?
Learning Check #14: Suppose you take your daughter Emily to the
doctor’s office for a well-check and find out that she is in the 5th
percentile for height and 7th percentile for weight. What do you now
know about Emily, as compared with other children her age?
### Correlational Statistics
So far, you have been reading about statistics that describe sets of
data. In many research studies, sociologists might want to know the
extent to which two variables are related. Correlational statistics do
just that. Correlational statistics yield a number called the coefficient
of correlation. The size of the coefficient may vary from 0.00 to 1.00.
Correlations may also be either positive or negative, so the coefficient
itself ranges from -1.00 to +1.00. In a positive
correlation, scores on two different variables increase and decrease
together. For example, there is a positive correlation between high
school average and freshman grade point average in college. In a
negative correlation, as scores for one variable decrease, they increase
for the other variable. For example, there is a negative correlation
between absenteeism and course performance. The strength of a
correlation depends on its size, not its sign. For example, a correlation
of -.72 is stronger than a correlation of +.53.
Correlational statistics are important because they permit us to
determine the strength and direction of the relationship between
different sets of data or to predict scores on one distribution based on
our knowledge of scores on another. If the correlation between two
sets of data were a perfect +1.00 or -1.00, we could predict one score from
another with complete accuracy. But because correlations are almost
always less than perfect, we predict one score from another only with
a particular probability of being correct--the higher the correlation, the
higher the probability.
It cannot be stressed strongly enough that correlation does not mean
causation. For example, years ago, authorities presumed that autism in
children, which involves poor social and communication skills, was caused
by "refrigerator mothers." Mothers of autistic children were aloof from
them. This was taken as a sign that the children suffered from mothers
who were emotionally cold. Knowing that this is simply a correlation,
you might wonder whether causality was in the opposite direction.
Perhaps autistic children, who do not respond to their mothers, cause
their mothers to become aloof from them. Moreover, why would a
mother have several normal children, then an autistic child, and then
several more normal ones? It would be difficult to believe she was a
warm parent to all but one. Today, evidence indicates that autism is a
neurological problem that has nothing to do with the mother's
emotionality.
As another example, although there is a positive correlation between
smoking and cancer in human beings, this correlation alone is not
scientifically acceptable evidence that smoking causes cancer. Perhaps
another factor (such as a level of stress tolerance) might make
someone prone to both smoking and cancer, without smoking's
necessarily causing cancer. Of course, correlation does not imply the
absence of causation. For example, there may indeed be a causal
relationship between smoking and cancer. The point is that if two
variables are strongly correlated, one of the variables may cause the
other, or there may not be a causal link: we just cannot tell for sure
based on a correlation coefficient. But remember that knowing that
two variables are related is still an important piece of information.
Learning Check #15: Many studies have determined that there is
a positive correlation between viewing violence on television and
violent behavioral patterns. What does this mean?
Learning Check #16: Given that there is a positive correlation
between viewing violence on television and violent behavior, can
we conclude from these data that watching the violence on
television causes children to behave violently?
Learning Check #17: Researchers used to believe that there was a
negative correlation between age and IQ. Recently, this
correlation has turned out to be much weaker than we originally
thought. Describe what is meant by a negative correlation
between age and IQ.
Scatter Plots
Correlational data are graphed using a scatter plot, also known as a
scattergram or scatter diagram. In a scatter plot, one variable is plotted
on the abscissa and the other on the ordinate. Each participant's
scores on both variables are indicated by a dot placed at the junction
between those scores on the graph. This produces one dot for each
participant. The pattern of the dots gives a rough impression of the size
and direction of the correlation. In fact, a line drawn through the dots,
or line of best fit, helps estimate this. The closer the dots lie to a
straight line, the stronger the correlation. Figure B.6 illustrates several
kinds of correlation.
Pearson's Product-Moment Correlation
The most commonly used coefficient of correlation is the Pearson's
product-moment correlation (Pearson's r), named for the English
statistician Karl Pearson. One formula for calculating it is presented in
Figure B.7. The example assesses the relationship between home runs
and stolen bases by five baseball players during one month of a
season. Recall that correlation coefficients range in size from 0 to 1.00 and
can be either negative or positive. This correlation of -.23 is
considered to be a weak, negative correlation.
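As a sketch of how such a coefficient can be computed, the function below uses the deviation-score form of Pearson's r: the sum of the products of paired deviations, divided by the square root of the product of the two sums of squared deviations. This layout may differ from the formula in Figure B.7, and the data are hypothetical (they are not the baseball data from the figure).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation from deviation scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_sq_x = sum(dx ** 2 for dx in dev_x)
    sum_sq_y = sum(dy ** 2 for dy in dev_y)
    return sum_products / sqrt(sum_sq_x * sum_sq_y)

# Hypothetical paired scores for five participants (assumed data).
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_studied, quiz_score)
print(round(r, 2))  # 0.77

# Reversing one variable flips the sign of the correlation but not its size.
print(round(pearson_r(hours_studied, quiz_score[::-1]), 2))  # -0.77
```

In Python 3.10 and later, `statistics.correlation` computes the same coefficient.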
Learning Check #18: In a large study of twins, the Minnesota
Twin study found a correlation of +.71 between the IQ scores of
identical twins. Another study found that family income is
correlated +.30 with the IQ of children. What do these correlation
coefficients mean?
Coefficient of Determination
One last number that can be helpful in understanding the relationship
between two variables is the coefficient of determination. The
coefficient of determination is the proportion of variability that can be
accounted for in one variable by knowing a second variable. Think for
a moment of all the things that can have an impact on an exam score:
amount of time spent studying, how you feel the day of the exam,
amount of sleep the previous night, whether you were sick or felt well,
as well as a host of other factors. This means that the variability in your
exam scores (as they are usually not all the exact same score) is due to
many factors. A certain amount of the variability may be due to the
number of hours you studied for the exam. Suppose that you compute
the Pearson correlation between the number of hours you spent
studying for the exam and the score on the exam and find a correlation
of +.70. To get the coefficient of determination you simply square the
Pearson correlation, which in this case is the square of .70, or .49. If
you multiply this result by 100 percent, you end up with 49 percent.
This indicates that 49 percent of the variability in exam scores can be
accounted for by the amount of time spent studying.
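The calculation in this example is short enough to sketch directly. The function name `coefficient_of_determination` is an assumption made for illustration.

```python
def coefficient_of_determination(r):
    """Square a Pearson correlation to get the coefficient of determination."""
    return r ** 2

r = 0.70  # correlation between hours studied and exam score, from the example
r_squared = coefficient_of_determination(r)

# Multiply by 100 to express the result as a percentage.
percent = round(r_squared * 100)
print(round(r_squared, 2), percent)  # 0.49 49
```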
Learning Check #19: Given the correlation coefficients in
Learning Check #18 of +.71 and +.30, explain what you can
determine with respect to the coefficient of determination.
Learning Check #20: Suppose that the correlation coefficient
between two variables is -.80. Would this lead to a different
conclusion based on the coefficient of determination than a
correlation of +.80?
### Inferential Statistics
Inferential statistics help us determine whether the difference we
find between our experimental and control groups is caused by the
manipulation of the independent variable or by chance variation in the
performances of the groups. If the difference has a low probability of
being caused by chance variation, we can feel confident in the
inferences we make from our samples to the populations they
represent.
Hypothesis Testing
In experimental research, sociologists use inferential statistics to test the null
hypothesis. The null hypothesis states that the independent variable
has no effect on the dependent variable. Consider an experimental
study of the effect of overlearning on examination performance in college students.
When we use overlearning, we study material until we know it
perfectly, and then continue to study it some more. At the beginning of
the experiment, the participants would be selected from the same
population (college students) and randomly assigned to either the
experimental group (overlearning) or the control group (normal
studying). Thus, the independent variable would be the method of
studying (overlearning versus normal studying). The dependent variable
might be the score on a 100-point exam on the material studied.
Learning Check #21: Identify the independent variable,
dependent variable, and the null hypothesis from the following
scenario:
A researcher would like to know if highlighting a textbook helps
students to score better on the exams. She randomly selects
one-half of the students in an introductory class and instructs
them to highlight their textbooks as they read. The other students
are instructed to do NO highlighting as they read.
If the experimental manipulation has no effect, the experimental and
control groups would not differ significantly in their performance on the
exam. In that case, we would fail to reject the null hypothesis. If the
experimental manipulation has an effect, the two groups would differ
significantly in their performance on the exam. In that case, we would
reject the null hypothesis. This would indirectly support the research
hypothesis, which would predict that overlearning improves exam
performance. But how large must a difference be between groups for it
to be significant? To determine whether the difference between groups
is large enough to minimize chance variation as an alternative
explanation of the results, we must determine the statistical significance
of the difference between them.
Statistical Significance
The characteristics of samples drawn from the population they
represent will almost always vary somewhat from those of the true
population. This is known as sampling error. Thus, a sample of five
students taken from your sociology class (the population) would vary
somewhat from the class means in age, height, weight, intelligence,
grade point average, and other characteristics.
If we repeatedly took random samples of five students, we would
continue to find that they differ from the population. But what of the
difference between the means of two samples, presumably
representing different populations, such as a population of students
who practice overlearning and a population of students who practice
normal study habits? How large would the differences have to be
before we attributed them to the independent variable rather than to
chance? In this example, how much difference in the performance of
the experimental group and the control group would be needed before
we could confidently attribute the difference to the practice of
overlearning?
The larger the difference between the means of two samples, the less
likely it would be attributable to chance. Sociologists typically accept
a difference between sample means as statistically significant if it has a
probability of less than 5 percent of occurring by chance. This is
known as the .05 level of statistical significance. In regard to the
example, if the difference between the experimental group and the
control group has less than a 5 percent probability of occurring by
chance, we would reject the null hypothesis. Our research hypothesis
would be supported: overlearning is effective. Scientists who wish to
use a stricter standard employ the .01 level of statistical significance.
This means that a difference would be statistically significant if it had a
probability of less than 1 percent of being obtained by chance alone.
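The text does not say how the probability of a chance difference is calculated. One way to make the idea concrete, offered here only as an illustrative sketch, is an exact permutation test: pool the scores, form every possible regrouping, and count how often a regrouping produces a difference in means at least as large as the one observed. The scores and the function name `permutation_p_value` are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical exam scores for two small groups (assumed data).
overlearning = [88, 90, 92, 95]
normal_study = [78, 80, 84, 85]

def permutation_p_value(group_a, group_b):
    """Proportion of regroupings with a mean difference at least as large
    (in either direction) as the observed difference."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        a = [pooled[i] for i in indices if i in chosen]
        b = [pooled[i] for i in indices if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

p = permutation_p_value(overlearning, normal_study)
print(round(p, 3))  # 0.029
print(p < 0.05)     # True: significant at the .05 level
print(p < 0.01)     # False: not significant at the stricter .01 level
```

In this made-up example, only 2 of the 70 possible regroupings produce a difference as large as the observed one, so the difference would count as statistically significant at the .05 level but not at the .01 level.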
The difference between the means of two groups will more likely be
statistically significant under the following conditions:
- When the samples are large.
- When the difference between the means is large.
- When the variability within the groups is small.
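These three conditions can be illustrated with a t statistic, one common way of comparing two sample means. The text does not name a particular test, so this choice is an assumption, as are the function name and all of the numbers below. The statistic divides the difference between the means by a standard error that shrinks as samples grow and as variability falls. A larger value is less likely to arise by chance.

```python
from math import sqrt

def t_statistic(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Difference between two sample means divided by its standard error
    (the unequal-variance form of the two-sample t statistic)."""
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Hypothetical baseline: means of 83 and 80, standard deviations of 10, 25 per group.
baseline = t_statistic(83, 80, 10, 10, 25, 25)

larger_samples = t_statistic(83, 80, 10, 10, 100, 100)   # more participants
larger_difference = t_statistic(86, 80, 10, 10, 25, 25)  # bigger gap between means
less_variability = t_statistic(83, 80, 5, 5, 25, 25)     # tighter scores within groups

print(round(baseline, 2))           # 1.06
print(round(larger_samples, 2))     # 2.12
print(round(larger_difference, 2))  # 2.12
print(round(less_variability, 2))   # 2.12
```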
Note that statistical significance is a statement of probability. We can
never be certain that what is true of our samples is true of the
population they represent. This is one of the reasons why all scientific
findings are tentative. Moreover, statistical significance does not
indicate practical significance. A statistically significant effect may be
too small or be produced at too great a cost of time or money to be
useful. What if those who practice overlearning must study two extra
hours each day to improve their exam performance by a statistically
significant, yet relatively small, 3 points? Knowing this, students might
choose to spend their time in another way. As the American statesman
Henry Clay (1777-1852) noted, in determining the importance of
research findings, by themselves "statistics are no substitute for
judgment."
Learning Check #22: Suppose that the researcher in Learning
Check #21 rejected the null hypothesis and concluded that there
was a significant difference due to highlighting. What would this
mean in terms of probability?
Learning Check #23: Can research demonstrate statistical
significance, yet have no real practical value?
Learning Check #24: Could two groups have a difference that
looked important, yet not be statistically different from one
another? That is, could the difference between two groups appear
to have practical value, yet not achieve statistical significance?
### Summary
Representation of Data
Data are often represented in frequency distributions, which indicate
the frequency of each score in a set of scores. Sociologists also use
graphs to represent data. These include pie graphs, frequency
histograms, frequency polygons, and line graphs. Line graphs are
important in representing the results of experiments, because they are
used to illustrate the relationship between independent and dependent
variables.
Descriptive Statistics
Descriptive statistics summarize and organize research data. Measures
of central tendency represent the typical score in a set of scores. The
mode is the most frequently occurring score, the median is the middle
score, and the mean is the arithmetic average of the set of scores.
Measures of variability represent the degree of dispersion of scores.
The range is the difference between the highest and lowest scores. The
variance is the average of the squared deviations from the mean of the
set of scores. And the standard deviation is the square root of the
variance.
Many kinds of measurements fall on a normal, or bell-shaped, curve.
A certain percentage of scores fall below each point on the abscissa of
the normal curve. Percentiles identify the percentage of scores that fall
below a particular score.
Correlational Statistics
Correlational statistics assess the relationship between two or more
sets of scores. A correlation may be positive or negative and vary from
0.00 to plus or minus 1.00. The existence of a correlation does not
necessarily mean that one of the correlated variables causes changes in
the other. Nor does the existence of a correlation preclude that
possibility. Correlations are commonly graphed on scatter plots.
Perhaps the most common correlational technique is the Pearson's
product-moment correlation. You square the Pearson's
product-moment correlation to get the coefficient of determination,
which will indicate the amount of variance in one variable accounted
for by another variable.
Inferential Statistics
Inferential statistics permit social researchers to determine whether their
findings can be generalized from their samples to the populations they
represent. Consider a simple investigation in which an experimental
group that is exposed to a condition is compared with a control group
that is not. For the difference between the means of the two groups to
be statistically significant, the difference must have a low probability
(usually less than 5 percent) of occurring by normal random variation.