Psychological Testing and Assessment: An Introduction To Tests and Measurement, 5/e
Ronald Jay Cohen
Mark Swerdlik

Reliability


Alternate forms  Different versions of the same test or measure; contrast with parallel forms, 132
Coefficient alpha  Also referred to as Cronbach's alpha and alpha, a statistic widely employed in test construction and used to assist in deriving an estimate of reliability; more technically, it is equal to the mean of all possible split-half reliabilities, 137-138
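As a rough illustration of the coefficient alpha entry, the sketch below uses the common variance-based form of the statistic, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name cronbach_alpha and the sample data are hypothetical and not taken from the text.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Estimate coefficient alpha.

    item_scores: one row per testtaker, each row a list of item scores.
    (Hypothetical helper; the data layout is an assumption.)
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item across testtakers
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the total (summed) scores
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up scores for four testtakers on a three-item scale
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Population variance is used for both the items and the totals; using sample variance in both places would give the same ratio.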
Coefficient of equivalence  An estimate of parallel-forms reliability or alternate-forms reliability, 132
Coefficient of generalizability  In generalizability theory, an index of the influence that particular facets have on a test score, 147
Coefficient of stability  An estimate of test-retest reliability obtained during time intervals of six months or longer, 132
Confidence interval  A range or band of test scores that is likely to contain the "true score," 149-150
Content sampling  Also referred to as item sampling, the variety of the subject matter contained in the items; frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests, 129
Criterion-referenced testing and assessment  Also referred to as domain-referenced testing and assessment and content-referenced testing and assessment, a method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard (or criterion); contrast with norm-referenced testing and assessment, 30, 109, 112
Error variance  In the true score model, the component of variance attributable to random sources irrelevant to the trait or ability the test purports to measure in an observed score or distribution of scores. Common sources of error variance include those related to test construction (including item or content sampling), test administration, and test scoring and interpretation, 18, 129
Generalizability theory  Also referred to as domain sampling theory, a system of assumptions about measurement that includes the notion that a test score, and even a response to an individual item, is composed of a relatively stable component that actually is what the test or individual item is designed to measure, and relatively unstable components that collectively can be accounted for as error, 145, 147-148
Inflation of range  Also referred to as inflation of variance, a reference to a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used and the resulting correlation coefficient tends to be higher as a consequence; contrast with restriction of range, 141
Internal consistency  Also referred to as inter-item consistency, an estimate of how consistently the items of a test measure a single construct, obtained from a single administration of a single form of the test by measuring the degree of correlation among all of the test items, 135, 205-206
Inter-scorer reliability  Also referred to as inter-rater reliability, observer reliability, judge reliability, and scorer reliability, an estimate of the degree of agreement or consistency between two or more scorers (or judges or raters or observers), 138-139
Item response theory (IRT)  Also referred to as latent-trait theory or the latent-trait model, a system of assumptions about measurement (including the assumption that the trait being measured by a test is unidimensional) and about the extent to which each test item measures the trait, 148, 210n
Item sampling  Also referred to as content sampling, the variety of the subject matter contained in the items; frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests, 129
Kappa statistic  A measure of inter-scorer reliability originally designed for use when scorers make ratings using nominal scales of measurement, 139
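To make the kappa entry concrete, here is a minimal sketch of Cohen's kappa for exactly two scorers assigning nominal categories: kappa = (observed agreement - chance agreement) / (1 - chance agreement). Choosing the two-scorer form is an assumption; the function name cohens_kappa and the ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two scorers rating the same cases on a nominal scale."""
    n = len(ratings_a)
    # Proportion of cases on which the two scorers agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each scorer's category frequencies
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if chance agreement is exactly 1
    return (observed - expected) / (1 - expected)

# Made-up ratings of eight cases by two scorers
scorer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
scorer_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
kappa = cohens_kappa(scorer_1, scorer_2)
print(kappa)
```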
Kuder-Richardson formulas  A series of equations developed by G. F. Kuder and M. W. Richardson designed to estimate the inter-item consistency of tests, 136-137
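As one illustration of the Kuder-Richardson formulas entry, the sketch below implements KR-20, the formula in the series that is usually applied to dichotomously scored (0/1) items: KR-20 = (k / (k - 1)) * (1 - sum of p*q / variance of total scores), where p is the proportion passing an item and q = 1 - p. The function name kr20 and the data are hypothetical.

```python
def kr20(item_scores):
    """KR-20 for items scored 0 (fail) or 1 (pass).

    item_scores: one row per testtaker, each row a list of 0/1 item scores.
    """
    n = len(item_scores)       # number of testtakers
    k = len(item_scores[0])    # number of items
    # p = proportion of testtakers passing each item
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    total_var = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Made-up pass/fail data: five testtakers, four items
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 3))
```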
Odd-even reliability  An estimate of split-half reliability of a test, obtained by assigning odd-numbered items to one-half of the test and even-numbered items to the other half, 134
Parallel forms  Two or more versions or forms of the same test when for each form, the means and variances of observed test scores are equal; contrast with alternate forms, 132
Power test  A test, usually of achievement or ability, with (1) either no time limit or such a long time limit that all testtakers can attempt all items, and (2) some items so difficult that no testtaker can obtain a perfect score; contrast with speed test, 142
Reliability  The extent to which measurements are consistent or repeatable; also, the extent to which measurements differ from occasion to occasion as a function of measurement error, 29, 32, 128-153
Reliability coefficient  General term for an index of reliability or the ratio of true score variance on a test to the total variance, 128, 139-148
Restriction of range  Also referred to as restriction of variance, a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is restricted by the sampling procedure used, and the resulting correlation coefficient tends to be lower as a consequence; contrast with inflation of range, 141
Rulon formula  Now outdated, an equation once used to estimate internal consistency reliability, 137
Spearman-Brown formula  An equation used to estimate internal consistency reliability from the correlation between two halves of a test or, more generally, to estimate the reliability of a test that has been lengthened or shortened; inappropriate for use with heterogeneous tests or speed tests, 134-135
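The general form of the Spearman-Brown formula can be sketched as below: r_new = (n * r) / (1 + (n - 1) * r), where r is the reliability of the existing test and n is the factor by which its length changes. Setting n = 2 adjusts a half-test correlation to full-test length. The function name and the example values are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability after changing test length by a factor of n.

    r: reliability (or half-test correlation) of the existing test
    n: new length divided by old length (2 doubles the test, 0.5 halves it)
    """
    return (n * r) / (1 + (n - 1) * r)

# Adjusting a made-up half-test correlation of .70 to full-test length
full_test = spearman_brown(0.70, 2)
print(round(full_test, 3))

# Predicted reliability if a test with reliability .80 were cut in half
shortened = spearman_brown(0.80, 0.5)
print(round(shortened, 3))
```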
Speed test  A test, usually of achievement or ability, with a time limit; speed tests usually contain items of uniform difficulty level, 142
Split-half reliability  An estimate of the internal consistency of a test obtained by correlating pairs of scores obtained from equivalent halves of a single test administered once, 133-135
Standard error of measurement  In true score theory, a statistic designed to estimate the extent to which an observed score deviates from a true score; also called the standard error of a score, 19, 148
Standard error of the difference  A statistic designed to aid in determining how large a difference between two scores should be before it is considered statistically significant, 150-152
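The two standard-error entries above, together with the confidence interval entry, can be illustrated with the usual true score theory formulas: SEM = SD * sqrt(1 - reliability), a confidence interval of observed score plus or minus z * SEM, and a standard error of the difference equal to sqrt(SEM1^2 + SEM2^2). This is a minimal sketch; the function names, the z value of 1.96 (roughly a 95% interval), and the example numbers are assumptions for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sem, z=1.96):
    """Band around an observed score; z = 1.96 gives roughly 95% confidence."""
    return (observed - z * sem, observed + z * sem)

def standard_error_of_difference(sem_1, sem_2):
    """Standard error of the difference between two scores."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)

# Made-up test with SD = 15 and reliability = .91
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(100, sem)
sed = standard_error_of_difference(sem, sem)
print(round(sem, 2), round(low, 2), round(high, 2), round(sed, 2))
```

A difference between two scores would then be compared against a multiple of the standard error of the difference (for example 1.96 times it) before being treated as statistically significant.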
Test heterogeneity  Also simply heterogeneity, the extent to which individual test items do not measure a single construct but instead measure different factors; contrast with test homogeneity, 135-136, 141
Test homogeneity  Also simply homogeneity, the extent to which individual test items measure a single construct; contrast with test heterogeneity, 135, 136, 174
Test-retest reliability  An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test, 131-132
True variance  In the true score model, the component of variance attributable to true differences in the ability or trait being measured, inherent in an observed score or distribution of scores, 129
Variance  A measure of variability equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean, 87, 88, 129
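The variance definition above, and the way the reliability coefficient entry relates true variance to total variance, can be sketched as follows. In practice true variance and error variance cannot be observed directly, so the values passed to reliability_coefficient here are purely illustrative; both function names are hypothetical.

```python
def variance(scores):
    """Arithmetic mean of the squared differences between scores and their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def reliability_coefficient(true_variance, error_variance):
    """Ratio of true variance to total variance (true plus error)."""
    return true_variance / (true_variance + error_variance)

# Made-up distribution of eight scores
print(variance([2, 4, 4, 4, 5, 5, 7, 9]))

# Illustrative split of total variance into true and error components
print(reliability_coefficient(true_variance=80, error_variance=20))
```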