Validity refers to the correctness of the inferences that one makes based on the results of some kind of measurement. That is, when we measure something, we need to ask whether the measurements we have taken accurately and completely reflect what we intended to measure. For example, inferences about individual differences in people’s height based on observed scores from an ordinary tape measure or ruler are highly valid. When used appropriately, the tape measure produces observed measurements (in inches, millimeters, or feet) that correspond closely to actual differences in height.
It is common to hear people refer to the “validity of the test,” which might give the impression that validity is a property of the measurement device. However, this is incorrect. Validity is not a property of any assessment device; rather, it is a property of the inferences that you—the test user—make. For example, consider once again the tape measure. We might be tempted to say that “the tape measure has validity.” However, if we made inferences about differences in intelligence based on that same set of measurements rather than differences in height, those inferences would likely be highly incorrect. Nothing has changed about the tape measure or the set of measurements generated from its application. What has changed is the inference about what is being measured.
Although this might seem an absurd example (presumably no one would use a tape measure to measure intelligence), it demonstrates that validity is not a property of the measurement instrument but of the inference being made. The phrase “the test has validity,” though technically inappropriate, is often used because there is a general assumption about which inferences are (and are not) to be made from the use of a well-known measurement device. For example, testing experts may say, “The Wonderlic has good validity.” On the surface, this may seem profoundly inaccurate; however, it should be understood that this statement actually means (or at least should mean), “Inferences regarding individual differences in general mental ability, and inferences regarding the probability of future outcomes such as job performance, can generally be made appropriately from observed scores generated by the proper use of the Wonderlic.” That we sometimes use shorthand for such a long statement should not be taken to imply that validity is a property of the test. Rather, the shorthand should be interpreted as indicating that reliable and verifiable evidence supports the intended set of inferences from the use of a given measurement device.
A second common misconception is that there are different types of validity. Instead, validity is best thought of as a unitary concept addressing how completely and accurately a measure captures what it is intended to measure. However, no single method or strategy can provide all the evidence needed to make accurate or confident inferences. Thus, multiple strategies exist for generating such evidence, and these strategies (or, more aptly, the evidence they generate) are often referred to as types of validity. This is an unfortunate choice of words because it often leads to the misconception that validity is many different things and that some types of validity are more or less useful than others. Validity is a single, unitary idea: It concerns the degree to which the differences we observe in measurements can be used to make accurate and confident inferences about some unobservable phenomenon.
Typical Approaches to Generating Validity Evidence
Industrial and organizational (I/O) psychologists are often concerned with whether a given measurement device can be confidently relied on for making accurate decisions about hiring and promotion. To address this question, I/O psychologists attempt to correlate a measure of some job-required knowledge, skill, or ability (identified from a job analysis) with a measure of some identified job demand or criterion. However, this process requires many different inferences, each of which must be supported by evidence. For example, it is necessary to ensure that the predictor and criterion measures accurately and completely reflect the job requirements and job demands they are intended to reflect. It is also necessary to show that the two measures are systematically related and that the relation is not the result of some extraneous factor that was unintentionally assessed. To gain the evidence needed to support such a large set of inferences, I/O psychologists typically use three general approaches: (a) content validity, (b) criterion-related validity, and (c) construct validity.
Content Validity Inferences
The term content validity typically refers to inferences regarding the degree to which the content on a measurement device adequately represents the universe of possible content denoting the targeted construct or performance domain. There are a variety of methods or strategies that are useful for generating evidence to support content validity inferences; however, to establish the relevance of any evidence, it is first necessary to clearly define the performance domain or construct of interest and to identify the specific objectives for the assessment tool’s use (i.e., develop test specifications). These two activities circumscribe the universe of relevant content and constrain the set of inferences that one hopes to support.
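One widely used way to summarize expert judgments about content relevance is Lawshe’s content validity ratio (CVR). It is offered here only as an illustration of the kind of evidence described above, not as the method this discussion prescribes. Each subject-matter expert (SME) on a panel judges whether an item is “essential” to the defined performance domain, and CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs rating the item essential and N is the panel size. The sketch assumes that the domain definition and test specifications already exist; the item names, panel size, and counts are hypothetical.

```python
# A minimal sketch of Lawshe's content validity ratio (CVR), one of several
# ways to summarize SME judgments of item relevance. All data are hypothetical.

def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Return CVR = (n_e - N/2) / (N/2), which ranges from -1 to +1."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 SMEs; each count is how many SMEs rated the item
# "essential" to the performance domain defined in the test specifications.
essential_counts = {"item_1": 9, "item_2": 5, "item_3": 2}
N_PANELISTS = 10

for item, n_e in essential_counts.items():
    print(item, round(content_validity_ratio(n_e, N_PANELISTS), 2))
```

A CVR near +1 means nearly all SMEs judged the item essential, 0 means an even split, and a negative value means most did not. Because the ratings are only meaningful relative to a clearly specified domain, this kind of evidence is no stronger than the domain definition and test specifications it rests on.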
Criterion-Related Validity Inferences
Criterion-related validity refers to the degree to which observed scores can be used to make useful inferences (i.e., accurate predictions) about future behavior or outcomes. Evidence for criterion-related validity typically comes from correlations between the predictor measure and one or more criterion measures. Of course, to support such inferences, one must first identify theoretically meaningful criterion constructs (i.e., the types of future behaviors or outcomes that should be associated with or influenced by the construct denoted by the predictor measure) and must also ensure that the measures of those criterion constructs are backed by strong content validity evidence.
Construct Validity Inferences
The attempt to establish evidence for construct validity inferences is tantamount to theory testing. Construct validity encompasses a wide set of inferences regarding the nature of the psychological construct and its place in a larger nexus of constructs. In a sense, all validity inferences are part of construct validity. For example, strong support for content validity inferences can be used to support claims concerning the construct that is being measured by the assessment device. Criterion-related validity evidence is useful, too; a content-valid measure of a given construct should be related to (content-valid measures of) other constructs nearby in the nomological network and should not be related to (content-valid measures of) constructs that are far removed from it in the nomological network. These two kinds of evidence are often referred to as convergent and discriminant validity, respectively. It is in this sense that construct validity is similar to theory testing: The definition of the construct and its relation to other constructs is in fact a mini-theory that produces specific hypotheses regarding the results of the measurement process. If most or all of those hypotheses are supported, we can be confident in the assessment device’s utility for generating observed scores, which, in turn, can be used to make a limited set of accurate inferences.