Situational Judgment Tests

Many work situations require the job incumbent to make a judgment about aspects of the situation and respond to the practical situational demands. An effective response to the practical demands of a situation may require the appropriate use of some combination of one’s abilities and other personal attributes. Situational judgment tests (SJTs) are psychometric tests that are specifically designed to assess individual differences in this overall ability to make effective judgments or responses to a wide variety of situations.

Situational judgment tests are typically administered in a paper-and-pencil mode, although they may be implemented in other modes, such as video-based items and interview questions. An SJT is made up of several situations, each presenting a hypothetical critical incident and several courses of action in response to the situation. The instructions and response format depend on the specific SJT. In many SJTs, respondents are required to rate each possible course of action on a five-point effectiveness scale or indicate the best and worst action among the alternatives provided. In other SJTs, respondents are asked to rate each possible action in terms of the likelihood that they would adopt it or indicate their most likely and least likely actions among the possible actions provided.

Development of Situational Judgment Tests

Most modern versions of SJTs derive from the work of Stephen Motowidlo and his colleagues, which builds on Robert Sternberg’s concept of tacit knowledge (i.e., job-relevant knowledge that is needed to accomplish everyday tasks but is usually not openly stated or part of any formal instruction). Their approach improves the measurement of the concept by using job analyses to identify the types of judgments made on a specific job and to improve the content and face validity of the resulting test. The development process usually begins with the identification of a set of work-related constructs to be targeted by the SJT. Job incumbents are asked to generate critical incidents or situations that require ability or expertise related to these constructs. Other job incumbents provide a set of possible actions that could be taken to resolve or improve the situations, and a third group, usually subject-matter experts, provides effectiveness ratings of each action and judgments about the best and worst of the actions. These ratings and judgments are analyzed and used to develop a final set of items and a scoring key that is applied to the items. (Detailed examples of the SJT development process are provided in the works listed in the references.)
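The last step of this process, turning expert ratings into a scoring key, can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure of any particular test developer. The ratings, the option labels, and the scoring rule (one point for choosing the keyed best option as best, one point for choosing the keyed worst option as worst) are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical effectiveness ratings (1 = very ineffective, 5 = very effective)
# given by three subject-matter experts to the four options of a single item.
sme_ratings = {
    "A": [4, 5, 4],
    "B": [2, 1, 2],
    "C": [3, 3, 4],
    "D": [1, 2, 1],
}


def build_key(ratings):
    """Return the keyed best and worst options from the mean expert ratings."""
    means = {option: mean(values) for option, values in ratings.items()}
    best = max(means, key=means.get)
    worst = min(means, key=means.get)
    return {"best": best, "worst": worst}


def score_item(key, picked_best, picked_worst):
    """Assumed scoring rule: one point for each pick that matches the key."""
    return int(picked_best == key["best"]) + int(picked_worst == key["worst"])


key = build_key(sme_ratings)

# A respondent who picks A as best (matches the key) and B as worst (does not).
item_score = score_item(key, picked_best="A", picked_worst="B")
print(key, item_score)
```

In practice, developers may also drop options on which experts disagree or weight options by their mean rating; those refinements are omitted here.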

Probably as a result of the job-relevant features of the test development process, studies have shown that respondents tend to have more favorable perceptions of SJTs than of other types of employment tests, such as cognitive ability and personality tests, because they believe the tests are relevant to work situations and valid in predicting job performance. In addition to the evidence on the face validity of SJTs, there is increasing evidence that SJTs can produce substantial zero-order and incremental criterion-related validities. However, unlike cognitive ability and personality measures, which have an extensive literature and large database, the empirical evidence on SJTs is much less established, and the theoretical or conceptual underpinnings of SJTs are much less well understood.

Criterion-Related Validity of Situational Judgment Tests

In a meta-analysis of the criterion-related validities of SJTs, Michael McDaniel and his colleagues found that the average observed validity of 102 validity coefficients was .26, a figure that increased to .34 when it was corrected for criterion unreliability. However, there was substantial unexplained variability (55%) in coefficients around this population value, suggesting that the validity of an SJT is likely to be moderated by many variables. Moderator analyses indicated that measures developed on the basis of job analyses yielded larger validity coefficients than those not based on job analyses, but the results of other moderator analyses were inconclusive because of the small number of studies or small total sample size in one or more of the groups of studies formed by the moderator variable.
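The correction for criterion unreliability mentioned above follows the classical disattenuation formula, in which the observed correlation is divided by the square root of the criterion reliability. The sketch below illustrates the arithmetic only. The criterion reliability of .60 is an assumed value chosen for the example and is not stated in this entry.

```python
from math import sqrt


def correct_for_criterion_unreliability(observed_r, criterion_reliability):
    """Classical disattenuation for unreliability in the criterion only."""
    return observed_r / sqrt(criterion_reliability)


observed_validity = 0.26             # mean observed validity quoted in the entry
assumed_criterion_reliability = 0.60  # assumption made for illustration only

corrected = correct_for_criterion_unreliability(
    observed_validity, assumed_criterion_reliability
)
print(round(corrected, 2))  # 0.34
```

Under that assumed reliability, the corrected value agrees with the .34 figure reported in the meta-analysis.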

Several primary studies involving employees in a wide variety of jobs conducted since Motowidlo et al. revived interest in the SJT method have produced validities similar to the averages reported by McDaniel et al. In addition, several studies found that SJTs produce validity increments (in predicting job performance) over cognitive ability, personality, job knowledge, and experience measures.
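A validity increment of this kind is usually expressed as the gain in squared multiple correlation when the SJT is added to a predictor already in the equation. With two predictors, the gain can be computed directly from three correlations, as in the sketch below. All three correlation values are hypothetical and chosen only to show the arithmetic; they are not results from any study.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of a criterion on two predictors."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Hypothetical correlations (assumptions for illustration only).
r_perf_cog = 0.30  # job performance with cognitive ability
r_perf_sjt = 0.26  # job performance with SJT score
r_cog_sjt = 0.40   # cognitive ability with SJT score

r2_baseline = r_perf_cog**2  # cognitive ability alone
r2_full = r_squared_two_predictors(r_perf_cog, r_perf_sjt, r_cog_sjt)
incremental_r2 = r2_full - r2_baseline

print(round(r2_baseline, 3), round(r2_full, 3), round(incremental_r2, 3))
```

The increment shrinks as the correlation between the SJT and the baseline predictor grows, which is why the overlap between SJTs and cognitive ability matters for incremental validity.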

The criterion-related validity of SJTs in predicting performance seems well established. Although SJTs appear to be related to cognitive ability and, in some studies, to personality measures as well, incremental validity of SJTs over and above personality and cognitive ability has been reported in multiple studies. The substantial variability in correlations may arise because different constructs are measured depending on the types of situations included in the SJT. When the situations require cognitive-based constructs such as planning, organizational ability, and analytical problem solving, SJT scores correlate more highly with cognitive ability test scores than when the situations require constructs associated with interpersonal or leadership skills, for example, which are more personality based.

Construct Validity of Situational Judgment Tests

In contrast to the emerging evidence on the criterion-related validity of SJTs, research on the construct validity of SJTs is in its infancy. The bulk of the studies on SJTs are not explicitly designed to examine the nature of the constructs assessed by SJTs, and therefore the construct validity evidence available to date is indirect, at best. The constructs underlying SJTs are likely related to the concepts of adaptability, contextual job knowledge, and practical intelligence, but the precise nature of the test constructs is inextricably tied to the specific content of the SJT items.

Efforts to conduct factor analysis on SJT items typically produce little support for a priori factors that researchers have tried to incorporate into their items. The first factor in these analyses usually accounts for two to three times the variance of the second factor, but unless the scale comprises a large number of items, internal consistency (coefficient alpha) reliabilities are typically low. One explanation for these results is that responses to a single SJT item with its varied options may be the result of a variety of individual difference constructs, including both ability and motivational or personality constructs. This is consistent with empirical findings indicating that SJTs are correlated with a variety of variables, including cognitive ability and personality traits.
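The link between test length and internal consistency can be illustrated with the standardized form of coefficient alpha, which depends only on the number of items and their average intercorrelation. The sketch below assumes a low average inter-item correlation of .05, a hypothetical value chosen to mimic heterogeneous items rather than a figure taken from the SJT literature.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized coefficient alpha from the mean inter-item correlation."""
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


mean_r = 0.05  # hypothetical average inter-item correlation (assumption)

# Alpha for scales of increasing length with the same weak item overlap.
alpha_by_length = {
    k: round(standardized_alpha(k, mean_r), 2) for k in (10, 20, 40, 80)
}
print(alpha_by_length)  # {10: 0.34, 20: 0.51, 40: 0.68, 80: 0.81}
```

Under this assumption, a 10-item scale falls far short of conventional reliability standards, and only a much longer scale reaches a high alpha, even though the items are no more homogeneous.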

Given the nature of SJTs and the extant research findings, it is unlikely that SJTs measure any single unidimensional construct, even though it may be legitimate to use an overall SJT score to represent the composite (multifaceted) ability or effectiveness in situational judgment. Like interviews and many paper-and-pencil tests, SJTs may be better construed as a method of measurement that can be adapted to measure a variety of job-related constructs in different situations. However, some types of situational judgment constructs are almost inherently assessed in typical SJTs. That is, SJTs may be construed as a method of testing that constrains the range of constructs measured.

Like the interview, SJTs have dominant constructs (though they are different in nature from those in the interview method) that are readily or almost inherently assessed. Primary dominant constructs include adaptability constructs, which are likely a function of both individual difference traits and acquisition through previous experiences, and contextual knowledge constructs, which may be gained through experience in real-world contexts. Collectively, these SJT-dominant constructs can be represented by the global construct called practical intelligence. However, unlike those of the interview, SJT-dominant constructs are not associated with the structural format of the SJT (i.e., candidates are presented with a problem situation followed by the requirement to generate, endorse, or rate a series of response options). Instead, the dominant constructs are associated with the core characteristics of the test content of typical SJTs.

The details of these construct validity issues are beyond the scope of this entry. Interested readers may refer to works by David Chan and Neal Schmitt (1997, 2002, 2005), which elaborate three distinct but interrelated core characteristics of SJT content (i.e., practical situational demands, multidimensionality of situational response, and criterion-correspondent sampling of situations and response options in test content development) and relate them to SJT performance as well as job performance.

There is emerging evidence on the face validity and criterion-related validity of SJTs, but studies that directly address the fundamental issue of construct validity are lacking. Research on the construct validity of SJTs could help to identify the boundary conditions for the criterion-related validity of SJTs. Such research would also clarify the SJT constructs and increase our understanding of the nature of SJT responses and their relationship to job performance and other work-relevant variables.


