Policy Capturing

Policy capturing has its roots in activities central to industrial-organizational psychology. Its origins lie in the work of the Personnel Research Laboratory at Lackland Air Force Base in the 1950s, and it achieved prominence in the broader field of psychology with the publication in 1960 of Paul Hoffman’s Psychological Bulletin paper, “The Paramorphic Representation of Clinical Judgment.” Although policy capturing is not derivative of Egon Brunswik’s probabilistic functionalism, scholars in the Brunswikian tradition have been attracted to policy capturing as a method to address certain research questions.

This attraction stems from the practice, in good policy-capturing research, of faithfully representing the situation to which generalization is aimed. Hence, policy capturing is often loosely associated with social judgment theory, which is the contemporary manifestation of Brunswikian theory.


Many cognitive tasks require decision makers to make inferences or decisions based on multiple, often conflicting, pieces of information. Such tasks include performance assessment and salary assignment, employment interviewing, investment decisions, medical diagnosis and prognosis, evaluation of charges of discrimination, assessment of the desirability of employment contracts, and even the selection of the most appropriate bullet for use by an urban police force; the list is endless. Such tasks abound in organizations. Policy capturing is used to investigate which factors influence the decision maker and how heavily each is weighted. Environmental outcomes are not part of the policy-capturing procedure.

Data Collection

The essence of the data-gathering procedure is to have an individual respondent make a substantial number of judgments on multiattribute bundles. These are often paper-and-pencil or computer-presented profiles, but the judgments can be made on actual people, files, abstracts of files, or anything that can be represented by a set of quantitative variables. Typically, the attributes and the judgments are treated as interval scales, although dichotomous attributes such as gender often appear alongside quantitative variables such as age, length of experience, or rating scales. The phrase individual respondent was not an accident: policy capturing entails an idiographic analysis, which may be followed by nomothetic analyses of the idiographic indexes describing the individual respondents.
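As a minimal sketch of stimulus construction, the following generates a set of multiattribute profiles for one respondent to judge. The attribute names, ranges, and number of profiles are hypothetical illustrations, not prescriptions from the literature.

```python
# A minimal sketch of generating stimulus profiles for a policy-capturing
# task. Attribute names and ranges here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_profiles = 60  # each respondent judges a substantial number of profiles

profiles = {
    "years_experience": rng.integers(0, 21, n_profiles),  # quantitative
    "interview_rating": rng.integers(1, 8, n_profiles),   # 7-point rating scale
    "gender":           rng.integers(0, 2, n_profiles),   # dichotomous, coded 0/1
}

# Each row is one multiattribute "bundle" shown to the judge.
X = np.column_stack(list(profiles.values()))
print(X.shape)  # prints (60, 3)
```

In practice the investigator would also control the intercorrelations among attributes, since those affect the later regression analysis.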

Data Analysis

The appropriate data analysis depends on a number of factors, including the level of measurement of the predictors and the judgments, the function forms relating predictors to judgments, predictor intercorrelation, the presumed aggregation rule, and so forth. The common default procedure is multiple regression, but mathematical models that reflect noncompensatory rules such as conjunctive or disjunctive decision rules might also be used. Given that multiple regression is the most commonly used analytic procedure, we’ll concentrate on it.
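The distinction between compensatory and noncompensatory aggregation can be made concrete with a toy simulation. One simple way to represent a conjunctive judge, assumed here for illustration, is to let the judgment track the worst attribute value, so that a high score on one attribute cannot compensate for a low score on another.

```python
# A toy contrast (illustrative, not from the article) between a
# compensatory additive rule and a noncompensatory conjunctive rule,
# represented here as the minimum attribute value.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1, 7, size=(100, 3))   # 100 profiles, three 1-to-7 attributes

additive_judge    = X.mean(axis=1)     # compensatory: attributes average out
conjunctive_judge = X.min(axis=1)      # noncompensatory: worst attribute rules

# A (7, 7, 1) profile is excellent for the additive judge but poor for
# the conjunctive one.
profile = np.array([7.0, 7.0, 1.0])
print(profile.mean(), profile.min())   # prints 5.0 1.0
```

Fitting a purely additive model to a conjunctive judge is one form of the model misspecification discussed below.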

Multiple Regression

Given a sufficient number of multiattribute judgments, the investigator can use ordinary least squares regression to ascertain the degree to which each attribute accounts for variance in the judgments. Doing so requires the usual assumptions underlying regression, some of which can be violated without affecting the investigator’s inferences too severely. For example, if the linear function form assumed in the regression algorithm does not correspond exactly to that used by the judge but is monotonically related thereto, the model misspecification tends to be inconsequential. Furthermore, appropriate cross-validation within subjects provides some sense of the consequences of violations of assumptions.
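The regression step can be sketched as follows. A judge's ratings (simulated here from hypothetical weights plus noise, standing in for judgment unreliability) are regressed on the attributes, and the fitted model yields the estimated weights and the multiple correlation.

```python
# A sketch of the OLS step: regress one judge's ratings on the profile
# attributes to recover weights and the multiple correlation (Rs).
# The simulated judge and its weights are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n, k = 80, 3
X = rng.normal(size=(n, k))                      # standardized attribute values
true_w = np.array([0.6, 0.3, 0.0])               # the judge's (unknown) weights
y = X @ true_w + rng.normal(scale=0.4, size=n)   # judgments with unreliability

Xd = np.column_stack([np.ones(n), X])            # prepend an intercept column
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)       # OLS estimates

y_hat = Xd @ b
Rs = np.corrcoef(y, y_hat)[0, 1]                 # multiple correlation
print(b[1:].round(2), round(Rs, 2))
```

The estimated weights recover the relative importance of the attributes, and the third, irrelevant attribute receives a near-zero weight.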

Performance Indexes

Standard multiple regression indexes are used to describe the judgment policy. The multiple correlation, Rs, is crucial; if it is not substantial, the investigator cannot claim to have learned much about the judgment policy of the person without further analysis. One possible reason for a low Rs, other than the ever-present unreliability of judgment, is that the function forms relating the judgments to the attributes may be nonlinear. A second is that the judge’s aggregation rule may be nonadditive, so that the assumption of additivity has resulted in model misspecification. These first two possibilities can be subjected to some data snooping, such as inspecting the scatterplots for nonlinearities, fitting quadratic and multiplicative terms, and so forth.
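The data-snooping step of fitting a quadratic term can be sketched with a simulated judge, assumed here to have a peaked (inverted-U) preference centered in the attribute's range, so the linear fit is near zero while the quadratic fit is substantial.

```python
# A sketch of data snooping for nonlinearity: a peaked preference yields
# a near-zero linear R^2, but adding a quadratic term recovers the
# relation. The simulated judge is hypothetical.
import numpy as np

rng = np.random.default_rng(3)
age = rng.uniform(18, 72, 200)
desirability = -(age - 45.0) ** 2 / 100.0 + rng.normal(scale=1.0, size=200)

def r_squared(X, y):
    """R^2 from an OLS fit with an intercept column prepended."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ b
    return 1.0 - resid.var() / y.var()

r2_linear    = r_squared(age[:, None], desirability)
r2_quadratic = r_squared(np.column_stack([age, age ** 2]), desirability)
print(round(r2_linear, 2), round(r2_quadratic, 2))
```

Inspecting the scatterplot of desirability on age would reveal the same pattern visually before any model is fit.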

These possibilities may be illustrated by a favorite class exercise: having the class design a policy-capturing study to select a mate, serve as subjects, and analyze the data. “A malevolent deity has sentenced you to spend 10 years alone on an island. In a last-minute moment of benevolence, the deity has offered to create a mate according to your preferences as assessed via policy capturing.” The students develop the attributes, but gender and age, ranging from 2 to 72 years, must be among them. Assuming linearity, the weight for age will likely be trivial, but inspection of the scatterplot of desirability on age will show radical nonlinearity and implicate an important source of judgment variance. If gender is a critical attribute, the regression of desirability on, say, physical attractiveness will reveal a strange-looking array, with half the points sitting in a straight line across the bottom of the scatterplot and the other half forming a typical envelope of points. Other reasons for a low Rs include systematic shifts in importance weights over the course of the task, inattention caused by fatigue, and the like. One way of obtaining information about whether the judge is systematic is to include repeated profiles and assess test-retest reliability (rtt). If both Rs and rtt are low, it is unlikely the judge can be modeled.
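The test-retest check can be sketched as follows: a subset of profiles is presented twice during the session, and the two sets of ratings are correlated. The simulated ratings are hypothetical, with noise standing in for the judge's inconsistency.

```python
# A sketch of assessing a judge's consistency: repeat a subset of
# profiles and correlate the two passes of ratings (rtt).
# The simulated ratings are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
true_eval   = rng.uniform(1, 9, 20)   # the judge's stable evaluations of 20 repeated profiles
first_pass  = true_eval + rng.normal(scale=0.5, size=20)
second_pass = true_eval + rng.normal(scale=0.5, size=20)

rtt = np.corrcoef(first_pass, second_pass)[0, 1]
print(round(rtt, 2))
```

A systematic judge yields a high rtt; a judge whose ratings drift or are made inattentively yields a low one, regardless of the model fit.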

Suppose Rs is high. Then we can predict the judge’s responses from the attributes, assuming linearity and additivity. We can also predict a new set of judgments via cross-validation on a holdout sample, mitigating concerns about capitalization on chance. A high Rs should not be taken to mean that the judge is in fact using a linear additive model; the predictive power of the linear model is all too well known. But the weights do give us significant information about which attributes are important to the judge. Comparing the weights of different judges who have provided judgments on the same data set may reveal sources of conflict, or reveal underlying sources of agreement in situations marked by conflict.
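Within-subject cross-validation can be sketched by splitting one judge's data in half: fit the weights on the first half and correlate the predictions with the actual judgments in the holdout half. The simulated judge is again hypothetical.

```python
# A sketch of within-subject cross-validation: fit on half the
# judgments, then check prediction on the holdout half.
# The simulated judge and weights are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.7, 0.2, 0.1]) + rng.normal(scale=0.3, size=n)

train, hold = np.arange(0, 50), np.arange(50, 100)
Xd = np.column_stack([np.ones(n), X])

b, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)  # fit on first half
r_cv = np.corrcoef(y[hold], Xd[hold] @ b)[0, 1]           # cross-validated r
print(round(r_cv, 2))
```

A cross-validated correlation close to the original Rs indicates the fitted weights are not merely capitalizing on chance.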

Typical Results

People are remarkably predictable in judgment tasks. If Rs is not more than .70 or so, even after taking nonlinearities and nonadditivities into account, do not place much faith in the results. It is not uncommon for expert judges in consequential tasks to have Rs values of .90 or more. An important finding is that judges often believe they are taking many attributes into account, even though relatively few attributes control virtually all the systematic variance in the judgments.

Other Decision Models

There are many approaches to the study of multiattribute judgment, decision making, and decision aiding. Some require the decision maker to decompose the decision intuitively, such as the MAUT (multiattribute utility theoretic) model of Ward Edwards and his colleagues. Others, like policy capturing, have the decision maker make multiple holistic judgments and employ computer decomposition, such as the ANOVA approach of information integration theory. Judgment analysis is important to mention in this article because it is often confused with policy capturing. It uses the same statistical machinery as policy capturing but refers to the situation where environmental outcomes are available, and the full power of the lens model can be brought to bear on exploring the relation between the judge and the environment.


References

  1. Brehmer, A., & Brehmer, B. (1988). What have we learned about human judgment from thirty years of policy capturing? In B. Brehmer & C. R. B. Joyce (Eds.), Human judgment: The SJT view (pp. 75-114). Amsterdam: Elsevier Science Publishers B. V. (North-Holland).
  2. Cooksey, R. W. (1996). Judgment analysis: Theory, methods, and applications. San Diego: Academic Press.
  3. Hoffman, P. J. (1960). The paramorphic representation of clinical judgment. Psychological Bulletin, 57, 116-131.
  4. Roose, J. E., & Doherty, M. E. (1978). A social judgment theoretic approach to sex discrimination in faculty salaries. Organizational Behavior and Human Performance, 22, 193-215.