The confidence that eyewitnesses express in their decision at an identification test or lineup has long been recognized within the criminal justice system as an indicator of the likely reliability or accuracy of the witness. In contrast, psychology researchers have downplayed the diagnostic value of eyewitness identification confidence. Although only a relatively small proportion of the variance in identification accuracy is associated with variance in confidence, recent research using what is known as a confidence-accuracy (CA) calibration procedure suggests that confidence, when measured immediately after the identification decision, can provide crime investigators with a useful (but not infallible) pointer to the likely accuracy of positive lineup decisions, but not of negative ones (i.e., lineup rejections). This conclusion does not apply, however, to confidence judgments expressed in the courtroom because, by that time, there has been an opportunity for postidentification influences (such as feedback from lineup administrators or other witnesses) to shape any subsequent confidence judgments. Nor is the conclusion applicable to judgments expressed by witnesses prior to having viewed a lineup. A major challenge for future research in this area will be to define the boundary conditions for obtaining robust CA calibration, which, in turn, will enhance the capacity to diagnose the likely accuracy of identification decisions.
Eyewitnesses will often provide some sort of expression of confidence in their memory when they examine a police lineup or photo spread or when they testify in court about the identity of the offender. Their degree of confidence is known to exert a strong influence on assessments made by the police, lawyers, and jurors about the likely reliability of their testimony. Yet it is known that eyewitness confidence is sometimes an extremely misleading cue to the likely accuracy of an identification. The following sections examine when identification confidence is informative about the offender’s identity and when it is likely to mislead.
Eyewitness confidence has been of major interest because confidence is an easily obtainable index that could potentially provide a guide for the criminal justice sector as to the likely reliability of an eyewitness identification response. Given the crucial role that identifications can play in some investigations and trials, together with the overwhelming evidence of eyewitness fallibility provided by DNA exoneration cases and experimental simulations of identification tests, knowing how much weight should be attached to witnesses’ confidence estimates is an important forensic issue.
Even prior to attending an identification test, witnesses may express a particular degree of confidence in their capacity to identify the offender, with the confident witness likely to impress police investigators. These assessments are likely to be influenced by a variety of factors such as witnesses’ evaluations of the strength of the memorial image for the offender, their recollections of the quality of view they had of the offender at the time of the crime, their perceptions of how good a recall they displayed when interviewed by the police, and so on. To date, there is no evidence to indicate that such preidentification test confidence assessments should be considered as a guide to the likely accuracy of an identification.
Factors for and Against a Confidence-Accuracy Relationship
There is now a sizable literature on the relationship between confidence, when expressed after an identification decision, and identification accuracy. Researchers have mounted compelling arguments both for and against expecting a strong relationship between identification confidence and accuracy. For example, in recognition memory theories and research, the strong link between memory signal strength and recognition accuracy and confidence provides firm grounds for expecting a meaningful CA relationship. Furthermore, witnesses with very strong memories of the offender are likely to make a rapid identification, with the apparent ease or speed of the identification providing a potentially reliable cue to confidence. Other support comes from research on psychophysical discrimination, indicating that confidence may well regulate, rather than be a result of, the decision process.
Arguments against a strong CA relationship have, however, been much more consistently advanced, with these views reinforced by demonstrations of overconfidence in various domains of human judgment. Some of the grounds for questioning a meaningful CA relationship include our (a) inability to review all factors that should shape confidence; (b) tendency to focus too heavily on confirmatory evidence for a decision; (c) problems with translating subjective judgments of confidence into some kind of numerical confidence value; (d) reliance on cues to confidence that, while sometimes veridical, may also be misleading (e.g., a face in a lineup may seem very familiar not because it is that of the offender but because it had been seen in the context of the event previously, or witnesses may infer that the face in the lineup that seems most familiar must be the offender because they got an excellent view of the offender at the crime); and (e) almost inevitable exposure to postidentification social influences that produce malleable confidence judgments.
For some time the dominant view among eyewitness memory researchers has been that postidentification confidence does not provide a particularly informative guide to the likely accuracy of an identification decision. It has been generally accepted that the CA relationship ranges from weak (at worst) to modest (at best) for witnesses who make a positive identification (i.e., choosers) from either a culprit-present or a culprit-absent lineup, as indicated by CA correlations that seldom exceed 0.3, and is virtually nonexistent (correlations around 0) for witnesses who reject either of these lineups (nonchoosers). Note, however, that the correlations for choosers have been shown to be higher when, for example, (a) the encoding and test stimuli have been allowed to vary as they do in the real world, (b) stimulus encoding conditions were optimal, and (c) witnesses were encouraged to be self-aware with respect to their preidentification decision behaviors by being asked to view a video of their own identification decision before giving a confidence assessment.
Although the finding of a modest CA correlation is clearly a reliable one, it does not provide the complete picture regarding the CA relationship. A complete picture requires supplementing the correlation between confidence and the identification decision outcome (accurate, inaccurate) with an examination of other characteristics of the CA relation: specifically, CA calibration and patterns of overconfidence/underconfidence. The correlation coefficient reflects the variance in decision accuracy associated with variations in confidence. In the eyewitness identification paradigm, which typically involves a witness making a single identification decision, the correlation therefore reflects variance explained at the level of the group but is not informative about the likely accuracy of a witness’s decision accompanied by a specific level of confidence (e.g., 70% confident or 90% confident). Information about the latter is, however, obtainable by applying the calibration approach to the examination of the CA relationship and, since the late 1990s, a number of studies of the CA relationship in eyewitness identification have used this approach.
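The gap between a group-level correlation and the accuracy of individual decisions can be illustrated with a minimal Python sketch. The data are hypothetical and deliberately constructed to be perfectly calibrated; the names `LEVELS`, `N_PER_LEVEL`, and `pearson_r` are illustrative assumptions and do not come from any study discussed here.

```python
from math import sqrt

# Hypothetical, perfectly calibrated sample: 100 witnesses at each
# confidence level, with the proportion correct equal to the confidence.
LEVELS = [0.6, 0.7, 0.8, 0.9, 1.0]
N_PER_LEVEL = 100

confidence = []
accuracy = []  # 1 = accurate identification, 0 = inaccurate
for level in LEVELS:
    n_correct = round(level * N_PER_LEVEL)
    confidence += [level] * N_PER_LEVEL
    accuracy += [1] * n_correct + [0] * (N_PER_LEVEL - n_correct)


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / sqrt(var_x * var_y)


r = pearson_r(confidence, accuracy)
print(f"CA correlation for perfectly calibrated data: {r:.2f}")
```

In this constructed sample every confidence level predicts accuracy exactly, yet the correlation is only about .35. A modest correlation is therefore compatible with confidence being a good guide to the accuracy of individual decisions, at least under these assumed conditions.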
At a conceptual level, the procedure is quite simple, with the proportion of accurate identification decisions determined for each level of identification confidence (10%, 20%, etc.). This provides the basis for plotting a calibration curve and deriving calibration, overconfidence/underconfidence, and resolution statistics. Inspection of the calibration curve provides a direct indication of the levels of identification accuracy expected in association with varying degrees of confidence; for example, judgments made with 100% confidence might be characterized by 85% accuracy. Perfect calibration is, of course, characterized by 0% accuracy at 0% confidence, 10% accuracy at 10% confidence, right through to 100% accuracy at 100% confidence. Any departure from perfect calibration is not only illustrated by comparing the obtained and ideal calibration curves but can also be captured in a calibration statistic (varying from 0 to 1, with 0 indicating perfect calibration) and an overconfidence/underconfidence statistic (varying from -1 to +1, with increasing positive and negative departures from 0 denoting increasing overconfidence and underconfidence, respectively). In addition to providing a guide to the likely accuracy of identification decisions made with particular levels of confidence, the calibration procedure yields a resolution statistic that (like the correlation coefficient) indicates the variance in decision accuracy associated with confidence.
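The procedure just described can be sketched in Python as follows. The formulas are those conventionally used in the calibration literature and are an assumption here, since the text does not state them: the calibration statistic is the weighted mean squared gap between confidence and accuracy at each level, the overconfidence/underconfidence statistic is mean confidence minus overall accuracy, and resolution is expressed as a normalized index. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def calibration_stats(confidence, accuracy):
    """Return the calibration curve plus C, O/U and normalized resolution.

    confidence: values between 0 and 1 (e.g. 0.9 for "90% confident")
    accuracy:   1 for an accurate decision, 0 for an inaccurate one
    """
    n = len(confidence)
    groups = defaultdict(list)
    for c, a in zip(confidence, accuracy):
        groups[c].append(a)

    overall_acc = sum(accuracy) / n
    mean_conf = sum(confidence) / n

    curve = {}          # confidence level -> proportion accurate
    calibration = 0.0   # 0 = perfect calibration, 1 = worst possible
    resolution = 0.0
    for c in sorted(groups):
        outcomes = groups[c]
        acc = sum(outcomes) / len(outcomes)
        curve[c] = acc
        calibration += len(outcomes) * (c - acc) ** 2
        resolution += len(outcomes) * (acc - overall_acc) ** 2
    calibration /= n
    resolution /= n

    # Positive = overconfidence, negative = underconfidence.
    over_under = mean_conf - overall_acc
    # Normalize resolution by the outcome variance; undefined if all
    # decisions share the same outcome.
    variance = overall_acc * (1 - overall_acc)
    nri = resolution / variance if variance > 0 else float("nan")
    return curve, calibration, over_under, nri


# Hypothetical data: 10 choosers at each of three confidence levels,
# each level 10 percentage points overconfident.
conf = [0.5] * 10 + [0.7] * 10 + [0.9] * 10
acc = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4 + [1] * 8 + [0] * 2
curve, C, OU, NRI = calibration_stats(conf, acc)
print(curve)
print(f"C = {C:.3f}, O/U = {OU:+.3f}, NRI = {NRI:.3f}")
```

For this made-up sample the curve runs parallel to the ideal line but 10 percentage points below it, giving C = .01, O/U = +.10, and a normalized resolution of about .11.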
A number of studies have now applied the calibration approach to the study of the CA relation within the eyewitness identification paradigm. While these studies have sampled only a limited range of forensically relevant variables and, indeed, a limited array of levels on each of those variables, they have at least used several different sets of stimulus materials and events (including both central and peripheral targets) that have given rise to different rates of correct and false identifications; used different retention intervals between encoding and test (with the longest being 1 week); varied the similarity of lineup targets and foils; and varied the lineup instructions.
Studies with adult participants have presented calibration curves, for positive identification responses (or choosers), that roughly parallel the ideal calibration curve. In other words, as confidence increases so too does accuracy in a systematic manner, a pattern not suggested by the typically modest CA correlations reported in these same studies. Generally, however, the curves indicate some degree of overconfidence, with accuracy rates at the high end of the confidence scale (i.e., 90% to 100% confident) typically around the 75% to 90% level. In contrast, no such systematic patterns have been detected for participants who rejected the lineup (i.e., nonchoosers). Three other findings are also noteworthy. First, in association with confidence estimates of 90% to 100%, diagnosticity ratios (indicating the ratio of hits to false alarms) were substantially higher than for lower confidence estimates. Second, participants whose identification responses were very fast were better calibrated than those whose identifications were slow. This finding is to be expected given that participants with an exceptionally strong memory for the culprit should not only identify the culprit when present in the lineup, and be appropriately confident, but should also be less likely to falsely identify an innocent suspect, thereby reducing the likelihood of confident, incorrect responses. Third, there is some evidence that interventions designed to improve adults’ scaling of confidence judgments (by causing them to reflect carefully on the encoding and identification test conditions or the possibility that their identification decision could be mistaken) can reduce overconfidence and improve CA calibration.
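The diagnosticity ratio mentioned above can be illustrated with a short Python sketch. The counts are hypothetical, and the choice of denominators (all culprit-present and all culprit-absent lineups in a condition) is an assumption made for illustration rather than the definition used in any particular study.

```python
def diagnosticity_ratio(hits, n_present, false_alarms, n_absent):
    """Hit rate divided by false-alarm rate for one confidence band.

    Written as a single division of products, which is algebraically the
    same as (hits / n_present) / (false_alarms / n_absent).
    """
    if false_alarms == 0:
        return float("inf")
    return (hits * n_absent) / (false_alarms * n_present)


# Hypothetical counts per confidence band: (hits, false alarms),
# from 200 culprit-present and 200 culprit-absent lineups.
N_PRESENT = 200
N_ABSENT = 200
bands = {
    "90-100%": (60, 5),
    "70-80%": (40, 10),
    "50-60%": (20, 10),
}

ratios = {
    band: diagnosticity_ratio(hits, N_PRESENT, fas, N_ABSENT)
    for band, (hits, fas) in bands.items()
}
for band, ratio in ratios.items():
    print(f"{band}: {ratio:.1f}")
```

With these invented counts, an identification made with 90% to 100% confidence is 12 times more likely to be a hit than a false alarm, compared with ratios of 4 and 2 in the lower bands.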
It is encouraging that similar patterns of CA calibration findings have also been reported in a number of studies using various forms of a face recognition paradigm, the basic requirement of which is to judge whether or not faces presented at test had been among an array of faces that had previously been presented in a study phase. Specifically, these studies have demonstrated robust CA calibration for positive (but not negative) decisions in both absolute and relative judgment versions of the face recognition paradigm, but with overconfidence more pronounced as task difficulty increased (e.g., shorter stimulus exposure durations at either study or test).
One feature of the calibration studies that must be highlighted is that the confidence judgments from participant witnesses were obtained immediately after the identification response, thereby ensuring that they were not affected by any postidentification influences (e.g., from the lineup administrator or other witnesses) that are known to exert a profound influence on confidence judgments quite independent of the accuracy of the identification response. Thus, while the calibration studies illustrate meaningful CA relations, eyewitness researchers are in strong agreement that confidence assessments provided after some delay (e.g., in the courtroom) are potentially highly misleading about the likely accuracy of an identification.
Not all the evidence obtained with the calibration approach is positive about the CA relation. For example, research done with samples of children aged 10 to 13 years highlights poor CA calibration and extreme overconfidence, illustrated by accuracy rates sometimes as low as 30% in association with confidence judgments of 90% to 100%. Furthermore, children’s overconfidence in their identifications has, thus far, proven resistant to interventions designed to reduce it.
While there is still much to be done in terms of testing the generality of findings obtained via the calibration approach across a variety of forensically relevant conditions, the present findings are, nevertheless, important from an applied perspective. As indicated earlier, while the CA correlation addresses the group-level variance in accuracy explained by confidence, the calibration approach provides the additional insight into the likely accuracy of particular identifications made with some specific level of confidence. The available data strongly suggest that police investigators should pay close attention to witnesses’ confidence estimates solicited at the time of the identification and, hence, not subject to any social influence. Specifically, extremely confident (and rapid) identifications of the suspect in the lineup, while by no means guaranteed to be accurate, should signal to police investigators that there is a very real chance that the suspect is the culprit and, thus, stimulate a closer search for supportive evidence. When, however, the identification of the suspect is not made with extremely high confidence, and is perhaps made in a ponderous manner, it should signal real doubts about whether the suspect is the culprit and act as a reminder to investigators that they should strongly consider alternative hypotheses about the culprit’s identity. In contrast, investigators should not attempt to interpret the likely accuracy of witnesses’ rejections of a lineup based on the associated confidence levels. Although lineup rejections have diagnostic value with respect to the guilt or innocence of the suspects, the witnesses’ confidence levels do not assist in that diagnosis.
Some analyses of findings from real criminal cases are encouragingly consistent with these conclusions from experimental simulations. In this archival work, when there was strong incriminating (nonidentification) evidence against a suspect (which admittedly does not prove that the suspect was the culprit), very confident witness identifications much more strongly pointed to the police suspect than to the innocent lineup foils.
Barriers to the Use of the Calibration Approach
Application of the calibration approach to the study of the CA relation in eyewitness identification has clearly been valuable. Unfortunately, there is one major obstacle to the more widespread application of the approach. As the published work shows, use of this approach in the eyewitness identification context requires extremely large sample sizes. The typical eyewitness identification task simulates the real-world investigation: The witness observes a crime, views a lineup, and either makes a positive identification or rejects the lineup. In other words, only one data point is provided by each participant witness. However, stable calibration curves and statistics (for choosers or nonchoosers) require approximately 200 to 300 data points for each experimental condition examined. Thus, the existing published studies with an identification paradigm are characterized by sample sizes considerably in excess of what many laboratories find practical to achieve. In contrast, an old-new face recognition paradigm allows for a large number of repeated measures and, in turn, derivation of calibration statistics for each participant. One consequence of this sample size problem is that future research into how calibration varies over forensically relevant conditions is likely to proceed quite slowly.
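A rough sense of why so many data points are needed can be had from the sampling error of a proportion. The following Python sketch assumes, purely for illustration, that the data points in a condition are spread evenly over five confidence bands, that true accuracy in a band is 80%, and that a normal approximation to the binomial is adequate; none of these figures comes from the studies themselves, and `accuracy_margin` is a hypothetical helper.

```python
from math import sqrt


def accuracy_margin(n_total, n_bands=5, true_accuracy=0.8, z=1.96):
    """Approximate 95% margin of error for the accuracy rate in one band.

    Assumes n_total data points split evenly across n_bands confidence
    bands and uses the normal approximation to the binomial.
    """
    n_per_band = n_total / n_bands
    standard_error = sqrt(true_accuracy * (1 - true_accuracy) / n_per_band)
    return z * standard_error


for n_total in (50, 250):
    margin = accuracy_margin(n_total)
    print(f"{n_total} data points: accuracy known to within +/- {margin:.3f}")
```

Under these assumptions, 50 data points leave each point on the calibration curve uncertain by roughly 25 percentage points, whereas 250 data points narrow that to about 11 percentage points.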
The issues of social influences on identification confidence and the malleability of confidence have already been mentioned, and they are discussed specifically elsewhere. Some further discussion is required here, however, to round out the treatment of identification confidence.
As has been indicated, the empirical evidence shows that witness confidence judgments are informative about the likely accuracy of positive identification decisions if they are solicited at the time of the identification. But from the time of the identification through to the end of a trial, witnesses may have a variety of further interactions with the police, other witnesses, and lawyers, culminating often in a courtroom appearance. Although none of these interactions can have any bearing on the accuracy of the decision that was indicated at the identification test, they do have the potential to influence significantly any subsequent expression of confidence in that decision. This may mean, for example, that any confidence judgment expressed in the courtroom may be quite different from the one that was made at the time of the actual identification test. In turn, whereas confidence at the time of the identification decision may be informative about identification accuracy, these subsequent expressions of confidence may not be.
Some of the key variables that have been shown to influence postidentification judgments of confidence include confirming and disconfirming feedback about the accuracy of the identification provided, for example, by a lineup administrator or another witness. This feedback may be in the form of explicit verbal feedback from one of these sources or may involve more subtle verbal or nonverbal cues. Regardless of when and how the feedback is delivered, confirming feedback inflates confidence and thus makes a witness appear more credible or believable, whereas disconfirming feedback deflates confidence and makes the witness appear less credible. In other words, cues that can affect confidence judgments but not the underlying judgmental accuracy can render a witness more or less believable to jurors. Thus, a witness who falsely identifies an innocent police suspect may not be particularly confident at the time of making an identification but may be exceptionally confident at some later stage in a courtroom. It is for these reasons that eyewitness researchers have strongly endorsed the collection of any assessments of confidence at the time of the identification, when confidence judgments are maximally informative about accuracy, and have little faith in the probative value of identification confidence judgments that witnesses may express in the courtroom.
- Brewer, N. (2006). Uses and abuses of eyewitness identification confidence. Legal and Criminological Psychology, 11, 3-24.
- Brewer, N., & Wells, G. L. (2006). The confidence-accuracy relationship in eyewitness identification: Effects of lineup instructions, foil similarity and target-absent base rates. Journal of Experimental Psychology: Applied, 12, 11-30.
- Cutler, B. L., & Penrod, S. D. (1989). Forensically relevant moderators of the relation between eyewitness identification accuracy and confidence. Journal of Applied Psychology, 74, 650-652.
- Juslin, P., Olsson, N., & Winman, A. (1996). Calibration and diagnosticity of confidence in eyewitness identification: Comments on what can be inferred from the low confidence-accuracy correlation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1304-1316.
- Kassin, S. M., Tubb, V. A., Hosch, H. M., & Memon, A. (2001). On the “general acceptance” of eyewitness testimony research: A new survey of the experts. American Psychologist, 56, 405-416.
- Keast, A., Brewer, N., & Wells, G. L. (in press). Children’s metacognitive judgments in an eyewitness identification task. Journal of Experimental Child Psychology.
- Lindsay, D. S., Read, J. D., & Sharma, K. (1998). Accuracy and confidence in person identification: The relationship is strong when witnessing conditions vary widely. Psychological Science, 9, 215-218.
- Sporer, S. L., Penrod, S. D., Read, J. D., & Cutler, B. L. (1995). Choosing, confidence, and accuracy: A meta-analysis of the confidence-accuracy relation in eyewitness identification studies. Psychological Bulletin, 118, 315-327.