Jury Versus Judges’ Decisions

In American trials, the verdict is reached by either a judge or a jury, raising questions as to how these two fact finders reach their decisions and whether their decisions systematically differ. Most research has focused on the jury, though some key studies have compared the decisions of judges and juries. The available archival studies, case-specific judicial surveys, and experimental research reveal substantial similarities and a few differences.

Before a civil or criminal trial begins, the parties decide whether it will be a trial by jury or a trial by judge (“bench trial”). A bench trial occurs if both sides waive the right to a jury. Although rates vary across jurisdictions, approximately one third of felony trials and one in four civil trials in the United States are bench trials. Outside the United States, a mixed tribunal consisting of both lay and professional members may determine the outcome of a trial. Some critics of the American jury suggest that the justice system would be improved by transferring more decision-making responsibility to professional judges. Thus, in evaluating the performance of the jury, the policy-relevant comparison is not some hypothetical ideal decision maker, whatever characteristics that model might have, but rather the professional, legally qualified judge. Yet most research on trial court decision makers has focused on the jury rather than on the judge, perhaps because the jury is both a cultural icon and a favorite whipping boy, because relying on conscripted amateurs rather than professionals to decide outcomes of important conflicts raises questions, and because laypersons are more accessible than judges as subjects for research on decision making.

Judges and juries differ in several potentially important ways. Modern judges are legally trained professionals, while jurors are not. Although the modern jury may include members with legal training, most jurors are legal novices. Although some members of a jury may be more educated than the judge or have more expertise in a particular trial-related topic, the judge is typically more educated than the average juror. While the trial judge sits and deliberates alone, jury members have an opportunity to pool their experiences and opinions and to correct misunderstandings. Jurors, unlike judges, must reach a group decision. Finally, the judge is a repeat player, employed by the state to preside regularly over legal matters. In contrast, for the citizen selected to serve as a juror, jury service is an unusual event. The differences between the decisions of judges and juries may be due to one or a combination of these factors.

Direct comparisons of judge and jury decision making are challenging to make, and whether the data are obtained in the field or the laboratory, the implications of the results are sometimes ambiguous. Nonetheless, such comparisons are necessary to draw policy conclusions about the decision-making behavior of these two fact finders. Although such studies are still rare, their number is increasing, providing some systematic evidence on two central questions. First, do judges and juries differ in the likelihood of deciding on conviction or liability, or in the sentence severity or damage amounts they choose? Second, do juries and judges consider different factors or weigh them differently in reaching their decisions?

Researchers compare the decisions of judges and juries using three methods: archival analyses examining outcomes in jury versus bench trials, judicial surveys in which the judge indicates how he or she would have decided the case that a jury decided, and experiments in which judges and jurors respond to the same (or similar) simulated evidence. All three methods have strengths and weaknesses. A picture of current knowledge about judge-jury similarities and differences emerges from a composite of these findings.

Archival Research

Archival studies capture the real decisions of judges and juries, but they must attempt to control statistically for differences between the cases tried by judges and those tried by juries. Because the tribunal that hears a case is determined by the litigants’ choices (whether to go to trial rather than plead guilty or settle, and whether to waive the jury), the selection of cases is far from random and must be modeled for successful control. Most of the archival research comparing judge and jury verdicts has been conducted on civil trials. Researchers have not found consistent differences in overall liability rates between juries and judges. They have shown, however, that relative win rates on liability in federal civil trials vary across categories of cases, with plaintiffs winning more often in bench trials than in jury trials in some major types of tort cases and less often in bench trials than in jury trials in others. Before concluding from these patterns that judge and jury win rates do not differ on average, or that they differ systematically by case type, it is necessary to determine how much of the apparent similarity or difference is attributable to selection effects. A key difficulty is that in attempting to control for selection differences, researchers do not have even an approximate measure of the strength of the evidence for liability and must rely on the limited case characteristics that have been recorded in the archives.
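The selection problem described above can be illustrated with a small simulation. The sketch below is purely hypothetical and uses no real trial data: it assumes that every case has an evidence strength between 0 and 1, that judges and juries apply exactly the same decision rule (the plaintiff wins with probability equal to the evidence strength), and that stronger cases are more likely to be routed to a bench trial. The function name `simulate_win_rates` and all parameters are illustrative assumptions.

```python
import random


def simulate_win_rates(n_cases: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Return (bench_win_rate, jury_win_rate) under a hypothetical selection rule."""
    rng = random.Random(seed)
    wins = {"bench": 0, "jury": 0}
    counts = {"bench": 0, "jury": 0}
    for _ in range(n_cases):
        # Assumed: evidence strength for the plaintiff, uniform on [0, 1).
        strength = rng.random()
        # Assumed selection rule: stronger cases more often go to a bench trial.
        forum = "bench" if rng.random() < strength else "jury"
        # Identical decision rule for both fact finders.
        plaintiff_wins = rng.random() < strength
        counts[forum] += 1
        wins[forum] += plaintiff_wins
    return wins["bench"] / counts["bench"], wins["jury"] / counts["jury"]


bench_rate, jury_rate = simulate_win_rates()
print(f"bench win rate: {bench_rate:.2f}")
print(f"jury win rate:  {jury_rate:.2f}")
```

Under these assumptions the two forums show clearly different plaintiff win rates even though the decision makers behave identically, which is why an archival comparison that cannot measure evidence strength may mistake a selection effect for a judge-jury difference.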

The same modeling problem arises for comparisons of judge and jury verdicts on damages. Several archival studies report that damage awards from juries tend to be higher than those from judges, although a substantial portion of the apparent difference disappears when controls for differences in the cases they decide are introduced. Other studies have found no overall differences. Similarly, some researchers using archival data to study the frequency and size of punitive damage awards have found more frequent and higher awards given by juries, while others have found no differences.

Case-Based Judicial Surveys

Nearly 50 years ago, to address the selection problems that plague archival comparisons of judge and jury verdicts, Harry Kalven and Hans Zeisel developed the innovative approach of a case-based judicial survey for their classic national study of the American jury. In each of the 7,500 cases they studied, the trial judge completed a questionnaire describing the characteristics of the case, the jury’s verdict, and how the judge would have decided the same case in a bench trial. Studies using this approach depend on the independence of the judges’ personal verdict reports, that is, on whether the judge reports a personal verdict preference before learning the jury’s verdict or, if the report comes after, whether the judge has been affected by that knowledge. Nonetheless, the case-based judicial survey ensures that the judge and jury verdicts being compared come from equivalent cases because the judge in each case is providing a judicial verdict in precisely the same real trial that a jury decides. The judge and jury in the Kalven-Zeisel survey of 3,500 criminal cases agreed in 78% of the cases on whether or not to convict. When they disagreed, the judge would have convicted when the jury acquitted in 19% of the cases, and the jury convicted when the judge would have acquitted in 3% of the cases, a net leniency rate of 16%. Disagreement rates were no higher when the judge characterized the evidence as difficult than when the judge characterized it as easy, suggesting that the disagreements were not produced by the jury’s inability to understand the evidence. Disagreement rates did rise when the judge characterized the evidence as close rather than clear, indicating that disagreement cases were, at least in the judge’s view, more likely to be those cases that were susceptible to more than one defensible verdict.
Primary explanations offered for the overall differences were differences in judgments about the credibility of witnesses and a different threshold of reasonable doubt. Two smaller, more recent studies using the Kalven-Zeisel method have shown remarkably similar patterns in criminal cases, obtaining 74% to 75% agreement, with a greater leniency of 13% to 20% from the jury. Studies outside the United States have shown similarly high levels of agreement between professionals and juries or lay judges in criminal cases.

For the 4,000 civil trials in their judicial survey, Kalven and Zeisel obtained the same agreement rate of 78% on liability, but disagreement was almost equally divided: in 12% of the cases, the jury found for the plaintiff while the judge favored the defense, and in 10% of the cases, the jury found for the defense while the judge would have made an award. Awards by juries were 20% higher on average than awards by judges. Several smaller, recent studies of civil jury cases in several locations have indicated agreement rates on liability between 63% and 77%, but it is unclear whether any overall change has occurred over time because no national study comparable with the Kalven and Zeisel study has been conducted. Because punitive damages are awarded so rarely (in roughly 3% of contract and tort cases), researchers conducting case-specific judicial surveys have not been able to compare judge and jury decisions on punitive damages.
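The agreement and disagreement percentages reported for the Kalven-Zeisel survey can be checked with simple arithmetic. The sketch below uses only the figures given in the text; the class `JudgeJuryTable` and its field names are hypothetical helpers introduced for illustration. It expresses both the criminal and civil results as a net shift of the jury toward the defendant, relative to the judge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeJuryTable:
    """Percentages of cases in a judge-jury comparison (illustrative helper)."""
    agree: int                    # judge and jury reach the same verdict
    only_jury_for_defendant: int  # jury sides with defendant, judge would not
    only_judge_for_defendant: int # judge would side with defendant, jury did not

    def total(self) -> int:
        return self.agree + self.only_jury_for_defendant + self.only_judge_for_defendant

    def net_jury_shift_toward_defendant(self) -> int:
        # Positive: the jury is more favorable to the defendant than the judge.
        return self.only_jury_for_defendant - self.only_judge_for_defendant


# Criminal cases: 78% agreement, 19% jury acquits / judge would convict,
# 3% jury convicts / judge would acquit.
criminal = JudgeJuryTable(agree=78, only_jury_for_defendant=19, only_judge_for_defendant=3)

# Civil cases: 78% agreement, 10% jury for defense / judge would award,
# 12% jury for plaintiff / judge favored the defense.
civil = JudgeJuryTable(agree=78, only_jury_for_defendant=10, only_judge_for_defendant=12)

print(criminal.total(), criminal.net_jury_shift_toward_defendant())  # 100 16
print(civil.total(), civil.net_jury_shift_toward_defendant())        # 100 -2
```

The criminal figures reproduce the 16-point net leniency reported in the text, while the civil figures yield a difference of only 2 points in the opposite direction, reflecting the nearly even split of civil disagreements.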

Simulations and Experiments

A third approach comparing judge and jury decision making asks judges and laypersons to reach decisions based on simulated trial materials in the form of written materials or videotaped presentations. Comparability is ensured by having the judges and laypersons read or view precisely the same stimulus. In addition, by experimentally varying the stimulus within each group, researchers have tested how specific variations in the evidence (e.g., exposure to inadmissible evidence) affect judges and laypersons differently. The extent to which these simulated decisions reflect what the decision makers would do in a real trial is contingent on the extent to which the simulation captures the relevant factors that would affect trial judgments. The materials in these studies generally must be brief to obtain judicial participation. Trial elements such as jury instructions are often truncated or missing. Mock jurors frequently are not asked to deliberate, so that the judicial responses are compared with those of individuals rather than the group decisions of multiple jurors. Nonetheless, the few experiments comparing judges and laypersons reveal a striking overall similarity between their decisions.

Experiments showed that exposure to inadmissible evidence influenced judges and laypersons similarly, and both groups were reluctant to impose liability based on mere statistical evidence. In several experiments involving personal injury cases, both professionals and laypersons responding to the same cases used the severity of injury in determining pain and suffering awards, but in one study, laypersons were more variable in their awards. It is unclear how much, or whether, variability in decisions by lay decision makers would drop if their awards were determined by group verdicts rather than individual judgments. In determining criminal sentences in a series of cases, laypersons favored lower penalties than judges did, a pattern consistent with the greater leniency shown by juries in criminal conviction decisions in case-based judicial surveys.

In a few of the experiments directly comparing the judgments of judges and laypersons, the samples tested raise questions about the representativeness of the findings because the laypersons were students or the judges sampled came from a unique subgroup (e.g., those who had signed up to attend a law and economics seminar). Much more research is needed to map experimentally the differences and similarities between the judgments of judges and juries before concluding that judges are better than juries at specific tasks (e.g., assessing risk) or that deliberations enable juries to outperform judges on other tasks (e.g., assessing conflicting testimony).

Finally, in addition to the few studies that have exposed judges and laypersons to the same stimulus, in several experiments with judges, researchers conducted conceptual replications of the impact of heuristics (e.g., anchoring, hindsight, framing) or of extralegal factors, which had previously been tested on laypersons. With a few exceptions, these experiments have revealed that judges show a similar susceptibility to these cognitive illusions.


  1. Diamond, S. S., & Rose, M. R. (2005). Real juries. In J. Hagan (Ed.), Annual Review of Law & Social Science (Vol. 1, pp. 255-284). Palo Alto, CA: Annual Reviews.
  2. Eisenberg, T., Hannaford-Agor, P. L., Hans, V. P., Waters, N. L., Munsterman, G. T., Wells, M. T., et al. (2005). Judge-jury agreement in criminal cases: A partial replication of Kalven and Zeisel’s The American Jury. Journal of Empirical Legal Studies, 2, 171-207.
  3. Guthrie, C., Rachlinski, J. J., & Wistrich, A. J. (2001). Inside the judicial mind. Cornell Law Review, 86, 777-830.
  4. Kalven, H., Jr., & Zeisel, H. (1966). The American jury. Boston: Little, Brown.
  5. Robbennolt, J. K. (2005). Jury decision making: Evaluating juries by comparison to judges: A benchmark for judging? Florida State University Law Review, 32, 469-509.
