A verbal protocol or verbal report is a cognitive task analysis (CTA) technique designed to elicit a verbalizable report of an individual’s thinking during task performance. Verbal protocols are typically elicited by trained researchers as a means to access information heeded (attended) during task performance; this is usually for the purpose of understanding the mental operations or knowledge representations responsible for the observed performance. With only minimal qualification, the catchall term introspection has been used frequently in psychology to refer to a range of verbal reporting methods, despite practical and theoretical differences among methods. The historical context and evolution of these methods provide a useful basis for assessing the validity and utility of this CTA technique for capturing, describing, and explaining thinking, and, ultimately, for using what is learned to develop scientific theories capable of prediction and control.
History of Introspection and Verbal Reporting Methods
Although writings on introspection can be traced back to Plato and Aristotle, more recently, British empiricists, such as John Stuart Mill (1806–1873), viewed introspection as a form of self-observation, a method of bringing elementary sensations into conscious awareness to further understand the composition of complex experience. In contrast, German psychologist Wilhelm Wundt (1832–1920) trained participants (sometimes on over 10,000 trials) to concurrently report on specific qualities and intensities of elementary sensations, rather than on higher order processes. As the practice of introspection burgeoned in psychology, the focus shifted toward a less restrictive examination of conscious experience that resembled that of the British empiricists. For example, a student of Wundt, Edward Titchener in the United States, and practitioners at the Würzburg School in Germany, used systematic introspection to examine higher order processes, such as memory and judgments, albeit using simple laboratory-based tasks. Retrospection, which Wundt had previously discounted because of the fallibility of memory, came back into vogue, primarily because of the assumption that concurrent introspection might interfere with in-task thinking. Moreover, interest grew in qualitative descriptions of thinking, which often became the focus of experiments in place of the quantitative data, such as reaction times, that they accompanied.
Essentially, there was a shift in the practice of psychology by experimental introspectionists, away from objective empirical observation toward experimenter-led, subjective constructions of behavior by the participant. Although some introspective analyses of thinking (e.g., psychophysical judgments) were deemed highly reliable, the results of systematic introspection sometimes lacked reproducibility. Low reliability was frequently attributed to insufficient participant training in reporting procedures, yet the training of participants or their selection based on their ability to verbally report was also considered an infringement on scientific objectivity. Consequently, in the early 1900s, behavioral psychologists in the United States led by John B. Watson challenged introspection, irrespective of its form, as a viable method of studying behavior. From the critics’ perspective, there appeared to be no way to study reliably the relationship between the subjective experiences of the person introspecting and that person’s verbal report of those experiences. Combined with a general trend in U.S. psychology to focus on practical performance and manipulations that could determine or limit performance, this criticism led introspective techniques to drop out of favor in the United States, replaced by methods thought more suitable to address these goals.
From Introspection to Think-Aloud Reports and Cognitive Process Tracing
In Europe, the use of systematic introspection continued, but the focus shifted from creative synthesis of elementary sensations toward a creative analysis that emphasized holistic, complex, and purposive behavior. Some of these psychologists moved introspection out of the laboratory to study thinking more representative of real life. For instance, a successor to the Würzburg School, Otto Selz, used systematic introspection to study classroom learning, and his student, Julius Bahle, applied these methods to studying musical composition. Karl Duncker and Édouard Claparède used think-aloud methods of introspection, together with experimentation, to study problem solving. These two authors are generally considered to be the first to use think-aloud protocols. Rather than have participants analyze their sensory experience as per Wundt, these authors instructed participants to express their thoughts directly, as they occurred, while remaining focused on the experimental task. Consistent with Wundt, Claparède noted that this technique avoided memory issues associated with retrospection, a central component of post-Wundtian systematic introspection.
Prior to Selz, however, most applications of introspection concentrated on classifying the contents of thinking rather than on the process of thinking per se. Consistent with William James’s conception of thinking as a series of substantive and (inaccessible) transitive states, Selz’s process-oriented theory was centered on explaining, and through introspection, eliciting thinking as a strictly determined succession of cognitive operations.
Selz’s ideas of tracing thinking operations using verbal protocols were implemented, most notably, by Adriaan de Groot in the domain of chess. Initially, de Groot employed both systematic introspection (retrospection) and the think-aloud technique to study how chess players selected moves from a range of game positions. However, after experiencing difficulties with interrupting players to retrospect after several minutes of thinking about a position, he focused on eliciting think-aloud reports because they offered a less disruptive method of systematically analyzing the relatively complicated and lengthy processes involved in chess thinking. De Groot, however, did not rule out the use of retrospection for all tasks. Instead, he suggested guidelines to steer participants away from describing peculiar qualities of inner experiences during retrospection and toward recalling the sequence of thinking operations that occurred while performing the task.
De Groot suggested that verbally reporting on thinking could interfere with actual thinking, which may affect the completeness of the report and, consequently, the ability to capture the true course of thinking. He noted four possible causes for incompleteness: (1) the phase structure of the thought sequence is likely to be under the threshold of conscious awareness and absent, in any explicit sense, from the report; (2) thought is more rapid than speech, leading to the possible omission of heeded information; (3) wordless thoughts may not be reported while thinking aloud, and transformation into speech may disrupt the flow of thinking; and (4) participants may intentionally suppress steps in their thinking, for instance, when they make mistakes. De Groot offered two criteria for assessing completeness: (1) the degree to which the participant is satisfied with the protocol as a representation of the actual thinking and (2) the ability to follow and understand the participant’s reasoning for a particular action. (The latter is often aided by experimenter instructions to make some thoughts more explicit, for instance, by referring to subjects or objects with nouns or noun phrases rather than pronouns.) Both criteria require the participant and experimenter to go back through the protocol, a step that is frequently overlooked.
De Groot argued that the cognitive process that unfolds during task performance is not an absolute truth but is largely hypothetical and can be understood best in the context of a scientific theory. According to de Groot, theories are useful if they are logically constructed, adequately describe the relevant phenomena, permit testable predictions (that are empirically supported), and can be applied to control the world to which they refer. As such, introspective techniques such as think-aloud verbal reports provide a valuable method for hypothesis formation and theory building. Likewise, introspective methods provide a means of theory testing, with the caveat that such tests are likely to be influenced by the reliability of reports.
The notion of cognitive operations was a precursor to the concept of the computer program in information processing psychology. The ideas espoused by Selz and de Groot were influential in the subsequent development of theoretical and computational models of recognition, problem solving, and comprehension by the likes of Allen Newell, Herbert Simon, and Walter Kintsch. Process tracing using verbal protocols is currently one of the primary methods of testing and validating models of cognition.
Validity of Introspective Methods and Verbal Protocols
Despite their current use in both theory development and theory testing, some researchers have questioned the validity of processes elicited via introspective methods. For instance, in an extensive review of studies using a range of methods, Richard E. Nisbett and Timothy Wilson showed that many participants were unaware of the actual cognitive process that led to the solution. Consistent with William James, they argued that some processes are inaccessible. They concluded that individuals do not access the specific thoughts heeded during problem solving but, instead, draw on implicit theories, culturally derived social rules, or generalizations from past experiences, or generate causal hypotheses that could explain their behavior. When individuals report correctly on their cognitive processes, it is more a matter of coincidence that their causal theory is correct than a result of their ability to access the associated memory trace.
Anders Ericsson and Herbert Simon contended that it is possible to obtain valid reports of thinking as long as specific procedures are followed. Consistent with James, Selz, and Newell and Simon, they suggested that cognition could be described as a sequence of states that are transformed by successive information processes. Whereas cognitive processes themselves may be consciously inaccessible, the output of a previous process and the input to a future one are held in short-term memory (STM). Although information entering STM may be replaced, pointers to symbols and operations are likely to remain present temporarily in STM. This information can, therefore, be heeded and, hence, verbalized.
When individuals are instructed to think aloud, information in STM can either be verbalized directly (Level 1) or transformed from nonverbal to verbal code (Level 2). While such transformation may require additional processing, this has been shown not to affect the contents of the report (although it does increase time on task). When participants think aloud using Level 1 or 2 verbalizations, a direct trace of heeded thoughts, and consequently an indirect trace of the internal steps in cognitive processing, can be elicited. When participants are allowed to verbalize (additional) information that is not normally heeded (Level 3), or are instructed to do so via directed probes that encourage analysis or interpretation of one’s thinking, their verbal report may only partially resemble, or be completely independent of, the actual processing that mediated task performance.
Ericsson and Simon made a clear distinction between verbal reports generated from actual heeded thoughts and those procedures that permit generation of causal theories, generalizations or assumptions about, or explanations, analyses, descriptions, or summaries of personal task performance. These authors indicated that although concurrent reports are preferred, retrospective procedures can provide reliable and valid reports, provided that reports are restricted to Level 1 or 2 verbalizations and that undirected probes are used (such as “think aloud” or “recall the first thought you remember thinking”), or at least probes that do not encourage participants to deviate from the thinking they would have engaged in if not probed. Following de Groot, they also recommended that the time between an activity and verbally reporting on that activity should be minimized, and that participants be asked to report about a specific incident rather than generalizing from other incidents.
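The distinction among verbalization levels, and the procedural recommendations that accompany it, can be illustrated with a small data model. The following Python sketch is purely illustrative and is not drawn from Ericsson and Simon's work: the names `Level`, `Segment`, `Procedure`, `direct_trace`, and `procedure_concerns` are hypothetical, the example utterances are invented, and the level assigned to each utterance stands in for a judgment that a trained coder would make.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """Verbalization levels as summarized in the text (names are hypothetical)."""
    DIRECT = 1        # heeded information verbalized directly
    RECODED = 2       # heeded nonverbal information transformed into verbal code
    INTERPRETIVE = 3  # information not normally heeded: explanation, analysis, etc.


@dataclass
class Segment:
    text: str     # one utterance from a protocol
    level: Level  # level assigned by a (hypothetical) human coder


@dataclass
class Procedure:
    directed_probes: bool    # True if probes invite analysis or interpretation
    specific_incident: bool  # True if the report concerns one specific incident


def direct_trace(segments: List[Segment]) -> List[Segment]:
    """Keep only Level 1 and Level 2 segments, treated as a trace of heeded thoughts."""
    return [s for s in segments if s.level <= Level.RECODED]


def procedure_concerns(proc: Procedure) -> List[str]:
    """List departures from the recommendations summarized in the text."""
    concerns = []
    if proc.directed_probes:
        concerns.append("directed probes may invite Level 3 verbalization")
    if not proc.specific_incident:
        concerns.append("report may generalize from other incidents")
    return concerns


# Invented utterances, for illustration only.
segments = [
    Segment("Knight to f3, then the pawn is attacked", Level.DIRECT),
    Segment("That cluster on the kingside looks loose", Level.RECODED),
    Segment("I usually choose moves like this because I like open games", Level.INTERPRETIVE),
]
trace = direct_trace(segments)
print([s.text for s in trace])
print(procedure_concerns(Procedure(directed_probes=True, specific_incident=False)))
```

In this sketch the third utterance is excluded from the trace because it explains and generalizes about the participant's own thinking. Assigning the level itself remains a matter of coder judgment; the code only records that decision.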
Beyond Verbal Reporting: Methods of Analysis
While the analysis of verbal protocols is beyond the scope of this entry, the interested reader’s attention is drawn to Micheline T. H. Chi’s distinction between protocol analysis and verbal analysis. Protocol analysis is useful for domains in which the problem space has been clearly defined a priori (which can take numerous process-tracing, experimental, and computer simulation studies to achieve) such that clear predictions can be made about alternative strategies that might be employed. Verbal analysis, on the other hand, offers procedures for analyzing verbal protocols in a theoretically driven manner that increases understanding of the knowledge representations that support performance. Importantly, verbal analysis provides one possible means of fleshing out a hypothetical problem space so that a subsequent protocol analysis can be carried out.
In sum, the use of introspective methods has evolved over time, toward procedures for obtaining concurrent think-aloud and retrospective reports. Together with recommendations from de Groot, Ericsson and Simon’s procedures provide, arguably, the most detailed instructions for obtaining valid and reliable protocols. In advocating for these methods, however, one should heed de Groot’s warning: Methods should be chosen based on the research goal. Other incident-based CTA methods exist that use verbal reporting procedures without restricting them to Level 1 and 2 verbalizations, including the Critical Decision Method developed by Gary Klein and colleagues (see the entry Cognitive Task Analysis). Although susceptible to criticisms associated with any method of retrospection, these alternatives are useful, not necessarily in the context of a protocol analysis to test a given theory about alternative strategies in a well-defined problem space but as a means of hypothesis formation and theory building on a more macro level, especially in complex domains where the problem space is ill defined.
- Chi, M. T. H. (1997). Quantifying qualitative analyses of verbal data: A practical guide. The Journal of the Learning Sciences, 6(3), 271–315.
- Danziger, K. (1980). The history of introspection reconsidered. Journal of the History of the Behavioral Sciences, 16(3), 241–262.
- de Groot, A. (1946/1965). Thought and choice in chess. The Hague, Netherlands: Mouton.
- Ericsson, K. A., & Crutcher, R. J. (1991). Introspection and verbal reports on cognitive processes—Two approaches to the study of thinking: A response to Howe. New Ideas in Psychology, 9(1), 57–71.
- Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.). Cambridge, MA: MIT Press.
- Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259.