Verbal Protocols

A verbal protocol or verbal report is a cognitive task analysis (CTA) technique designed to elicit a verbalizable report of an individual’s thinking during task performance. Verbal protocols are typically elicited by trained researchers as a means to access information heeded (attended) during task performance; this is usually for the purpose of understanding the mental operations or knowledge representations responsible for the observed performance. With only minimal qualification, the catchall term introspection has been used frequently in psychology to refer to a range of verbal reporting methods, despite practical and theoretical differences among methods. The historical context and evolution of methods provide a useful basis for assessing the validity and utility of this CTA technique for capturing, describing, and explaining thinking, and, ultimately, using what is learned to develop scientific theories capable of prediction and control.

History of Introspection and Verbal Reporting Methods

Although writings on introspection can be traced back to Plato and Aristotle, more recently, British empiricists, such as John Stuart Mill (1806–1873), viewed introspection as a form of self-observation— a  method  of  bringing  elementary  sensations  into conscious  awareness  to  further  understand  the composition  of  complex  experience.  In  contrast, German  psychologist  Wilhelm  Wundt  (1832–1920)  trained  participants  (sometimes  on  over 10,000  trials)  to  concurrently  report  on  specific qualities and intensities of elementary sensations, rather than on higher order processes. As the practice of introspection burgeoned in psychology, the focus  shifted  toward  a  less  restrictive  examination  of  conscious  experience  that  resembled  that of the British empiricists. For example, a student of Wundt, Edward Titchener in the United States, and  practitioners  at  the  Würzburg  School  in Germany, used systematic introspection to examine  higher  order  processes,  such  as  memory  and judgments,  albeit  using  simple  laboratory-based tasks. Retrospection, which Wundt had previously discounted  because  of  the  fallibility  of  memory, came  back  into  vogue,  primarily  because  of  the assumption  that  concurrent  introspection  might interfere with in-task thinking. Moreover, interest grew in qualitative descriptions of thinking, which often  became  the  focus  of  experiments  rather than the quantitative data like reaction times they accompanied.

Essentially,  there  was  a  shift  in  the  practice  of psychology   by   experimental   introspectionists, away from objective empirical observation toward experimenter-led,   subjective   constructions   of behavior by the participant. Although some introspective analyses of thinking (psychophysical judgments) were deemed highly reliable, the results of systematic  introspection  sometimes  lacked  reproducibility. Low reliability was frequently attributed to  insufficient  participant  training  in  reporting procedures, yet the training of participants or their selection  based  on  their  ability  to  verbally  report was  also  considered  an  infringement  on  scientific objectivity.  Consequently,  in  the  early  1900s, behavioral  psychologists  in  the  United  States led by John B. Watson challenged introspection— irrespective  of  its  form—as  a  viable  method  of studying  behavior.  From  the  critics’  perspective, there appeared no way to study reliably the relationship between the subjective experiences of the person  introspecting  and  their  verbal  report  of those experiences. Combined with a general trend in  U.S.  psychology  to  focus  on  practical  performance  and  manipulations  that  could  determine or  limit  performance,  introspective  techniques dropped out of favor in the United States for methods thought more suitable to address these goals.

From Introspection to Think-Aloud Reports and Cognitive Process Tracing

In  Europe,  the  use  of  systematic  introspection continued,  but  the  focus  shifted  from  creative synthesis  of  elementary  sensations  toward  a  creative  analysis  that  emphasized  holistic,  complex, and  purposive  behavior.  Some  of  these  psychologists  moved  introspection  out  of  the  laboratory to study thinking more representative of real life. For instance, a successor to the Würzburg school, Otto  Selz,  used  systematic  introspection  to  study classroom  learning  and  his  student,  Julius  Bahle, applied these methods to studying musical composition. Karl Duncker and Édouard Claparède used think-aloud  methods  of  introspection,  together with  experimentation,  to  study  problem  solving. These two authors are generally considered to be the first to use think-aloud protocols. Rather than have participants analyze their sensory experience as  per  Wundt,  these  authors  instructed  participants  to  express  their  thoughts  directly—as  they occurred—while remaining focused on the experimental  task.  Consistent  with  Wundt,  Claparède noted  that  this  technique  avoided  memory  issues associated  with  retrospection—a  central  component of post–Wundtian systematic introspection.

Prior  to  Selz,  however,  most  applications  of introspection concentrated on classifying the contents  of  thinking  rather  than  on  the  process  of thinking  per  se.  Consistent  with  William  James’s conception  of  thinking  as  a  series  of  substantive  and  (inaccessible)  transitive  states,  Selz’s process-oriented  theory  was  centered  on  explaining,  and  through  introspection,  eliciting  thinking as  a  strictly  determined  succession  of  cognitive operations.

Selz’s ideas of tracing thinking operations using verbal protocols were implemented, most notably, by Adriaan de Groot in the domain of chess. Initially, de Groot employed both systematic introspection (retrospection) and the think-aloud technique to study how chess players selected moves from a range of game positions. However, after experiencing difficulties with interrupting players to retrospect after several minutes of thinking about a position, he focused on eliciting think-aloud reports since they offered a less disruptive method of systematically analyzing the relatively complicated and lengthy processes involved in chess thinking. De Groot, however, did not rule out the use of retrospection for all tasks. Instead, he suggested guidelines to steer participants away from describing peculiar qualities of inner experiences during retrospection and toward recalling the sequence of thinking operations that occurred while performing the task.

De  Groot  suggested  that  verbally  reporting on  thinking  could  interfere  with  actual  thinking, which  may  affect  the  completeness  of  the  report and,  consequently,  the  ability  to  capture  the  true course of thinking. He noted four possible causes for incompleteness: (1) The phase structure of the thought sequence is likely to be under the threshold  of  conscious  awareness  and  absent,  in  any explicit sense, from the report; (2) Thought is more rapid  than  speech,  leading  to  the  possible  omission of heeded information; (3) Wordless thoughts may  not  be  reported  while  thinking  aloud,  and transformation  into  speech  may  disrupt  the  flow of  thinking;  and  (4)  Participants  may  intentionally suppress steps in their thinking, for instance, when  they  make  mistakes.  De  Groot  offered  two criteria  for  assessing  completeness:  (1)  the  degree to which the participant is satisfied with the protocol as a representation of the actual thinking and (2)  the  ability  to  follow  and  understand  the  participant’s  reasoning  for  a  particular  action.  (The latter  is  often  hindered  by  experimenter  instructions  to  make  some  thoughts  more  explicit—for instance,  by  making  references  to  subjects  or objects as nouns or noun phrases rather than pronouns.)  Both  criteria  require  the  participant  and experimenter to go back through the protocol—a frequently overlooked step.

Rather than an absolute truth, de Groot argued that  the  cognitive  process  that  unfolds  during task  performance  is  largely  hypothetical  and  can be  understood  best  in  the  context  of  a  scientific theory. According to de Groot, good theories are useful if they are logically constructed, adequately describe  the  relevant  phenomena,  permit  testable  predictions  (that  are  empirically  supported), and can be applied to control the world to which they  refer.  As  such,  introspective  techniques  such as  think-aloud  verbal  reports  provide  a  valuable method  for  hypothesis  formation  and  theory building. Likewise, introspective methods provide a  means  of  theory  testing,  with  the  caveat  that such  tests  are  likely  to  be  influenced  by  the  reliability of reports.

The notion of cognitive operations was a precursor to the concept of the computer program in information processing psychology. The ideas espoused by Selz and de Groot were influential in the subsequent development of theoretical and computational models of recognition, problem solving, and comprehension by the likes of Allen Newell, Herbert Simon, and Walter Kintsch. Process tracing using verbal protocols is currently one of the primary methods of testing and validating models of cognition.

Validity of Introspective Methods and Verbal Protocols

Despite their current use, both in theory development and in theory testing, some researchers have questioned the validity of processes elicited via introspective methods. For instance, in an extensive review of studies using a range of methods, Richard E. Nisbett and Timothy Wilson showed that many participants were unaware of the actual cognitive process that led to the solution. Consistent with William James, they argued that some processes are inaccessible. They concluded that individuals do not access the specific thoughts heeded during problem solving but, instead, access implicit theories, culturally derived social rules, or generalizations from past experiences, or generate causal hypotheses that could explain their behavior. When individuals report correctly on the cognitive processes, it is more a matter of coincidence that their causal theory is correct than of their ability to access the associated memory trace.

Anders Ericsson and Herbert Simon contended that it is possible to obtain valid reports of thinking as long as specific procedures are followed. Consistent with James, Selz, and Newell and Simon, they suggested that cognition could be described as a sequence of states that are transformed by successive information processes. Whereas cognitive processes themselves may be consciously inaccessible, the output of a previous process and the input to a future one are held in short-term memory (STM). Although information entering STM may be replaced, pointers to symbols and operations are likely to remain present temporarily in STM. This information can, therefore, be heeded and, hence, verbalized.

When individuals are instructed to think aloud, information in STM can either be verbalized directly (Level 1) or transformed from nonverbal to verbal code (Level 2). While such transformation may require additional processing, this has been shown not to affect the contents of the report (although it does increase time on task). When participants think aloud using Level 1 or 2 verbalizations, a direct trace of heeded thoughts, and consequently an indirect trace of the internal steps in cognitive processing, can be elicited. When participants are allowed to verbalize (additional) information that is not normally heeded (Level 3), or are instructed to do so via directed probes that encourage analysis or interpretation of one’s thinking, their verbal report may only partially resemble, or be completely independent of, the actual processing that mediated task performance.

Ericsson and Simon made a clear distinction between verbal reports generated from actual heeded thoughts and those procedures that permit generation of causal theories, generalizations or assumptions about, or explanations, analyses, descriptions, or summaries of personal task performance. These authors indicated that although concurrent reports are preferred, retrospective procedures can provide reliable and valid reports, provided that reports are restricted to Level 1 or 2 verbalizations and that undirected probes are used (such as “think aloud” or “recall the first thought you remember thinking”), or at least probes that do not encourage participants to deviate from the thinking they would have engaged in if not probed. Following de Groot, they also recommended that the time between an activity and verbally reporting on that activity be minimized, and that participants be asked to report about a specific incident rather than generalizing from other incidents.

Beyond Verbal Reporting: Methods of Analysis

While  the  analysis  of  verbal  protocols  is  beyond the  scope  of  this  entry,  the  interested  reader’s attention  is  drawn  to  Micheline  T.  H.  Chi’s  distinction  between  protocol  analysis  and  verbal analysis. Protocol analysis is useful for domains in which the problem space has been clearly defined a priori (which can take numerous process-tracing, experimental,   and   computer   simulation   studies  to  achieve)  such  that  clear  predictions  can  be made  about  alternative  strategies  that  might  be employed.  Verbal  analysis,  on  the  other  hand, explicates  procedures  for  analyzing  verbal  protocols  in  a  manner  that  increases  understanding of  the  knowledge  representations  that  support performance  in  a  theoretically  driven  manner. Importantly,  verbal  analysis  provides  one  possible means of fleshing out a hypothetical problem space  so  that  a  subsequent  protocol  analysis  can be carried out.


In sum, the use of introspective methods has evolved over time, toward procedures for obtaining concurrent think-aloud and retrospective reports. Together with recommendations from de Groot, Ericsson and Simon’s procedures provide, arguably, the most detailed instructions for obtaining valid and reliable protocols. In advocating for these methods, however, one should heed de Groot’s warning: Methods should be chosen based on the research goal. Other incident-based CTA methods exist that use verbal reporting procedures without restricting them to Level 1 and 2 verbalizations, including the Critical Decision Method developed by Gary Klein and colleagues (see the entry Cognitive Task Analysis). Although susceptible to criticisms associated with any method of retrospection, these alternatives are useful, not necessarily in the context of a protocol analysis to test a given theory about alternative strategies in a well-defined problem space but as a means of hypothesis formation and theory building on a more macro level, especially in complex domains where the problem space is ill defined.


