Behavioral Observation Methods

Behavioral observation is a widely used method of behavioral assessment. Unlike other methods of behavioral assessment, most of which rely on people’s perceptions of behavior, behavioral observation involves watching and recording the behavior of a person in typical environments (e.g., classrooms). The assumption is therefore that data collected are more objective than are perceptions. Most methods of behavioral observation provide quantitative and objective data that can be used to determine current levels of behavior, to set goals for behavioral improvement, and to measure change following intervention plans.

Depending on the nature of the behaviors of concern, observers may be interested in any one or a combination of several characteristics related to the behavior. The most common characteristic observed is frequency, or how often a behavior occurs. Other characteristics include magnitude (how intense a behavior is) and duration (how long a behavior lasts). A behavior change agent might be interested in reducing the frequency of a problem behavior, reducing its intensity, or reducing its duration. Regardless of which characteristic is observed, it is important to measure that characteristic consistently throughout the behavior intervention process.

Anecdotal (ABC) Recording

One exception to the suggestion that behavioral observation methods produce objective and quantifiable information about behaviors is anecdotal recording. Anecdotal recording involves recording and interpreting a narrative of behavior during an observation period using an antecedent-behavior-consequence (ABC) format. To conduct an anecdotal observation, an observer records all behaviors observed, along with what was observed to occur before and after the behaviors. For this type of observation, it is important that only observable behaviors are recorded. No inferences about behaviors should be made. For example, if a student is observed to slam her book closed, the observer should record “slammed book closed,” rather than “student frustrated.” Either during or after the observation period, it is helpful to arrange observations into a chart that specifies behaviors, antecedents (what happened prior to the behavior), and consequences (what happened as a result of the behavior). It is also helpful to keep track of the time at which behaviors were observed to occur.
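An ABC chart of the kind described above can be sketched as a simple list of records. This is only an illustration: the class name `ABCEntry`, its field names, and the sample antecedents and consequences are hypothetical, not part of any standard form.

```python
from dataclasses import dataclass


@dataclass
class ABCEntry:
    """One row of an ABC chart. Entries hold observable facts only, never inferences."""
    time: str         # clock time at which the behavior was observed
    antecedent: str   # what happened just before the behavior
    behavior: str     # the observable behavior itself
    consequence: str  # what happened as a result of the behavior


# Hypothetical sample rows, made up for illustration.
chart = [
    ABCEntry("9:05", "teacher handed out worksheet",
             "slammed book closed", "teacher repeated the instruction"),
    ABCEntry("9:12", "peer asked to borrow a pencil",
             "left seat", "teacher directed student back to seat"),
]

for entry in chart:
    print(f"{entry.time} | {entry.antecedent} | {entry.behavior} | {entry.consequence}")
```

Note that the behavior column reads “slammed book closed” and not “student frustrated,” in keeping with the rule that only observable behavior is recorded.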

Anecdotal recording is a method of choice when behaviors of concern are unclear. In other words, if one is unsure about the exact nature of a behavioral concern, anecdotal recording allows the observer to include observations of all behaviors. This is often a necessary first step in targeting particular behaviors for more focused or structured observation. Once behaviors of concern are pinpointed, however, the subjective and diffuse nature of anecdotal recording makes it unsuited for continued use. At that point, the methods of choice are those that provide more quantitative and objective data. These methods are discussed below.

Interval Recording Methods

Interval recording methods produce a record of the number of intervals during which a behavior is observed to occur. There are three basic variations on interval recording—partial-interval recording, whole-interval recording, and momentary time sampling—but all focus on observing the frequency of the behavior, and all use simple yes or no counts of whether a behavior was observed to occur during each interval.

Partial-Interval Recording

Partial-interval recording begins with the observer determining the size of the interval needed. The size of the interval depends on the nature of the behavior, but 30 seconds is a common choice. Next, the observer creates a grid of boxes on a sheet of paper, with each box representing one interval. If the observation is conducted without the assistance of a computer program, a stopwatch is typically used to time the intervals. The observer begins observing the student or client for the presence of the target behavior. After the interval has passed, the observer records whether the behavior occurred during the interval. If observations are being recorded on paper, an X would be marked in the appropriate box if the behavior occurred. The observer then observes for the remaining intervals until the observation period is over.

Observers may choose one of two options for actually recording whether the behavior occurred during the interval. One option is to set aside a prespecified amount of time for the actual recording of the behavior. For example, if an interval is 30 seconds in length, the last 5 seconds of that interval might be devoted to recording. During those 5 seconds, no behavior that occurs is recorded. The second option is to observe and record simultaneously. In this option, behavior would be observed for the entire 30 seconds, and the observer records while continuing to observe. The advantage of the first option is that the observer is never distracted by recording while observing, so no behaviors are missed during the observation portion of the interval. This is especially important if more than one behavior is being observed at the same time. The disadvantage, however, is that 5 seconds of each interval are unavailable for data collection. If only one behavior is being observed, if maximum observation time is desired, and if the observer is skilled in behavioral observation, the observe-and-record-simultaneously option may be preferred.

Whole-Interval Recording

Whole-interval recording is similar to partial-interval recording in all aspects but one. In partial-interval recording, the behavior is recorded as having occurred (i.e., an X is placed in the box) if it was observed to occur at any point during the interval. For example, if head banging is the behavior of interest, it would be recorded as having occurred even if it lasted only 3 seconds. In whole-interval recording, the behavior has to have occurred throughout the entire interval in order to be recorded as having occurred. Head banging in the previous example would be marked as having occurred only if it lasted for the entire 30-second interval.

Data interpretation is the same, regardless of whether one is using partial- or whole-interval recording. Once the observation period is over, the data collected are aggregated so they are easily understandable. This involves adding the number of intervals during which the behavior occurred, dividing the sum by the number of intervals observed, and multiplying by 100. The resulting figure indicates the percentage of intervals during which the behavior was observed to occur. Converting to a percentage allows for comparisons across observations of varying lengths.
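The aggregation just described can be sketched in a few lines. The function name `percent_of_intervals` and the sample grid are illustrative assumptions. The grid is represented as a list of booleans, with `True` standing for an X in the box.

```python
def percent_of_intervals(marks):
    """Return the percentage of observed intervals in which the behavior was marked.

    `marks` holds one boolean per interval (True = an X in the box).
    """
    if not marks:
        raise ValueError("at least one interval must be observed")
    return 100 * sum(marks) / len(marks)


# Ten 30-second intervals; the behavior was marked in four of them.
grid = [True, False, False, True, True, False, False, False, True, False]
print(percent_of_intervals(grid))  # 40.0
```

The same calculation applies whether the marks came from partial- or whole-interval recording.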

Interval recording is a preferred method when the target behavior occurs at a moderate but steady rate. It should be noted that interval recording tends to over- or underestimate the actual frequency of behaviors, depending on whether partial- or whole-interval recording is used. With partial-interval recording, behavior frequency tends to be overestimated. For example, if the target behavior occurs once every 30 seconds during a 5-minute observation, but each instance lasts only 2 seconds, the observer would record an X in each box. The resulting percentage is 100, which is interpreted to mean that the behavior occurred during 100% of the intervals observed. Although this is true, “100%” overestimates the actual time the person spent engaged in the behavior (which in this case is only 20 seconds out of a 5-minute observation). With whole-interval recording, underestimations are possible. For example, using the same data presented above, no Xs would be placed in any boxes, because the behavior never lasted an entire 30-second interval. The observer would interpret these data as indicating that the behavior occurred during 0% of the intervals observed. Although this is true, it underestimates the actual frequency. Typically, whole-interval recording is used only when the duration of the behavior is of concern.
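The example above can be reproduced with a short simulation. This is a sketch under stated assumptions: each instance of the behavior is written as a (start, end) pair in seconds, instances do not overlap, and each 2-second instance is placed 10 seconds into its interval (the placement is arbitrary). The function name `score_intervals` is hypothetical.

```python
def score_intervals(bouts, interval_len, total_len):
    """Score every interval under the partial- and whole-interval rules.

    `bouts` is a list of non-overlapping (start, end) times in seconds.
    Returns (partial_marks, whole_marks), one boolean per interval.
    """
    partial, whole = [], []
    for start in range(0, total_len, interval_len):
        end = start + interval_len
        # Seconds of behavior that fall inside this interval.
        overlap = sum(max(0, min(b_end, end) - max(b_start, start))
                      for b_start, b_end in bouts)
        partial.append(overlap > 0)             # any occurrence earns an X
        whole.append(overlap >= interval_len)   # behavior must fill the interval
    return partial, whole


# One 2-second instance in each 30-second interval of a 5-minute observation.
bouts = [(t + 10, t + 12) for t in range(0, 300, 30)]
partial, whole = score_intervals(bouts, interval_len=30, total_len=300)

partial_pct = 100 * sum(partial) / len(partial)
whole_pct = 100 * sum(whole) / len(whole)
actual_seconds = sum(end - start for start, end in bouts)

print(partial_pct)                            # 100.0
print(whole_pct)                              # 0.0
print(actual_seconds)                         # 20
print(round(100 * actual_seconds / 300, 1))   # 6.7 (percent of time actually engaged)
```

The two interval methods bracket the true figure: 100% and 0% of intervals, against roughly 6.7% of the observation time.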

One significant advantage of using interval recording is that it is easier than some other methods. Specifically, because behaviors are not being counted per se, the observer only needs to note whether or not the behavior occurred during the interval. Regardless of the number of times the behavior occurs during an interval, only one X is marked in the box.

Momentary Time Sampling

Momentary time sampling is the third variation of interval recording. Like partial- and whole-interval recording, time sampling begins with the observer determining the size of the interval desired. Intervals typically are shorter in time sampling than in partial- or whole-interval recording.

A grid is constructed, with each box representing one interval. The observer begins the observation by starting the timer, and then momentarily observes the student or client at the end of the interval. If the behavior was observed to be occurring at the moment observed, an X is placed in the box. The next interval immediately follows. The major difference between interval recording and time sampling is that in interval recording, all behaviors that are observed during the interval are recorded. In time sampling, only behaviors that are occurring at the end of the intervals are recorded. It is entirely possible that a behavior might occur during every interval but never be recorded as having occurred if the behavior does not happen to be occurring at the end of the interval. For this reason, time sampling is not a preferred method for behaviors that occur only briefly (e.g., hitting). It is relatively easy, like interval recording, because the observer needs to note only if the behavior was occurring at the end of the interval. To interpret time sampling data, the observer adds the number of times sampled during which the behavior was observed to occur (i.e., the number of Xs), divides the sum by the number of intervals observed, and then multiplies that number by 100. The resulting figure represents the percentage of times sampled during which the behavior was observed to occur.

The advantages of time sampling are its relative ease (as noted above), and the fact that between sampling points, the observer can perform other tasks (such as observing others in the vicinity). The major disadvantage of time sampling is that very little of the total observation time is spent actually observing the student or client. For example, assuming “momentary” means 1 second, a 5-minute observation divided into 10-second intervals would mean the observer would observe 30 times (i.e., 6 times per minute for 5 minutes) or for 30 seconds out of 5 minutes. Many behaviors that could be recorded during that time are not recorded unless they happen to be occurring at the end of the interval.
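A minimal sketch of momentary time sampling follows, assuming behavior is written as (start, end) pairs in seconds and that the observer looks exactly at the instant each interval ends. The function name and the sample data are hypothetical.

```python
def momentary_time_sample(bouts, interval_len, total_len):
    """Return one boolean per interval: was the behavior occurring at the
    instant the interval ended?"""
    marks = []
    for moment in range(interval_len, total_len + 1, interval_len):
        marks.append(any(start <= moment < end for start, end in bouts))
    return marks


# A brief behavior that occurs in every 10-second interval but is never
# under way at the end of one.
brief = [(t + 3, t + 5) for t in range(0, 300, 10)]
brief_marks = momentary_time_sample(brief, interval_len=10, total_len=300)
print(len(brief_marks))                          # 30 sampling moments in 5 minutes
print(100 * sum(brief_marks) / len(brief_marks)) # 0.0

# A sustained behavior lasting from 0 to 155 seconds is captured well.
sustained_marks = momentary_time_sample([(0, 155)], interval_len=10, total_len=300)
print(100 * sum(sustained_marks) / len(sustained_marks))  # 50.0
```

The first case shows why the method suits sustained behaviors better than brief ones: thirty instances occurred and none was recorded.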

Event or Frequency Recording

Event or frequency recording involves recording the number of times a behavior is observed to occur during an observation period. This type of recording involves a count of the number of separate instances of the behavior. An observer using event recording records the time at which the observation starts. The observer then notes each time the behavior either begins or ends, usually by using tally marks on a piece of paper (if a computer is not being used). At the end of the observation period, the observer adds the number of tally marks. The data can be interpreted as the number of times the behavior occurred. If comparisons across observations of different lengths are desired, the data can be recorded as a rate. For example, if 10 behaviors occurred during a 10-minute observation, the rate would be 1 behavior per minute.
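The rate conversion is a single division. The sketch below assumes the observation length is expressed in minutes; the function name `rate_per_minute` is illustrative.

```python
def rate_per_minute(count, observation_minutes):
    """Convert a tally of events into a rate so that observations of
    different lengths can be compared."""
    if observation_minutes <= 0:
        raise ValueError("observation length must be positive")
    return count / observation_minutes


print(rate_per_minute(10, 10))  # 1.0 behavior per minute, as in the example above
print(rate_per_minute(12, 30))  # 0.4
```

Twelve events in 30 minutes is a larger raw count than ten events in 10 minutes, but a lower rate.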

Although event or frequency recording sounds simple, it is actually more difficult than either interval recording or time sampling. In the latter two methods, the observer needs to note only whether the behavior was occurring at the time of the observation. With event recording, the observer needs to know when a behavior either starts or stops. Therefore, it is imperative that behavioral definitions are written comprehensively enough to make this determination. For example, if a person engages in a self-injurious behavior for 3 seconds, pauses for 1 second, and then resumes the behavior, is this counted as one event or two events? The answer needs to be available in the definition of the behavior being used for the purpose of the observation.
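One way a behavioral definition can settle the one-event-or-two question is to state a minimum pause that separates events. The sketch below assumes such a rule. The `min_gap` parameter and the function name are hypothetical, and a given definition may draw the line differently.

```python
def count_events(bouts, min_gap):
    """Count separate events among (start, end) bouts, in seconds.

    A bout that resumes after a pause shorter than `min_gap` seconds is
    treated as a continuation of the previous event.
    """
    events = 0
    last_end = None
    for start, end in sorted(bouts):
        if last_end is None or start - last_end >= min_gap:
            events += 1
        last_end = end if last_end is None else max(last_end, end)
    return events


# Three seconds of behavior, a 1-second pause, then three more seconds.
bouts = [(0, 3), (4, 7)]
print(count_events(bouts, min_gap=2))    # 1 (the pause is too short to split the event)
print(count_events(bouts, min_gap=0.5))  # 2
```

The same raw observation yields different counts under different definitions, which is why the definition has to be fixed before observation begins.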

Sometimes event recording can be used without an observer actually seeing the behavior occur. Many behaviors produce permanent products that can be counted. Examples include the number of math worksheet problems completed and the number of days a student is tardy for school (using attendance records). The use of permanent products is often preferred to live observation because of its efficiency and verifiability. This is especially true for behaviors that occur very infrequently. Infrequent behaviors are very difficult to “catch” during an observation. With permanent products, behaviors can be “observed” after the fact.

Event recording is easily understandable to those not trained in behavioral observation, as it involves counting behaviors. Assuming behavioral definitions are written to discriminate between instances of a behavior, event recording is a good choice for recording the frequency of a behavior. Its disadvantages include, as mentioned above, the requirement of very precise behavioral definitions and the fact that it lacks utility for behaviors that are not easy to count (e.g., inattention) as well as for behaviors that occur very infrequently (if no permanent products result from the behavior).

Duration Recording

Duration recording produces an estimate of the amount of time a person spends engaged in a particular behavior. This is the only method discussed thus far that allows one to make statements related to percentage of time spent engaged in a behavior (although interval recording and time sampling data are often misinterpreted as meaning this). Duration recording is a difficult recording method to use, because the observer needs to note when the behavior both begins and ends. As such, behavioral definition specificity is imperative with this method.

Duration recording begins with the observer watching for the target behavior to begin. A stopwatch is started at that time, and then stopped when the behavior ends. The intervening time is recorded and the stopwatch is reset. This procedure is continued for the rest of the observation period. Subsequent to the observation, the total amount of time spent engaged in the behavior is computed by adding each of the amounts of time for individual instances of the behavior. Typically, that sum is divided by the number of instances of the behavior to obtain an average duration of each behavior. Because each recorded amount of time also corresponds to one behavior, duration recording also provides event or frequency recording data.
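The summary calculations can be sketched as follows, assuming each stopwatch reading is stored in seconds. The function names and the sample readings are illustrative.

```python
def summarize_durations(durations):
    """Summarize stopwatch readings (in seconds), one per instance of the behavior."""
    total = sum(durations)
    count = len(durations)  # each reading is also one event
    average = total / count if count else 0.0
    return {"total": total, "count": count, "average": average}


def percent_of_time(durations, observation_seconds):
    """Percentage of the observation period spent engaged in the behavior."""
    return 100 * sum(durations) / observation_seconds


# Hypothetical readings for three tantrums during a 20-minute observation.
tantrums = [45, 120, 75]
summary = summarize_durations(tantrums)
print(summary)                            # {'total': 240, 'count': 3, 'average': 80.0}
print(percent_of_time(tantrums, 20 * 60)) # 20.0
```

The `count` field is the event-recording figure that duration recording provides as a by-product.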

Duration recording provides very useful information, but it is difficult to use. Furthermore, it is only recommended if the duration of the behavior is a concern. For example, behaviors such as smoking or hitting are not particularly amenable to duration recording, because what is of most concern is the frequency of smoking or hitting (not how long they last). However, for behaviors such as tantrums or daydreaming, the goal may be to decrease their duration.

Latency Recording

Latency recording is a very specific observation method that provides information about the amount of time that elapses between an environmental event and the commencement or completion of a target behavior. Most typically, this method is used to determine the amount of time it takes for a person to comply with a command. In this case, the environmental event is the command, and the target behavior is compliance with that command. This method is very difficult to use, because not only does “compliance” need to be very solidly defined, but the event (i.e., command) also needs to be identifiable.

With latency recording, the observer starts the stopwatch when a command is given, and stops it when the client either begins to comply or has complied completely with the command. It is the observer’s choice whether to measure the time to beginning compliance or completing the task, but whatever method is chosen needs to be used consistently. The decision is likely to be based on the commands themselves (i.e., the behavior being requested), and characteristics of the client. For example, if the client typically begins to comply immediately when asked to brush his or her teeth, but then becomes distracted and never finishes the task, the observer would likely record time to completion of the task.

Regardless of method, once the client has satisfied the condition, the observer notes the time that elapsed, resets the stopwatch, and waits for the next command. This procedure is repeated for the duration of the observation period. Afterwards, the observer adds the elapsed time for each command and divides by the number of commands. This produces an average amount of time to compliance. It should be noted that compliance might not occur with some commands. In those cases, the observer would note that there was no compliance and restart the stopwatch when the next command is given. When interpreting results, it is important to note how many instances of noncompliance were observed.
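A sketch of the latency summary follows. It assumes that a command with no compliance is stored as `None`, and that the average is taken over only the commands that were complied with. That second assumption is one possible way to handle noncompliance, not something the procedure above prescribes. The function name and the sample readings are hypothetical.

```python
def summarize_latencies(latencies):
    """Return (average latency in seconds, number of noncompliant commands).

    `latencies` holds one entry per command: elapsed seconds, or None when
    the client never complied.
    """
    complied = [t for t in latencies if t is not None]
    noncompliance = len(latencies) - len(complied)
    average = sum(complied) / len(complied) if complied else None
    return average, noncompliance


# Four commands; the third was never complied with.
readings = [12, 30, None, 18]
average, noncompliance = summarize_latencies(readings)
print(average)        # 20.0
print(noncompliance)  # 1
```

Reporting the two figures together keeps a short average latency from hiding frequent noncompliance.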

Other Methods of Observation

With the exception of duration and latency recording, each of the methods discussed above addresses behavioral frequency. As discussed earlier, observations may be made of frequency, magnitude/intensity, and duration. Duration recording and latency recording are both examples of methods for measuring duration of behaviors. Typically, behavioral magnitude/intensity is assessed by assigning a rating to the magnitude of a behavior (sometimes referred to as performance-based behavioral recording). Rating scales may be developed to measure behavioral magnitude during a particular observation.

The validity of this method may be unknown, because validity is typically not determined for measures created for use with a single client. Some published procedures are available for particular behaviors (e.g., self-injurious behavior), but most often scales are created for use with particular clients. Measuring behavioral magnitude/intensity can be difficult, because different levels of magnitude need to be defined. For example, if the magnitude/intensity of social withdrawal is being measured using a scale of 1 to 4, definitions of what specific behaviors or characteristics constitute each of those ratings need to be written.

In addition to serving as a measure of behavior frequency, permanent products can also be used as a measure of behavior intensity. For example, the magnitude or intensity of trichotillomania (i.e., compulsive hair pulling) can be assessed by measuring the size of patches of pulled out or thin hair. The amount of bedwetting can be observed by measuring the size of wet spots on a bed. These behaviors leave permanent products that make actual observation of the behaviors unnecessary.

Observation With Published Instruments

Several commercially available behavior-rating scales include forms for behavioral observation. Two of the more widely used are the Behavior Assessment System for Children, Second Edition (BASC-2) and the Achenbach System of Empirically Based Assessment (ASEBA). The BASC-2’s Student Observation System uses a momentary time sampling format for rating a variety of maladaptive and adaptive behaviors that are also included in other BASC-2 components (e.g., parent and teacher rating forms). The ASEBA includes a Direct Observation Form that lists 96 problem behaviors that are also represented on other ASEBA forms. Unlike the BASC-2 Student Observation System, the Direct Observation Form involves observing the student and recording behavior for a 10-minute period, then rating the problem behaviors observed during that time.

Technology and Behavioral Observation

Increasingly, behavioral observations are being conducted using various computer-based tools and programs. Software for personal computers and hand-held devices is often used for observations. These programs decrease observer error (e.g., observers are prompted to record behavior, eliminating the requirement for observers to keep track of time while observing), compile data collected in a format easily interpretable by the professionals involved, and allow for more sophisticated observational strategies. For example, some programs allow for the assessment of sequential conditions for behaviors. This possibility allows the observer to measure the likelihood of a particular behavior occurring, given the behavior that occurred before it. In other words, it allows predictions of behavioral probabilities.
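The idea of sequential analysis can be illustrated with a small sketch that estimates, from one coded sequence of events, how likely each behavior is given the behavior immediately before it. This is a simplified illustration and not the algorithm of any particular program. The behavior codes and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate P(next behavior | previous behavior) from a coded sequence."""
    pair_counts = defaultdict(Counter)
    for previous, following in zip(sequence, sequence[1:]):
        pair_counts[previous][following] += 1
    return {
        previous: {code: n / sum(counts.values()) for code, n in counts.items()}
        for previous, counts in pair_counts.items()
    }


# Hypothetical sequence of coded events from one observation.
observed = ["request", "refuse", "request", "comply", "request", "refuse"]
probs = transition_probabilities(observed)
print(round(probs["request"]["refuse"], 2))  # 0.67
print(round(probs["request"]["comply"], 2))  # 0.33
print(probs["refuse"]["request"])            # 1.0
```

With so few events the estimates are unstable; in practice far longer sequences would be needed before treating them as predictions.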

Issues in Behavioral Observations

Assessment Reactivity

Because observers are in the physical presence of the client while collecting behavioral observation data, there is the potential for the procedure itself to change the client’s behavior. This is referred to as assessment reactivity. Assessment reactivity can significantly affect the validity of observation data, so steps need to be taken to minimize its effects. The most common step addressed in the literature is to allow the client time to habituate to the observer’s presence and activities. Habituation refers to a process whereby a person, upon prolonged exposure to a stimulus, stops responding to that stimulus. In the case of behavioral observation, the stimulus is the observer and the response is the change in typical behavior. Habituation can be achieved by allowing the client to get used to the observer’s presence before any data are collected. Habituation is easier to achieve if the observer is as unobtrusive as possible. Sitting slightly behind but to the side of the person being observed is sometimes helpful.

Reliability

Reliability refers to the consistency of results obtained from an assessment procedure, and it is important for the purposes of behavioral observation. There are several types of reliability, including internal consistency, test-retest, and inter-rater reliability. The first two are less applicable than the third for behavioral observation. With regard to test-retest reliability, for example, behaviors are not expected to remain stable over time, so low retest reliability is less a function of the instrumentation being used than of the characteristics being assessed. Inter-rater reliability is an important concept in behavioral assessment, however. It is important that two observers agree on whether targeted behaviors are occurring. Strong inter-rater reliability depends heavily upon solid behavioral definitions and comprehensive training for behavioral observers.
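One simple way to quantify whether two observers agree is interval-by-interval percent agreement. It is shown here as an assumed index, one of several in use; it does not correct for agreement that would occur by chance. The function name and the sample records are hypothetical.

```python
def percent_agreement(observer_a, observer_b):
    """Percentage of intervals on which two observers made the same mark."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("both observers must score the same, nonzero number of intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)


# Two observers scoring the same ten intervals (True = behavior marked).
a = [True, False, True, True, False, False, True, False, True, False]
b = [True, False, False, True, False, False, True, True, True, False]
print(percent_agreement(a, b))  # 80.0
```

Here the observers disagree on the third and eighth intervals, so they agree on 8 of 10.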

Defining Behaviors

Behavioral definitions should have several characteristics. They should be objective, clear, and complete. Objective means the definition should include only observable aspects of the behavior. No inferences or judgments should be necessary when using the definition. The definition should be clear, meaning that it is understandable to any person who would want to conduct observations using the definition. Finally, the definition should be complete. It should delineate the bounds of the behavior, so that decisions can be made about whether a particular behavior represents an instance of the target behavior being observed.

Using Behavioral Observation Results

Results of behavioral observations are typically used for three purposes related to intervention planning. First, they are used as a baseline of current levels of behavior. A baseline tells the professionals involved what to expect in the future if no intervention occurs. Baseline data are also used for the second purpose, the formulation of goals. Goals should be based on current levels of behavior. Failing to use baseline data in formulating goals risks setting goals that are unrealistic or too lenient. The third purpose for which results of behavioral observation are used is to measure outcomes. If initial observation data are used to determine baseline levels of behavior and for goal setting, later data can be used as a measure of whether interventions are successful. If data are being collected on a problem behavior, the behavior should decrease in frequency, magnitude, or duration if an intervention is successful. Conversely, if data are collected on an appropriate behavior, occurrences of the behavior should increase.

Behavioral observations are also conducted for research purposes. The data may be used to describe the behavior of an individual or group, or they may be used to measure change in behavior contingent upon some environmental manipulation or individual treatment. Sometimes in research, sophisticated coding schemes are used to categorize or describe the behaviors observed, but they typically involve the use of one or more of the methods described above.
