Counseling process refers to events, characteristics, or conditions that occur during or as a result of the interaction between counselor and client. The therapeutic relationship that develops during counseling sessions is an example of counseling process; completing homework outside of session also constitutes an event that fits within counseling process. Process can refer to what the counselor does with the client as well as how change occurs within the client. In contrast, counseling outcome refers to the results or effects of counseling. Outcomes are those phenomena that change in the client as a direct or indirect result of counseling. Presumably, process influences outcome, although research has been unsuccessful at demonstrating consistent links between measures of process and measures of outcomes.
Measurement of counseling process and outcome has been one of the most vexing tasks in the history of counseling and psychotherapy research. One of the key problems in process and outcome measurement is that although hundreds of process and outcome measures have been created, no consensus exists about which measures should be employed, or how they should be employed, in practice or research. Researchers refer to this as the units of measurement issue: With any particular client, or in any particular study, what should be measured? Process researchers have investigated such variables as the amount of talking by counselors or clients, counselor competence, therapist adherence to a role or treatment manual, client experiencing of affect, the strength of the therapeutic bond, client defensiveness or resistance, severity of client problem(s), and language use. As researchers have noted, process clearly signifies many different things.
The remainder of this entry discusses how psychological tests are typically evaluated, describes and evaluates a representative sample of counseling process and outcome measures, and then discusses future directions for research about process and outcome measures.
General Measurement Principles
Process and outcome measures can be evaluated using traditional standards of reliability and validity. Reliability refers to the consistency of measurement and is frequently evaluated through statistical tests of internal consistency, test-retest reliability, and interrater reliability. Coefficient alpha, the statistic usually employed to assess internal consistency, indicates the extent to which scores on individual test items contribute consistently to the total score. Test-retest reliability refers to the consistency of measurement over time; if the test measures a stable psychological trait, it should yield consistent scores across repeated administrations. Interrater reliability refers to the ability of two or more judges to assess some psychological characteristic or event similarly. The intraclass correlation and the kappa statistic are used to indicate the degree of interrater reliability.
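As an illustration of two of the statistics named above, the following minimal sketch computes coefficient alpha from item-level responses and Cohen's kappa for two raters assigning categories. The function names and the small data sets are hypothetical and exist only for demonstration; published studies typically rely on established statistical software.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha. item_scores: one list of k item scores per respondent."""
    k = len(item_scores[0])
    # Variance of each item across respondents
    item_vars = [pvariance([resp[i] for resp in item_scores]) for i in range(k)]
    # Variance of the summed (total) score across respondents
    total_var = pvariance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category judgments."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical data: three respondents answering three items
responses = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(round(cronbach_alpha(responses), 2))  # 1.0 (items rise and fall together)

# Hypothetical data: two judges coding four client statements
judge_1 = ["pos", "pos", "neg", "neg"]
judge_2 = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(judge_1, judge_2))  # 0.5
```

Alpha is computed here with population variances throughout; using sample variances for both the items and the total gives the same value because the correction factor cancels in the ratio.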
Validity has been described both as (a) the extent to which a test measures the construct it is intended to measure and (b) the extent to which evidence exists that test scores can be employed for a particular purpose. The first definition focuses on whether scores on a particular test reflect a particular construct (e.g., depression) as opposed to other constructs (e.g., stress) or systematic sources of error. Social desirability is a frequently noticed source of error for many tests and refers to the tendency to present one’s self in a favorable light. In a counseling setting, for example, a new client might underreport such negative behaviors as smoking, drinking, or unsafe sexual behaviors.
Regarding the second definition of validity, organizations such as the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) emphasize the purpose of testing as a critical factor. Given that the primary purpose of an outcome measure is to assess change, an instrument’s sensitivity to change is directly related to its construct validity.
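One simple way to express sensitivity to change numerically is a standardized pre-post difference. The sketch below is an assumption chosen for illustration, not a procedure prescribed by the organizations named above: it divides the mean change by the pretest standard deviation and assumes that lower scores indicate less distress. The scores are hypothetical.

```python
from statistics import mean, stdev


def pre_post_effect_size(pre, post):
    """Standardized mean change: (mean_pre - mean_post) / SD_pre.

    Positive values indicate a decrease in scores from pre to post.
    """
    return (mean(pre) - mean(post)) / stdev(pre)


# Hypothetical intake and termination scores for three clients
intake = [10, 20, 30]
termination = [5, 15, 25]
print(pre_post_effect_size(intake, termination))  # 0.5
```

Items or scales that yield values near zero for clients who are known to have improved would be candidates for the "insensitive to change" label.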
Validity can be evaluated using numerous methods. For example, researchers often examine the correlations between a test of a particular construct and a second test of the same or a similar construct (i.e., an evaluation of convergent validity). Correlations between a test of a construct and a dissimilar construct represent an evaluation of discriminant validity. In addition, test developers frequently create tests by administering a large pool of items to a large group of respondents and then subjecting the resulting scores to a factor analysis. In factor analysis, test developers assume that any large collection of items or tests actually measures a smaller number of more basic factors or traits; these factors consist of a group of highly intercorrelated variables. Factor analysis refers to a set of statistical procedures used to examine the relations among items or tests and produce an estimate of the smaller number of factors that accounts for those relations.
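The convergent and discriminant comparisons described above rest on ordinary correlation coefficients. The following minimal sketch computes a Pearson correlation between two sets of scores; the scale names and values are hypothetical, and a real validity study would use much larger samples.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for four clients on two measures of the same construct
new_depression_scale = [1, 2, 3, 4]
established_depression_scale = [1, 3, 2, 4]
print(round(pearson_r(new_depression_scale, established_depression_scale), 2))  # 0.8
```

Evidence of convergent validity would show up as a high correlation like this one, while a test of a dissimilar construct should correlate noticeably lower with the new scale.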
Counseling Process Measures
This section reviews a representative sample of five counseling process measures. The description of each measure includes an overview of the test, brief information about reliability and validity findings, a counseling-related research result found with this measure, and at least one potential problem or issue with the measure.
Five Process Measures
- Working Alliance Inventory (WAI). This 36-item scale includes parallel forms that are completed by client, counselor, and observer; a 12-item short form has also been developed. The WAI measures aspects of the working alliance that Bordin suggested would grow out of an agreement between client and therapist on counseling goals and tasks. The WAI produces scores on three subscales of 12 items each (tasks, goals, and the therapeutic bond) and a total score. Sample items include, “I believe (my counselor) likes me” and “(My counselor) and I are working toward mutually agreed-upon goals.” More than 100 studies employing the WAI have appeared in the literature, with many confirming that WAI scores evidence high internal consistency and at least modestly predict therapy outcome. Some studies have raised questions about the constructs the WAI measures, how to explain the high correlations among the three subscales, and which source (client, counselor, observer) better predicts outcome. Research also indicates that a better working alliance is associated with a client’s willingness to express negative affect, a potentially important component of a successful counseling process.
- Session Evaluation Questionnaire (SEQ). The SEQ was developed to measure clients’ perceptions of a counseling session’s impact on them in terms of post-session mood and immediate effects. Twenty-four items, rated on a 7-point semantic differential scale, produce a total score and four subscale scores (Depth, Smoothness, Positivity, and Arousal). With a semantic differential scale, the two extremes of the construct are presented with the numbers between them (e.g., Safe-1 2 3 4 5 6 7 Dangerous, or Quiet-1 2 3 4 5 6 7 Aroused). Raters choose the number that best expresses their feeling. Depth items refer to the client’s in-session perception of the session’s value and impact, while Smoothness items assess in-session perceptions of safety and distress. Positivity measures clients’ postsession confidence and happiness, and Arousal refers to postsession level of activity and excitement. Research indicates that the in-session Depth and Smoothness measures correlate highly with the postsession Positivity (but not Arousal) scale. As with all process measures, questions remain about the SEQ’s relation to outcome: To what extent does a single counseling session affect counseling outcome?
- Expectations About Counseling-Brief Form (EAC-B). Clients’ expectations about counseling form a potentially important explanation for why clients seek counseling and what they expect to occur. The EAC-B has been the most widely employed measure in this area for research purposes. The EAC-B contains items answered on a 7-point Likert scale that ranges from not true to definitely true. Responses to the 66 items are employed to produce scores on 17 scales that are grouped in four general areas of Client Attitudes and Behaviors, Counselor Attitudes and Behaviors, Counselor Characteristics, and Counseling Process and Outcome. Studies of the internal consistency and test-retest reliability of the subscales found support for most of the scales. Construct validity studies have demonstrated that the scale assesses expectations about counseling as opposed to perceptions about counseling. The scale has also been found to be useful with Hispanic and rural samples. Although the EAC-B is one of the most technically sophisticated process measures, little published research employing it has appeared since the early 1990s.
- Client Reactions System (CRS). Hill and colleagues developed the CRS, a measure that contains 21 categories of reactions (14 positive and 7 negative categories) that clients experience in response to counselor interventions. Examples of the positive categories include Feelings, where the client feels more deeply; Supported, where the client feels that the counselor likes or cares for the client; Relieved, where the client feels less anxious or depressed; and Responsibility, where the client blames others less and takes more responsibility for events. Among the negative categories, Misunderstood refers to situations where the client believes that the counselor did not listen accurately, judged the client, or made false assumptions, and Scared refers to situations where the client feels overwhelmed or afraid to admit some problem. To employ the CRS, clients watch a videotape of their session, stopping the tape after each counselor intervention and then rating their reaction. Although researchers have raised questions about the reliability and validity of similar measures, relatively few studies have been completed with the CRS.
- Counseling Self-Estimate Inventory (COSE). Bandura’s self-efficacy theory forms the theoretical basis for more recent process-oriented measures. The theory’s basic idea is that expectations of personal competence strongly influence the initiation and persistence of behavior. To create the COSE, Larson and colleagues administered the Likert-format items to 213 students enrolled in master’s level counseling courses and performed a factor analysis on the resulting responses. Five factors resulted from the analysis. Microskills items refer to course content related to basic counseling and communication skills training. Process items describe an integration of counselor responses when working with a client. A high total on the Difficult Client Behaviors items indicates high self-efficacy for dealing with silent or unmotivated clients. Cultural Competence items refer to behaving competently with clients of different ethnic or cultural groups. Awareness of Values items assess counselors’ tendency to impose their values and biases on the client.
COSE scales have demonstrated moderate to high internal consistency and test-retest reliability, and they correlate moderately in expected directions with measures of self-concept, anxiety, problem-solving appraisal, and counseling performance. A significant problem with the Process and Microskills subscales is that the content of each scale is confounded with positive or negative item wording.
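Several of the measures above produce subscale scores and a total score from Likert or semantic differential ratings, and some mix positively and negatively worded items. The sketch below shows one way such scoring could be carried out. The item-to-subscale key, the reverse-keyed item numbers, and the ratings are all hypothetical; they are not the published scoring keys of the WAI, SEQ, or COSE.

```python
def score_subscales(responses, key, reverse_items=(), scale_min=1, scale_max=7):
    """Sum ratings into subscale scores plus a total.

    responses: dict mapping item number -> rating
    key: dict mapping subscale name -> list of item numbers
    reverse_items: item numbers whose wording runs opposite to the construct
    """
    def keyed(item):
        rating = responses[item]
        # Flip reverse-keyed items so that a higher value always means "more"
        if item in reverse_items:
            return scale_min + scale_max - rating
        return rating

    scores = {name: sum(keyed(i) for i in items) for name, items in key.items()}
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical six-item instrument with three two-item subscales
key = {"tasks": [1, 2], "goals": [3, 4], "bond": [5, 6]}
ratings = {1: 7, 2: 1, 3: 4, 4: 4, 5: 6, 6: 2}
print(score_subscales(ratings, key, reverse_items={2, 6}))
# {'tasks': 14, 'goals': 8, 'bond': 12, 'total': 34}
```

When every item on a subscale shares the same wording direction, as the text notes for two COSE subscales, reverse keying cannot separate the construct from a general tendency to agree or disagree with items.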
Additional Process Measures
Other process scales more frequently employed in the counseling literature include the Expectations about Counseling scale, Counselor Rating Form, Counselor Effectiveness Scale, Barrett-Lennard Relationship Inventory, Counselor Effectiveness Rating Scale, Personal Attributes Inventory, Counselor Evaluation Inventory, and Structural Analysis of Social Behavior. These measures tap into just a few of the many process constructs noted above.
Counseling Outcome Measures
While many process measures are employed primarily in research studies, both researchers and practicing counselors use outcome measures. Counseling researchers use outcome measures to conduct efficacy and effectiveness studies. Efficacy studies are studies typically conducted in experimental settings (such as university clinics) in which a particular therapeutic approach (e.g., cognitive-behavioral therapy) is employed with counseling and control groups that are relatively homogeneous (e.g., all clients are depressed). With efficacy studies, researchers choose measures focused on the particular targets of counseling, such as depression or anxiety. In contrast, effectiveness studies occur in field settings, such as community mental health settings, where a diverse group of clients (in terms of age, gender, and race) presents with a wide range of psychological problems.
In practice settings, managed care companies and other funding sources often require counselors to employ outcome measures to offer evidence that counseling services are effective. Community mental health agencies, hospitals, and schools employ outcome measures to demonstrate that their services provide benefits to clients and their families. Outcome measures in effectiveness studies and in most practice settings are more likely to be comprehensive measures that assess a broad range of problem domains.
Counseling outcome measures present a set of problems that parallels those found with process measures. For example, outcome measures differ in terms of their
- Content, such as a focus on intrapsychic (e.g., feelings of anxiety and depression), interpersonal (e.g., conflict with others), and social role (e.g., student or employee) functioning
- Source of information, including the client, counselor, significant others, trained observers, expert judges, and societal agents (e.g., teachers)
- Methods, including self-reports, ratings by others, behavioral observations, physiological measures, and projective methods
As with process measures, no agreement exists among researchers or counselors about which combinations of content, source, or method should be employed in any particular situation. What is known is that each of these elements can influence the results of outcome investigations. Researchers have found that the degree of apparent change in counseling can be significantly influenced by the assessment method as well as the specific measure chosen. These findings include:
- Counselor and expert ratings produce larger estimates of change than client self-reports.
- Global ratings (e.g., an overall assessment of client functioning) typically show larger effects than assessments of specific symptoms.
- Measures based on specific short-term targets of therapy (e.g., reduction in alcohol consumption) produce greater effects than more general long-range targets (such as future mental health status).
- Measures of negative affective states (e.g., depression and anxiety) show larger, more immediate effects from counseling than measures of interpersonal conflict or social role functioning.
Four Outcome Measures
Many well-known scales employed for outcome measurement with children, adolescents, and adults, such as the Child Behavior Checklist, Conners Rating Scales, and the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), are lengthy measures initially designed for diagnostic and screening purposes. Many of these measures are too long, are too expensive, and contain too many items that are insensitive to change to be realistic outcome measures in practice settings. Research participants may be willing to spend an hour or more completing a battery of measures. However, actual clients whose primary motivation is to obtain relief from a troublesome condition typically will not complete tests that are more than 1 page in length, tasks that require more than 5 minutes to complete, or activities that appear to be unrelated to the actual provision of counseling services.
This section provides four examples of outcome measures: Two are specific to depression and anxiety, while the remaining two measures represent examples of a global outcome measure and a comprehensive outcome measure.
- Beck Depression Inventory (BDI). Developed on the basis of observations of depressed and nondepressed individuals, the BDI contains 21 multiple-choice items that assess 21 different aspects of depression. Completed by clients, each item contains four statements that range from no indication of depression to severe. A sample item might ask respondents to choose among statements ranging from “I do not feel sad” to “I am so sad or unhappy that I can’t stand it.” Depression symptoms and attitudes include mood, guilt feelings, suicidal wishes, irritability, sleep disturbance, and loss of appetite.
Research suggests that scores on the BDI evidence high internal consistency, correlate highly with other measures of depression, and are sensitive to change resulting from a variety of medication and counseling interventions. On the other hand, some research indicates that more than 50% of individuals classified as depressed by the BDI change categories when retested, even when the retesting period consists of only a few hours or days. The instability of some BDI items raises questions about whether changes in BDI scores reflect improvements resulting from counseling or a methodological influence.
- State-Trait Anxiety Inventory (STAI). The STAI consists of two 20-item self-report scales that are based on a conceptual understanding of anxiety as a signal to an individual of the presence of danger. The State Anxiety scale assesses more transient emotional states that can vary by situation, while the Trait Anxiety scale focuses on more stable aspects of anxiety. Sample items include “I feel content” and “I feel nervous.” Research evaluations of the two scales show expected differences; the State Anxiety scale, for example, evidences lower test-retest reliability. Both the State and Trait Anxiety scales evidence high internal consistency across most samples. The State Anxiety scale has been shown to be treatment sensitive, particularly in detecting the effects of counseling interventions aimed at decreasing test anxiety. Although the STAI was developed with the intent to produce scores that better reflect anxiety than depression, most measures of anxiety and depression correlate moderately to highly. This finding has not been explained satisfactorily, and it raises questions about the validity of measures of anxiety and depression.
- Global Assessment of Functioning (GAF). Over the past several decades, the GAF has been the most widely used brief outcome measure. Its popularity is attributable in part to the fact that it consists of a single-item 100-point rating scale that clinicians complete to estimate a client’s overall functioning and symptomatology. The global rating is intended to summarize symptoms and functioning across diverse domains from work functioning to suicide potential over daily, weekly, or monthly periods. GAF ratings employed for outcome assessment are typically completed at intake and termination. However, managed care companies often require counselors to report such ratings for individual clients over more frequent time intervals while counseling is ongoing.
Researchers have reported modest test-retest reliability values in the .60 to .80 range. Despite its widespread use, little additional psychometric data are available. The most fundamental problem with the GAF is its transparency: The counselor can easily manipulate the rating, making the client appear as distressed as necessary to justify treatment, but at a potential cost of validity. A recent survey found that counselors considered global GAF-type data to be among the least useful information for outcome assessment.
- Outcome Questionnaire (OQ-45). One of the newest comprehensive outcome scales is the Outcome Questionnaire-45 or OQ-45. Intended for persons 18 and older, the 45-item test can be completed in about 5 minutes. The OQ produces a total score and three subscale scores. The Symptom Distress subscale contains items related to anxiety and depression (e.g., “I feel blue”). Interpersonal Relations items assess satisfaction with and problems in interpersonal functioning (e.g., “I am satisfied with my relationships with others”). Social Role Performance items relate to satisfaction and competence in employment, family, and leisure roles (e.g., “I find my work/school satisfying”).
Studies indicate that the OQ-45 has adequate test-retest reliability and internal consistency. Scores on the OQ correlate in expected directions and magnitudes with scores on related scales such as the SCL-90-R, Beck Depression Inventory, State-Trait Anxiety Inventory, and Inventory of Interpersonal Problems. Furthermore, OQ-45 scores can distinguish between clinical and nonclinical groups. A source of concern is that research with college students found that students show improvement on OQ items even when they are not in counseling, although not at the same rate as treated individuals. An additional issue is that all three subscales of the OQ-45 are highly intercorrelated, suggesting that the total score on the scale provides an indication of general distress. Nevertheless, some of the most interesting outcome research now being conducted has demonstrated that using OQ scores to provide ongoing feedback to counselors about client progress reduces the rate of client treatment failures.
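The concern raised above about clients changing depression categories on retest can be checked with a short calculation. The following sketch assumes severity bands defined by score cutoffs; the cutoffs and scores shown are illustrative placeholders, not the published BDI classification rules.

```python
from bisect import bisect_right


def category(score, cutoffs):
    """Index of the severity band containing score (0 = lowest band)."""
    return bisect_right(cutoffs, score)


def proportion_changing_category(test, retest, cutoffs, depressed_from=1):
    """Share of initially 'depressed' clients whose band differs at retest.

    depressed_from: lowest band index that counts as classified depressed.
    """
    flagged = [
        (t, r) for t, r in zip(test, retest)
        if category(t, cutoffs) >= depressed_from
    ]
    if not flagged:
        return 0.0
    changed = sum(category(t, cutoffs) != category(r, cutoffs) for t, r in flagged)
    return changed / len(flagged)


# Hypothetical cutoffs: scores below 10 fall in band 0, 10-18 in band 1, and so on
cutoffs = [10, 19, 30]
first_testing = [12, 20, 5, 31]
second_testing = [8, 21, 6, 25]
print(round(proportion_changing_category(first_testing, second_testing, cutoffs), 2))  # 0.67
```

A high proportion over a retest interval of hours or days, when no intervention has occurred, points to measurement instability and not to counseling effects.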
Future Directions
Traditionally, test developers have sought items that discriminate among individuals on relatively stable traits (e.g., intelligence or vocational interests). A more challenging task is to measure change. Test developers are now developing a second generation of tests to measure changes that occur as a direct or indirect result of counseling. Empirical studies of methodologies that can be used to evaluate and enhance change-sensitive measures have only recently begun to appear in the professional psychological literature. A major assumption guiding these approaches is that different test construction and item analysis procedures are necessary to select items that reflect counseling effects. These item-analysis procedures test the competing claims that an observable change at the item level results from counseling, or that change results from factors that are unrelated to counseling. These efforts may enable researchers and counselors to eventually obtain better answers to such important questions as “What changes in counseling?” and “What is the nature of the process that produces beneficial change in counseling?”
References
- Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191-215.
- Bordin, E. S. (1979). The generalizability of the psychoanalytic concept of the working alliance. Psychotherapy: Theory, Research, and Practice, 16, 252-260.
- Hill, C. E., & Lambert, M. J. (2004). Methodological issues in studying psychotherapy processes and outcome. In M. J. Lambert (Ed.), Bergin and Garfield’s handbook of psychotherapy and behavior change (5th ed., pp. 84-135). New York: Wiley.
- Hoffman, B., & Meier, S. T. (2001). An individualized approach to managed mental health care in colleges and universities: A case study. Journal of College Student Psychotherapy, 15, 49-64.
- Horvath, A. O., & Greenberg, L. S. (1989). Development and validation of the Working Alliance Inventory. Journal of Counseling Psychology, 36, 223-233.
- Kendall, P. C., Hollon, S. D., Beck, A. T., Hammen, C. L., & Ingram, R. E. (1987). Issues and recommendations regarding use of the Beck Depression Inventory. Cognitive Therapy and Research, 11, 289-299.
- Martin, D. J., Garske, J. P., & Davis, M. K. (2000). Relation of the therapeutic alliance with outcome and other variables: A meta-analytic review. Journal of Consulting and Clinical Psychology, 68, 438-450.
- Maruish, M. E. (1999). The use of psychological testing for treatment planning and outcome assessment (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
- Meier, S. T. (2004). Improving design sensitivity through intervention-sensitive measures. American Journal of Evaluation, 25, 321-334.
- Orlinsky, D. E., Ronnestad, M. H., & Willutzki, U. (2004). Fifty years of psychotherapy process-outcome research: Continuity and change. In M. J. Lambert (Ed.), Bergin and Garfield’s handbook of psychotherapy and behavior change (5th ed., pp. 307-389). New York: Wiley.
- Spielberger, C. D., & Sydeman, S. (1994). State-Trait Anxiety Inventory and State-Trait Anger Expression Inventory. In M. Maruish (Ed.), The use of psychological testing for treatment planning and outcome assessment (pp. 292-321). Hillsdale, NJ: Lawrence Erlbaum.
- Stiles, W. B., Honos-Webb, L., & Knobloch, L. M. (1999). Treatment process research methods. In P. C. Kendall, J. N. Butcher, & G. N. Holmbeck (Eds.), Handbook of research methods in clinical psychology (2nd ed., pp. 364-402). New York: Wiley.
- Stiles, W. B., & Snow, J. S. (1984). Counseling session impact as viewed by novice counselors and their clients. Journal of Counseling Psychology, 31, 3-12.
- Vermeersch, D. A., Whipple, J. L., Lambert, M. J., Hawkins, E. J., Burchfield, C. M., & Okiishi, J. C. (2004). Outcome Questionnaire: Is it sensitive to changes in counseling center clients? Journal of Counseling Psychology, 51, 38-49.