Cognition/Intelligence Assessment

The assessment of intelligence has a long and colorful history, and its development mirrors the development of psychology as a field. From the early work of Francis Galton and James McKeen Cattell to the seminal contributions of Charles Spearman and David Wechsler to the contemporary work of Alan and Nadeen Kaufman, Jack Naglieri, and many others, the assessment of intelligence has facilitated the growth of scientific and clinical psychology. The purpose of this article is not to provide a comprehensive review of current and promising instruments but rather to highlight important approaches to and key issues involving the assessment of intelligence.


There are numerous ways to classify the diverse range of intellectual assessments, but this article uses two basic categories to facilitate the treatment of the topic: classic assessments (those based on classic theoretical and conceptual approaches to intelligence) and contemporary approaches (those based on more recent theoretical approaches).

Classic Assessments

Stanford-Binet Intelligence Scale

The goal of the French psychologists Alfred Binet and Theodore Simon was to examine the mental abilities of children in comparison with those of their average-achieving peers. In the first couple of decades of the 20th century, Binet and Simon defined intelligence as a fundamental faculty that is of the utmost importance for practical life. They referred to this faculty variously as judgment, good sense, practical sense, initiative, and the ability to adapt one’s self to circumstances. A person who lacks judgment may be a moron or an imbecile, but a person who has good judgment cannot be either. Indeed, Binet and Simon believed that the rest of the intellectual faculties were of little importance in comparison with judgment. The Binet-Simon scale required children to complete a series of mental activities until the items consistently became too difficult for them to answer correctly. The tests were normed to allow an estimate of “mental age” for each child based on the types of items that child was able to answer correctly.

Lewis Terman revised and extended the Binet-Simon scale in 1912, standardized it using a large American sample, and renamed it the Stanford-Binet Intelligence Scale. Terman defined intelligence as the ability to engage in abstract thinking. Terman was well known for his work in assessing children to predict whether they would be successful later in life. He found that children who were gifted tended to be healthier, taller, better developed physically, and more advanced in leadership and social adaptability than their average-level peers.

The Stanford-Binet Scale, which was revised for its fifth edition (SB5) by Gale Roid, can be used across the lifespan. It has been normed with 4,800 participants, including 1,400 children ages 2 to 5. The Stanford-Binet assesses two Domains (verbal and nonverbal reasoning) and five Factors (fluid reasoning, visual-spatial processing, quantitative reasoning, working memory, and knowledge). Each of the Factors is assessed through different subtests in each of the Domains. For example, there are verbal and nonverbal subtests for fluid reasoning. The Stanford-Binet provides a number of objects, such as blocks and colorful toys, for children to manipulate, which helps to increase the attention and enjoyment of young children. The assessment takes about 5 minutes per subtest to administer.

Wechsler Scales

Working primarily in the mid- to late 20th century, David Wechsler viewed intelligence as an individual’s global capacity to act purposefully, think rationally, and deal effectively with the environment. Wechsler also believed that intelligence is affected by nonintellective factors such as personality. The Wechsler Intelligence Scale for Children (WISC-IV), now in its fourth edition, has been normed with 2,200 children who were matched closely to the diversity of the 2000 United States Census. The WISC-IV can be used to assess children from the age of 6 to 16 years, 11 months. It yields a full-scale IQ score and four index scores: Verbal Comprehension, which includes similarities, vocabulary, and comprehension activities; Perceptual Reasoning, which includes matrix reasoning, block design, and picture concepts; Working Memory, which includes letter-number sequencing and digit-span; and Processing Speed, which includes symbol search and coding. The subtests have not been arranged in order of importance, but instead are compiled to gauge general mental ability.

Wechsler also designed the Wechsler Preschool and Primary Scale of Intelligence, now in its third edition (WPPSI-III). The present version has been normed with 1,700 children and can be used with children ages 2 years, 6 months to 7 years, 3 months. Children under 4 years of age are administered a shorter assessment of four subtests (receptive vocabulary, information, block design, and object assembly), which measure perceptual organization and verbal comprehension. Older children are administered seven subtests (information, vocabulary, word reasoning, block design, matrix reasoning, picture concepts, and coding) to measure perceptual organization, verbal comprehension, and processing speed. The WPPSI-III was revised with the goals of being more enjoyable for young children, better sustaining their attention, and removing any ethnic, gender, regional, or socioeconomic bias.

The Wechsler Adult Intelligence Scale, currently in its third edition (WAIS-III), is administered to those 16 years old or older. The latest edition was normed in the United States during the early 1990s with a group of 2,450 people deemed to be representative of the adult population. The WAIS-III is divided into Verbal and Performance Scales. The six standard subtests found in the Verbal Scale are Information, Digit Span, Arithmetic, Vocabulary, Similarities, and Comprehension. The five standard subtests found in the Performance Scale are Block Design, Picture Arrangement, Matrix Reasoning, Picture Completion, and Digit Symbol-Coding.

Woodcock-Johnson Test of Cognitive Abilities

The Woodcock-Johnson III Test of Cognitive Abilities (WJ III) offers a different perspective on cognitive assessment. The WJ III is one of the few assessments based on the Cattell-Horn-Carroll (CHC) theory of cognitive abilities. CHC theory views intelligence as a three-stratum hierarchy. The first stratum is composed of 69 narrow cognitive abilities, including memory, fluency, and coding. The second stratum is composed of seven clusters of cognitive ability: short-term memory, processing speed, fluid reasoning, auditory processing, visual-spatial thinking, long-term retrieval, and comprehension-knowledge. The third stratum, known as General Intellectual Ability, represents a combination of all cognitive abilities. The WJ III can also be used to assess working memory and executive functioning.

The WJ III has been normed with 8,818 people in the United States, and it can be administered to anyone from 2 to over 90 years old. This assessment was normed along with the Woodcock-Johnson III Tests of Achievement; the two together form the complete battery, and practitioners often administer both. The assessment is computer scored, and its results can be reported in terms of standard scores, percentile ranks, age and grade equivalent scores, and general intellectual ability. There are 10 subtests in the Standard Battery: Verbal Comprehension, Visual-Auditory Learning, Spatial Relations, Sound Blending, Concept Formation, Visual Matching, Numbers Reversed, Incomplete Words, Auditory Working Memory, and Visual-Auditory Learning-Delayed. There are also 10 subtests in the Extended Battery. Not all 20 subtests are needed at one time; rather, the assessment is designed so that selected subtests can be combined to obtain the most pertinent and suitable information. The manual offers various ways to combine subtests to obtain the specific information the psychologist is seeking. This assessment usually takes between 40 minutes and 2 hours to administer.

Contemporary Approaches

Kaufman Batteries

Alan and Nadeen Kaufman created two major assessments of intelligence, both of which are innovative due to their strong theoretical foundations. The Kaufman Assessment Battery for Children, Second Edition (KABC-II) has been normed with 3,025 children, and it is administered to children ages 3 years to 18 years, 11 months. The KABC-II is based on two theories of intelligence. First, the CHC model for assessment is based on a theoretical approach to intelligence that distinguishes between fluid and crystallized abilities. In the KABC-II, the CHC model is used more often because it is designed for children who speak English as a first language and would, therefore, be less disadvantaged by tests of language abilities and word knowledge. The second model, based on the neuropsychological work of Russian scientist A. R. Luria, de-emphasizes verbal processes by not including the assessments of language ability or word knowledge. This makes the Luria model more accessible to children who do not speak English as a first language, or who have an expressive or receptive language disorder.

The KABC-II was revised to provide an assessment that is more impartial when used with children from diverse backgrounds, and the revision resulted in smaller score differences among ethnic groups. This assessment has been standardized, adapted, and translated in more than 15 countries, and administration time typically ranges from 30 to 60 minutes.

The Kaufman Adolescent and Adult Intelligence Test (KAIT) has been normed with 2,000 people in the United States, and it is appropriate for people over the age of 11. The norm sample considered such characteristics as gender, ethnicity, examinee or parental education, and geographic region. The KAIT contains 11 subtests that compose three scales: Crystallized, Fluid, and Delayed Recall. Administration takes approximately 1 hour.

Cognitive Assessment System

J. P. Das, Jack Naglieri, and John Kirby proposed a planning-attention-simultaneous-successive (PASS) model of human intelligence that is partly based on Luria’s neuropsychological research. Based on PASS theory, Das and Naglieri developed the Cognitive Assessment System (CAS). The CAS assesses four aspects of basic cognitive functioning. Planning assesses cognitive control, setting goals, knowledge, and the effectiveness of one’s planning strategies. Attention assesses the ability to focus on certain stimuli while ignoring others. Simultaneous processing assesses the ability to perceive stimuli as a whole. Successive processing assesses the ability to remember certain phrases and use them to better understand concepts.

The Cognitive Assessment System was standardized in 1990 on 2,200 children from diverse backgrounds. The composition of the norm group mirrors the general population in terms of gender, race, age, region, community setting, ability level, classroom placement, and parental education. The CAS can be used with children aged 5 years to 17 years, 11 months.

Differential Ability Scales (DAS)

Colin Elliott avoided the term intelligence and did not use it in the Differential Ability Scales (DAS-II) because of his belief that there are multiple definitions of, and considerable misunderstanding surrounding, the term. The DAS was designed to assess cognitive strengths, weaknesses, and other abilities of children. It has been normed with 3,475 children and adolescents living in the United States, who were stratified on sex, ethnicity, age, parental educational level, region, and preschool enrollment. It is appropriate for children ages 2 years, 6 months to 17 years, 11 months. The DAS was designed to reduce examinee frustration and testing time by using specific beginning and end points for different age levels.

Two different batteries are available for administration to children: the Preschool Level is suitable for ages 2 years, 6 months to 5 years, 11 months, and the School-Age Level is intended for ages 6 years to 17 years, 11 months. The 17 subtests are divided into 12 core and 5 diagnostic subtests. The core subtests are used to compute the General Conceptual Ability composite, which assesses an individual’s ability to perform complex mental tasks. The subtests are also used to compute the Verbal Ability, Nonverbal Ability, Nonverbal Reasoning Ability, and Spatial Ability scores.

Nonverbal Tests

The creation and use of nonverbal tests, such as the Raven’s Matrices, has played a role in intelligence testing for decades. Nevertheless, increased concern about the confounding of language with intelligence testing has led to the development of additional nonverbal approaches to intelligence testing. Nonverbal assessments now exist for use with gifted and talented students and those who do not speak English as a first language or who have special needs.

Researchers have offered several cautions about the use of nonverbal tests. First, nonverbal assessments should be used only after other assessments have been administered. Second, although the goal is for nonverbal assessments to be helpful in assessing non-English speakers, it is difficult to find an assessment that does not have some sort of cultural bias in it. Third, the assessment procedures are nonverbal, but the cognitive processes a person uses to answer the questions may involve drawing on vocabulary words or mathematical skills. Fourth, if a nonverbal assessment is administered orally, it can no longer be considered nonverbal, because verbal skills are required to comprehend the directions. As a result, many psychologists prefer to use nonverbal assessments that involve directions that are pantomimed. For all of the reasons listed above, some psychologists prefer the use of standard, verbal assessments, such as the CAS or KABC-II, that hold promise for reducing demographic differences, but this topic remains controversial.

Raven’s Progressive Matrices

The Raven’s Progressive Matrices (1998) is a form of nonverbal assessment that was originally designed in 1938 to assess an individual’s ability to view abstract figural stimuli, reason using analogies, and draw conclusions. The Coloured Progressive Matrices is designed for children ages 5 to 11 years. The most widely used version, the Standard Progressive Matrices, is designed for those ages 6 to 17 years, but it can also be used with adults. The Advanced Progressive Matrices is designed for older adolescents and adults who have a higher intellectual ability. All three versions were constructed based on Spearman’s unitary theory of intelligence.

Comprehensive Test of Nonverbal Intelligence

The Comprehensive Test of Nonverbal Intelligence (CTONI) is a commonly used nonverbal assessment that consists of six subtests designed for those ages 6 to 89 years old. It has been standardized using a group of 2,901 people from 30 different states. Included in this sample were people of diverse gender, ethnicity, age, and geographic region. The sample also included some students with disabilities who were enrolled in general education courses. The instrument measures abstract thinking abilities, problem solving, and reasoning, using a series of visual problems that need to be solved using analogical thinking, sequencing, or categorization. It requires 40 to 60 minutes to administer and can be administered orally or using pantomime.

Naglieri Nonverbal Ability Test

The Naglieri Nonverbal Ability Test (NNAT) is designed for administration to children in kindergarten through Grade 12 and it has been normed on more than 100,000 students. The NNAT is useful for any students who may benefit from a nonverbal assessment, including students who are gifted and talented, learning disabled, or hearing impaired. The assessment is administered by showing diagrams of progressive matrices, so no verbal instructions are needed. This helps to ensure that the assessment is completely nonverbal to decrease the chance of any confounding variables. The NNAT takes about 30 minutes to administer and it can be administered in a group setting.

Universal Nonverbal Intelligence Test

The Universal Nonverbal Intelligence Test (UNIT) is suitable for administration to children ages 5 to 17. It was standardized with a sample of 2,100 children from across the United States, and on an additional 1,765 children who were used to test its validity and reliability. The UNIT is administered by using eight language-free gestures (i.e., thumbs up, hand waving, stop, head nodding, head shaking, open-hand shrugging, palm rolling, and pointing). The UNIT consists of six subtests that assess Analogic Reasoning, Spatial Memory, Cube Design, Object Memory, Symbolic Memory, and Mazes. The UNIT, similar to other nonverbal assessments, includes matrices, but it also includes items that require the manipulation of objects and the use of gesturing, and pencil and paper.

Key Issues in Intellectual Assessment

Flynn Effect

In the early 1980s, the sociologist James Flynn noted that intelligence test scores had shown a gradual rise in average performance over the previous several decades. This phenomenon, which has become known as the Flynn effect, has been observed in almost every country and on almost every assessment (although the effect appears to be more pronounced on nonverbal assessments). The increase is small (roughly 1-2 IQ points a generation) but appreciable over extended periods of time. Causes of the Flynn effect are widely debated and are beyond the scope of this article, but the effect’s primary implication for psychologists is the need to ensure that scores on intelligence tests are calculated and compared to updated norms. Flynn has documented several cases in which scores have been compared to outdated norms, leading to conclusions of superior IQ or large IQ gains when current norms would have led to more moderate conclusions.
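
The arithmetic behind the outdated-norms problem can be sketched in a few lines. The following is an illustration only, not a procedure taken from any test manual: the function name, the assumption that average performance drifts upward linearly, and the rate used in the example are all hypothetical, and an appropriate rate would depend on the test and the population.

```python
def adjust_for_norm_age(observed_score, years_since_norming, points_per_year):
    """Estimate a score relative to current norms.

    Assumes (hypothetically) that average performance has risen linearly
    by `points_per_year` since the test was normed, so a score earned
    against old norms overstates standing against current norms.
    """
    if years_since_norming < 0:
        raise ValueError("years_since_norming must be non-negative")
    if points_per_year < 0:
        raise ValueError("points_per_year must be non-negative")
    return observed_score - years_since_norming * points_per_year


# Hypothetical example: a score of 130 on norms that are 20 years old,
# with an assumed (illustrative) drift of 0.1 points per year.
adjusted = adjust_for_norm_age(130, 20, 0.1)
print(f"Score against outdated norms: 130; estimated against current norms: {adjusted:.1f}")
```

The rate is deliberately left as a required argument rather than a default, because the size of the effect is itself debated.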


Giftedness

A cutoff score on an intelligence or, occasionally, an achievement test has traditionally been used to identify intellectual giftedness. For example, a school district might define a gifted student as one who scores in the top 3% to 5% on a specific intelligence assessment (e.g., the WISC-IV, SB5, or WJ III). However, in many countries a broadened definition of giftedness has emerged over the past quarter century. This change has been stimulated, in part, by recognition that giftedness can be multifaceted and that traditional intellectual assessments are limited in their ability to identify intellectual talent, and by the underrepresentation of minorities in gifted programs (which has fueled the belief that current tests are biased against minority students). This has led to the development of nontraditional assessments of giftedness, including teacher and parent ratings, performance-based assessments, and peer nominations. Increasingly, the nonverbal assessments described above are also used in gifted identification systems. The impact of these alternatives appears to be limited, and a great deal of research and evaluation on the development and use of alternative assessments is still needed.
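
To make the percentage cutoffs concrete, the sketch below converts a "top 3% to 5%" rule into an approximate standard score. It assumes that scores are normally distributed with a mean of 100 and a standard deviation of 15; that scale is an assumption added here for illustration and is not stated in this article, and the function name is hypothetical.

```python
from statistics import NormalDist


def cutoff_score(top_fraction, mean=100.0, sd=15.0):
    """Return the score at or above which `top_fraction` of a normally
    distributed population falls (assumed scale: mean 100, SD 15)."""
    if not 0 < top_fraction < 1:
        raise ValueError("top_fraction must be between 0 and 1")
    return NormalDist(mean, sd).inv_cdf(1 - top_fraction)


# Cutoffs for the two ends of the range mentioned in the text.
for fraction in (0.05, 0.03):
    print(f"top {fraction:.0%}: standard score of about {cutoff_score(fraction):.1f}")
```

Under these assumptions, the cutoffs fall at roughly 125 (top 5%) and 128 (top 3%), which shows how much a district's choice of percentage moves the qualifying score.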


Creativity

The past decade has seen an explosive growth in research on creativity because of a growing awareness of the relationship of creativity to psychological well-being and problem-solving ability. Most conceptualizations of creativity include both cognitive and noncognitive factors, so cognitive assessment provides information on an important but limited aspect of creativity. Divergent thinking (DT) is the most commonly assessed cognitive aspect of creativity. Historically, the Torrance Tests of Creative Thinking (TTCT) have been the most widely used divergent thinking tests. The TTCT has two versions, Verbal and Figural, and two forms (A and B) of each version. Scores can be calculated for Fluency (number of responses), Originality (statistical infrequency of the responses), Flexibility (extent to which responses appear to be from different categories), and Elaboration (extent to which responses go beyond the typical answer). The Consensual Assessment Technique, in which students produce products that can be evaluated for creativity, is a popular alternative to divergent thinking tests, but it has been the subject of limited study in applied settings.
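
The three count-based divergent thinking scores defined above can be sketched as follows. This is a simplified illustration and not the published TTCT scoring procedure: the function name, the 5% rarity threshold, the category labels, and the sample data are all hypothetical. Elaboration is omitted because it depends on rater judgment rather than counting.

```python
def score_divergent_thinking(responses, sample_counts, sample_size,
                             categories, rare_threshold=0.05):
    """Score one examinee's responses to a divergent thinking prompt.

    sample_counts: how many people in a (hypothetical) comparison sample
        gave each response; sample_size: number of people in that sample.
    categories: maps each known response to a conceptual category.
    rare_threshold: assumed cutoff below which a response counts as original.
    """
    # Normalize and drop blanks and duplicates, keeping first occurrences.
    cleaned = (r.strip().lower() for r in responses)
    unique = list(dict.fromkeys(r for r in cleaned if r))

    fluency = len(unique)  # number of responses
    originality = sum(     # statistically infrequent responses
        1 for r in unique
        if sample_counts.get(r, 0) / sample_size < rare_threshold
    )
    flexibility = len({categories[r] for r in unique if r in categories})
    return {"fluency": fluency, "originality": originality,
            "flexibility": flexibility}


# Hypothetical data for the prompt "list uses for a brick".
sample_counts = {"build a wall": 60, "doorstop": 25,
                 "paperweight": 10, "grind into pigment": 2}
categories = {"build a wall": "construction", "doorstop": "weight",
              "paperweight": "weight", "grind into pigment": "art"}
scores = score_divergent_thinking(
    ["Build a wall", "doorstop", "grind into pigment", "doorstop"],
    sample_counts, sample_size=100, categories=categories)
print(scores)
```

In this made-up example the repeated response is counted once, only the response given by 2% of the comparison sample counts as original, and the three distinct responses span three categories.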

Major Developments

The assessments reviewed in this article represent the main themes in cognitive assessment over the past 50 to 60 years, but there are hundreds of additional cognitive assessments available to counseling psychologists. The major developments over the past century have been the reliance on stronger theoretical models and research in the development of tests, the acknowledgment of the role of language (as both an advantage and disadvantage to certain test takers), and the realization of the need to restandardize tests on a routine basis to account for the Flynn effect.


