The topic of research methods in developmental psychology encompasses an array of methodological and statistical issues that arise when attempting to study development, or change in behavior as a function of time. To organize ideas about research methods, it is useful to distinguish among three domains: the design of developmental research, measurement issues that are of particular relevance in developmental work, and the statistical models and methods that characterize research efforts in the field.

## Developmental Research Designs

The topic of developmental research designs has been broached many times during the past 75 years. As Wohlwill (1973) argued, the most basic aim of developmental science is to study change in behavior (B) as a function of time (T), or B = f(T). Hence, developmental research designs should promote the modeling of change in behavior across time. Time can, however, be measured in many ways (Schroots & Birren, 1990), and different ways of indexing time have important implications for representing and understanding behavioral change. Because researchers are typically interested in the ontogenetic development of behaviors, the most common index of time is chronological age, or time since birth. Under this approach, the goal of developmental psychology is the determination of the relationship between a behavior of interest and the chronological age of participants, often symbolized as B = f(A), reflecting the assumption that behavior (B) is a specifiable function of age (A). However, Schroots and Birren offered many other indicators of psychological age or time that are related to chronological age but that may govern, or at least better track, developmental change, so chronological age should be considered only an approximation of the optimal time dimension along which behavioral development should be charted.

One option that must be faced when designing a developmental study is whether the same individuals or different individuals will be measured at the multiple ages. Most researchers recognize the benefits of assessing the same individuals at the several times of measurement, as this allows the direct determination of age changes, or the age-related change in a given behavior by each individual (Baltes & Nesselroade, 1979). Of course, this approach can slow the progress of research if the aim of the investigation is to portray behavioral change across a considerable age span. To tackle this issue, Bell (1953) presented a method of approximating long-term age changes by means of shorter term study of several samples. This could be accomplished by assessing multiple groups of subjects belonging to different birth cohorts across more restricted age spans and then organizing the partially overlapping trends as a function of chronological age. This notion was formalized by Schaie (1965) as a general developmental model that recognized the potential influences on behavior of the chronological age (A) and birth cohort (C) of the individual as well as the historical moment or period (P) at which measurements are taken. The resulting conception was organized around the potential effects on behavior of age, period, and cohort, signified as B = f(A, P, C), and the interpretation of these effects on behavior, as will be discussed below.
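The logic of Bell's convergence approach can be sketched in a few lines of Python. This is only an illustration under assumed values: the function name `accelerated_design`, the birth years, and the three annual waves starting in 2010 are hypothetical and do not come from Bell (1953).

```python
def accelerated_design(birth_cohorts, first_wave, n_waves):
    """Map each birth cohort to the ages at which it is observed.

    Assumes annual waves and age computed as period minus birth year.
    """
    return {
        cohort: [first_wave + wave - cohort for wave in range(n_waves)]
        for cohort in birth_cohorts
    }


# Hypothetical study: three cohorts followed for three annual waves.
design = accelerated_design([2000, 2002, 2004], first_wave=2010, n_waves=3)

# Ages covered once the partially overlapping segments are lined up by age.
ages_covered = sorted({age for ages in design.values() for age in ages})

for cohort, ages in design.items():
    print(cohort, ages)
print("ages covered:", ages_covered)
```

In this invented layout, three years of data collection span ages 6 through 12, and adjacent cohorts share one age (8 and 10), which is where the overlapping segments can be checked against one another.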

Clear distinctions among three simple developmental designs are possible, based on considerations of age, period, and cohort effects. The most commonly used simple developmental design is the cross-sectional design, in which all measurements are obtained at a single time or period of measurement. Two or more samples of participants who differ in chronological age are obtained, and empirical results are arrayed as a function of the chronological age of the samples of participants. But, year of birth, or birth cohort, is perfectly correlated with, and therefore perfectly confounded with, chronological age in a cross-sectional design, so cohort effects are viable alternative explanations for any age-related trends in data.

Furthermore, because the performance of different samples is compared, cross-sectional designs can provide, at best, information on age-related differences, or age differences, as opposed to assessing directly changes with age. Several assumptions must be met in order to have confidence that age differences from a cross-sectional design represent trends that would likely result from individuals changing or developing across the age span of the study. Chief among these is the assumption that comparable sampling of participants was conducted for each of the samples. Even unintended differences in sampling may distort trends, yielding mean aging trends that no individual person would exhibit. For example, consider drawing random samples of students in school in grades 6, 8, 10, and 12. Students who drop out of school tend to perform at lower levels on many variables (for example, school achievement) than do students who remain in school through the completion of high school. Thus, a random sample of sixth graders would likely be more representative of all 11-year-olds than would a random sample of twelfth graders selected to be representative of all 17-year-olds, given the progressive dropout of students during junior and senior high schools.

Even if one could verify equal representativeness of sampling at each age level, a cross-sectional design cannot yield information about the stability of individual differences from age to age, because different individuals are assessed at each time of measurement. Given the importance of understanding both the general developmental trend for any behavior of interest as well as individual differences around this trend, the inability to study individual differences in change is an important shortcoming of the cross-sectional design.

A second common design is the longitudinal design, in which all measurements are obtained from a single sample of participants, persons who are usually of a single birth cohort. This single sample is then observed at two or more times of measurement. Results from longitudinal studies are often arrayed as a function of the chronological age of the sample at the several times of measurement. But, historical time or period is perfectly correlated with, and hence completely confounded with, chronological age of participants at the different times of measurement, so historical period effects are alternative explanations of any purported age-related trends in data.

The longitudinal design has one major advantage over the cross-sectional design: the longitudinal design allows the researcher to study age changes, as changes in behavior by individuals are assessed directly by tracking the same subjects at two or more ages. This allows the modeling of individual differences about the developmental trend in addition to charting the mean developmental trend. Unfortunately, the typical longitudinal design also must confront at least two important methodological problems. The first of these involves retesting effects. Simply testing subjects a second time on a particular test often leads to some change in scores. In most longitudinal studies, participants are assessed at three or more times of measurement, increasing the likelihood that retesting will confound results of age changes. For example, Nesselroade and Baltes (1974) presented evidence that retesting effects explained approximately one half of the mean age changes on several dimensions of mental ability. The second problem concerns sample representativeness and the presence of the differential dropout of participants across time. Often, participants willing to commit to participation in a longitudinal design are not representative of the population at large, and later dropping out of a longitudinal study is usually nonrandom. Both of these problems limit the generalizations that may be made from longitudinal studies.

The time-lag design is a third simple developmental design, although it is rarely used. In the time-lag design, measurements are obtained from participants all of whom are the same age, but who are tested at different points in historical time. That is, one could study 10-year-olds in 2010, 2020, and so on. In a time-lag design, cohort and period are perfectly confounded. Further, because age is held constant, the time-lag design is most useful for tracking secular trends. Because developmental psychology has a primary goal of studying age-related trends and because age is held constant in this design, the time-lag design has less direct relevance for the field than do the other two simple designs, but timely applications of the time-lag design should not be overlooked.
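The confounds in the three simple designs can be made concrete with a small sketch. The cells below are invented for illustration (the ages, years, and the helper names `cells` and `varying_factors` are assumptions); each cell is an (age, period, cohort) triple with cohort taken as period minus age.

```python
def cells(design):
    """Return hypothetical (age, period, cohort) cells for a simple design."""
    if design == "cross-sectional":   # one period, several ages
        return [(age, 2010, 2010 - age) for age in (11, 13, 15, 17)]
    if design == "longitudinal":      # one cohort, several periods
        return [(period - 2000, period, 2000) for period in (2011, 2013, 2015, 2017)]
    if design == "time-lag":          # one age, several periods
        return [(10, period, period - 10) for period in (2010, 2020, 2030)]
    raise ValueError(f"unknown design: {design}")


def varying_factors(design):
    """List the factors that take more than one value in the design."""
    names = ("age", "period", "cohort")
    triples = cells(design)
    return [
        name
        for index, name in enumerate(names)
        if len({triple[index] for triple in triples}) > 1
    ]


for design in ("cross-sectional", "longitudinal", "time-lag"):
    # The two factors that vary move together, so they are confounded.
    print(design, "->", varying_factors(design))
```

In each design one factor is held constant and the two that vary are locked together, which is the perfect confounding described above.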

Returning to the general developmental model proposed by Schaie (1965), three more complex developmental designs are possible within this framework. These are (a) the cohort-sequential design, obtained by the factorial crossing of cohort and age; (b) the time-sequential design, arising from the factorial crossing of period (or time of measurement) and age; and (c) the cross-sequential design, defined by the factorial crossing of cohort and period (or time). Although Schaie initially contended that the effects of age, cohort, and period could be identified separately, subsequent commentators (for example, Mason & Fienberg, 1985) have concluded that the influences of age, cohort, and period cannot be disentangled in a simple mathematical way. The lack of separate identification of these effects is portended by the dependence among age, period, and cohort in any of the three designs discussed by Schaie. For example, consider the cohort-sequential design, in which cohort and age are crossed factorially. In this design, the time of measurement (or period) is fixed by the need to assess a given cohort at a particular chronological age (for example, children born in 2000 and assessed at 10 years of age must be assessed in the year 2010). Thus, one cannot vary factorially and independently all three factors of age, period, and cohort in a single design; once levels of two of these factors are fixed, the levels of the third are fixed as well.
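The dependence among the three factors can be checked directly for a cohort-sequential layout. The sketch below uses invented cohorts and ages and assumes period equals birth year plus age; it illustrates the identity and is not a reconstruction of any published design.

```python
from itertools import product

# Hypothetical levels of the two crossed factors.
cohorts = [2000, 2002, 2004]
ages = [10, 12, 14]

# Cohort-sequential design: cohort and age are crossed factorially, and the
# period of measurement then follows from period = cohort + age.
cohort_sequential = [(cohort, age, cohort + age) for cohort, age in product(cohorts, ages)]

# The same linear identity holds in every cell, so period cannot be varied
# independently of cohort and age.
identity_holds = all(period - cohort - age == 0 for cohort, age, period in cohort_sequential)

periods = sorted({period for _, _, period in cohort_sequential})

print("cells:", len(cohort_sequential))
print("identity holds in every cell:", identity_holds)
print("periods implied by the design:", periods)
```

The nine cells require measurements in five different years, none of which the investigator was free to choose once cohorts and ages were set.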

Because effects of the three factors of age, period, and cohort cannot be estimated separately, the choice of a design should be dictated by theory regarding which factors will have important influences on change. For example, the effects in a cohort-sequential design, in which cohort and age are crossed, are interpreted most simply under the assumption that period (or historical time) has no influence on the behavior of interest. If this assumption is accurate, the cohort-sequential design yields age trends for each of several cohorts, enabling the researcher to study the form of general age trends and how these are moderated by cohort. Similar conditions hold for the two remaining designs: the time-sequential design offers clear interpretations if cohort effects are negligible, and the cross-sequential design yields unconfounded interpretations if age effects are assumed to be zero. Given these considerations, the cross-sequential design appears to be the least adequate of the three complex designs, as age effects must be assumed to be zero, and the cohort-sequential design is optimal, because both age and cohort are explicitly included in the design. Ironically, the cross-sequential design has been the most widely used of the designs (for example, Nesselroade & Baltes, 1974), and the cohort-sequential design has arguably been the least used of the designs. The reasons for the differential use of designs are clear, as the cohort-sequential design takes more years to complete and yields developmental functions across a smaller number of age levels. Still, the cohort-sequential design deserves wider use in the future to corroborate and place on firmer empirical footing the findings generated by other designs.

## Measurement Issues

Measurement involves the assignment of numbers to observations (for example, persons) to represent the magnitude of a particular characteristic for each observation. Thus, one may use a ruler to assign numbers on any of a set of numerical units—for example, inches, feet, or centimeters—to represent the height of each of a set of individuals. Here, the measuring device is the ruler, the characteristic of interest is height, and a direct ratio mapping exists between the length on the measuring scale and the numbers to be assigned to observations.

Measurement is a crucial, if undervalued, aspect of all research endeavors in psychology, with profound implications for representing relations among variables and hence for the theories designed to account for these phenomena. Nowhere is the importance of measurement more obvious than in developmental psychology. When attempting to study the relation of behavior to age, as B = f(A), measurement is paramount, for one must ensure that the units of a measurement scale are comparable across the age span and that one is assessing the same characteristic at all ages for the function relating behavior and age to have any interpretation. Researchers frequently assume that their measurements embody desiderata such as comparability of units across age levels, but rarely are these assumptions tested directly.

Scales of measurement are often discussed in terms of the well-known classification into nominal, ordinal, interval, and ratio scales. Numbers on a nominal scale serve only to identify the class into which a person falls and do not imply an ordering of individuals on any continuum. In contrast, numbers on the remaining three scales provide an ordering of individuals: an ordering with unequal intervals using an ordinal scale, with equal intervals on the interval scale, and with both equal intervals and a nonarbitrary zero point with a ratio scale.

Cross-cutting the preceding classification, at least partially, is the distinction between qualitative and quantitative variables. A nominal scale clearly represents qualitative differences among persons, but the relations between the three remaining scale types and the qualitative-quantitative distinction are less clear. For example, an ordinal scale might represent an ordered categorical or qualitative variable, with numbers representing different, qualitatively distinct, and hierarchically ordered stages. Or, the ordinal scale may represent an initial, unrefined attempt to assess a quantitative continuum. The relation between scale types and the qualitative-quantitative distinction has been muddied by researchers in certain domains (for example, moral development, ego development) who have argued for the viability of qualitative, hierarchically ordered stages in the particular domain, but who have provided instruments with scoring options that yield scores that seemingly fall on interval scales, suggesting the presence of a quantitative dimension. Complications of this sort continue to concern the field of developmental psychology.

Early longitudinal studies, such as the Berkeley Growth Study by Bayley (1956; Bayley & Jones, 1937), employed measures from multiple domains, and many of the variables had either ratio or at least interval status. For example, Bayley (1956) displayed charts of growth in height and weight, which are usually assumed to meet the stipulations of a ratio scale. These scales enabled the fitting of informative age functions to data but were of greater utility in portraying physical growth than psychological development. For psychological development, Bayley developed an interesting approach to constructing derived scales for psychological variables (for example, her 16-D scale was normed to the mean and standard deviation exhibited by a sample of 16-year-olds) that would allow one to study changes in both mean and variance across age levels. However, the idea never took hold, and measurement concerns have a less central role than in the past. Most contemporary work uses measures designed for use with participants in fairly restricted age ranges, circumventing problems of comparability across extended age ranges and, in the process, hindering the study of developmental changes across these broader age levels. Moreover, the only measures that tend to be used across a wide range of ages during the developmental period from infancy through adolescence are measures of intelligence. These measures typically provide an IQ, which is normed in a nondevelopmental fashion, to yield a mean of 100 and standard deviation of 15 in the population at each age level. Hence, modeling the mean developmental trend is hazardous or impossible given the measurement properties of most measures used in current research.
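Why within-age norming hides the mean developmental trend can be shown with a toy calculation. The raw scores below are invented, and `norm_within_age` is a hypothetical helper that rescales each age group to a mean of 100 and standard deviation of 15; operational IQ norming relies on large standardization samples and is more involved than this sketch.

```python
from statistics import mean, pstdev


def norm_within_age(raw_scores, target_mean=100.0, target_sd=15.0):
    """Rescale one age group's raw scores to the target mean and SD.

    Assumes the scores in the group are not all identical.
    """
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)
    return [target_mean + target_sd * (score - group_mean) / group_sd for score in raw_scores]


# Hypothetical raw scores that rise steeply with age.
raw_by_age = {6: [20, 25, 30], 10: [40, 50, 60], 14: [70, 80, 90]}

raw_means = {age: mean(scores) for age, scores in raw_by_age.items()}
normed_means = {age: mean(norm_within_age(scores)) for age, scores in raw_by_age.items()}

print("raw means:   ", raw_means)
print("normed means:", {age: round(value, 6) for age, value in normed_means.items()})
```

The raw means climb with age, whereas the normed means are 100 at every age by construction, so the normed scores carry information about relative standing within an age group but none about the mean trend across ages.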

One dependent variable that may provide a common metric across age levels and is widely used in studies of cognitive processes is reaction time, a variable that appears clearly to have ratio scale properties. In aging work, several meta-analyses have been performed on the general slowing hypothesis. Under general slowing, the rate of mental processes may slow (Birren, 1965) or information may be lost in a consistent fashion (Myerson, Hale, Wagstaff, Poon, & Smith, 1990) during the aging period. Regardless of the basis for the effect, various mathematical and statistical models have been fit to reaction time data to represent the extent and consistency of the slowing. Some work has been done to model the speeding up of processing, represented by reductions in reaction time, during childhood and adolescence. The basis of the speeding up of processing during the developmental period, however, is treated as a quantitative improvement in performance, and this is clearly a problematic assumption, as it may be for slowing during aging.

For example, in the domain of numerical processing (for example, addition, subtraction), children appear to proceed through a series of qualitatively distinct stages, representing different strategies for solving problems of a given type. Regardless of whether strategy choice continues unabated throughout life or a person finally adopts his or her optimal strategy and uses this strategy consistently, the qualitative advances in strategies may underlie the quantitative improvements in reaction time (Widaman, 1991). Thus, researchers may misconstrue the research problem as the understanding of the form of the function relating the quantitative reaction time variable to age, whereas the important developmental finding is the qualitative changes producing the quantitative improvements in performance. This is but one example of the measurement problems arising in developmental contexts. Future advances in both substantive theory and measurement theory may lead the way to clearer thinking about such problems, so that researchers study the measures of behavior that matter the most rather than the measures of behavior that are easiest to amass.

## Statistical Models and Procedures

During the 1950s and 1960s, Wohlwill (1973) detected a clear “invasion of the experimentalists” into developmental psychology. This invasion took the form of researchers trained in experimental studies of mature persons, usually college students, opting to design studies that included multiple age groups, to test whether similar results would be found at all points on the age continuum. This invasion had both strengths and weaknesses. For example, the rigor of developmental research was perhaps improved, and research topics certainly were expanded in interesting directions, but the results generated often had less relevance to traditional issues that defined the field than did typical research results.

Experimentalism has become firmly ensconced as one approach to developmental science. Statistical methodologists, however, have brought to the field the most modern analytic techniques available. Nevertheless, the standard methods of statistics—including correlation, regression, and the analysis of variance (ANOVA)—continue to be the most commonly used in developmental studies and will likely be the standard for some time to come.

Before discussing the newer methods of representing and analyzing developmental data, some comments should be made about the kinds of questions traditionally framed within developmental theories. The standard techniques of ANOVA and correlation and regression analysis are frequently used in developmental research and are often used as intended, but these techniques are subject to misuse and may fail to capture certain important aspects of developmental data. For example, ANOVA, designed to analyze mean differences across levels of qualitative independent variables, is used to test developmental changes as a function of age in many contexts. However, when ANOVA is used with longitudinal, repeated measures data, researchers cannot model the pattern of individual differences over time, as these are relegated to the within-group covariance matrices, which are frequently ignored and almost always unreported in research publications.

With correlation/regression methods, crucial tests of differences across groups often are not conducted, leaving the research literature in disarray. For example, when investigating gender differences in development, researchers commonly test whether correlations or regression weights differ significantly from zero, and they do this separately for samples of males and females. If a correlation or regression weight is significant for one group and not for the other, this is construed as evidence of a difference in the development of the genders. The crucial tests of the difference between the correlations (or regression weights) for the two groups, however, might reveal nonsignificance, suggesting a lack of difference across genders in developmental processes. Tests of the significance of the difference between independent correlations are often viewed as lacking power; however, failure to use the proper tests results in a research literature that is open to many conflicting interpretations.
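One standard way to carry out the crucial test is to compare the two independent correlations after Fisher's r-to-z transformation. The sketch below applies that textbook formula to invented values; the function name and the sample correlations are assumptions for illustration only.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Test the difference between two independent Pearson correlations.

    Uses Fisher's r-to-z transformation and a normal approximation;
    returns the z statistic and the two-sided p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical result: with n = 50 per group, r = .30 exceeds the usual
# two-tailed .05 critical value (about .28), whereas r = .20 does not.
z, p = compare_independent_correlations(r1=0.30, n1=50, r2=0.20, n2=50)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

With these invented numbers one correlation is significant and the other is not, yet the direct comparison gives a z of roughly 0.5, which is nowhere near significance. That is exactly the situation in which separate tests invite an unwarranted conclusion about group differences.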

Regardless of their inadequacies and potential misuse, ANOVA and correlation and regression analysis have helped frame statistically the important questions asked in developmental research. ANOVA emphasizes the understanding of the mean developmental trend, and correlation and regression analysis are used to study individual differences about the mean trend. Indeed, correlational measures were the mainstay for investigations of the differentiation of abilities and other processes during childhood and adolescence. The invasion of the statistical methodologists may be seen as an attempt to introduce new methods of analysis that correct problems in both ANOVA and correlation/regression analysis and that represent more adequately developmental processes and developmental change.

In a special issue of Child Development published in early 1987, several researchers promoted the utility of structural equation modeling (SEM) for developmental psychology, although others offered rational concerns about how the techniques would be used and interpretations would be drawn. Despite misgivings, the manner in which SEM can structure ideas and results cannot be discounted. Indeed, ways of addressing many key problems—including the distinctions between state and trait constructs as well as the proper causal lag in longitudinal studies—are uniquely applied with SEM. These benefits have been so clearly realized that applications of SEM in developmental research are becoming quite common.

One way of using SEM informatively in developmental research is multiple-group confirmatory factor analysis (CFA) to study the factorial invariance of a set of measures (Widaman & Reise, 1997). Using this multiple-group CFA approach, the investigator can test whether a consistent relation holds between the underlying factors and their observed indicators across age levels. Factorial invariance of this type is evidence that the same theoretical constructs are assessed at the different age levels. Moreover, researchers may then investigate differences in mean level and variance on the latent variables identified, as well as the structural relations among the latent variables. In the future, applications of item response theory methods, which are related to CFA models (Reise, Widaman, & Pugh, 1993), offer hope of establishing the comparability of the metric of measured variables across age levels, a problem that continues to plague the field.

Another application of SEM that has special relevance to developmental research is the specification of growth curve models. Under this approach, data from multiple times of measurement are the primary measured variables, and the latent variables that are specified represent both initial level at the first time of measurement and growth since the first time of measurement. Because individual differences in both level and growth are identified in this manner, variance on these latent variables may be predicted from other variables in the model. In this way, the investigator may find the key explanatory variables that account for individual differences in initial level and subsequent growth in a particular behavior of interest. Contributions in this vein continue to mount, and fruitful approaches for dealing with planned or unplanned missing data, a common woe in longitudinal studies, are being developed.
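The idea of individual differences in level and growth can be illustrated without SEM software by fitting a separate straight line to each participant. This is a deliberately simplified two-stage stand-in, not a latent growth curve model: it ignores measurement error and estimates nothing jointly. The data and the helper `fit_line` are hypothetical.

```python
def fit_line(times, scores):
    """Ordinary least-squares intercept and slope for one participant."""
    n = len(times)
    mean_time = sum(times) / n
    mean_score = sum(scores) / n
    sum_sq_time = sum((t - mean_time) ** 2 for t in times)
    sum_cross = sum((t - mean_time) * (y - mean_score) for t, y in zip(times, scores))
    slope = sum_cross / sum_sq_time
    intercept = mean_score - slope * mean_time
    return intercept, slope


# Time is coded as years since the first measurement, so the intercept
# corresponds to initial level and the slope to growth per year.
waves = [0, 1, 2, 3]
scores = {
    "p1": [10, 12, 14, 16],
    "p2": [8, 9, 10, 11],
    "p3": [12, 15, 18, 21],
}

growth = {person: fit_line(waves, y) for person, y in scores.items()}
for person, (level, rate) in growth.items():
    print(person, "initial level:", level, "growth per year:", rate)
```

The per-person intercepts and slopes play the role of the level and growth variables; in a full growth curve model their variances and their relations to predictors would be estimated within a single model.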

Yet another approach to the identification of level and growth factors within longitudinal data is a generic approach often identified as hierarchical linear modeling (HLM) (Bryk & Raudenbush, 1987, 1992). HLM recognizes the hierarchical structure of data. For example, children are nested within families, families are nested within socioeconomic strata, and so forth. In a longitudinal study, measurements at different ages are nested within individuals, so initial level and growth can be represented in HLM models, along with predictors of both initial level and subsequent growth. Whether SEM or HLM models are able to fit easily or well growth data in which individuals may have different intercepts, growth rates, and asymptotes is a topic for future research.
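A linear two-level growth model of the kind described above is commonly written as follows. The notation is a conventional sketch rather than a quotation from the cited sources: Y is the outcome for person i at occasion t, a is age or time coded so that zero marks the first measurement, X is a hypothetical person-level predictor, and e and r are within-person and between-person residuals.

```latex
\begin{align}
Y_{ti}   &= \pi_{0i} + \pi_{1i}\, a_{ti} + e_{ti}
           && \text{(level 1: occasions within person)} \\
\pi_{0i} &= \beta_{00} + \beta_{01} X_i + r_{0i}
           && \text{(level 2: initial level)} \\
\pi_{1i} &= \beta_{10} + \beta_{11} X_i + r_{1i}
           && \text{(level 2: growth rate)}
\end{align}
```

Under this sketch, the variances of the two level-2 residuals capture individual differences in initial level and growth, and the coefficients on X indicate how far the predictor accounts for them.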

Another statistical model that will be of increasing importance for developmental psychology goes by the name of survival analysis (Willett & Singer, 1997). Here, an important transition or event, such as dying or dropping out of school, is the outcome variable. The survival model represents the likelihood or probability of the event as a function of age, and covariates may be added that affect the likelihood of occurrence. Although survival modeling is rare in developmental research, applications of the method are almost certain to increase in the future.
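A discrete-time version of these quantities can be computed with a simple life table. The sketch below uses invented school-dropout data and a hypothetical helper `life_table`; it assumes that a censored participant is known to be event-free through the recorded age, and it omits covariates, which a full discrete-time survival model would add.

```python
def life_table(event_ages, censored_ages, ages):
    """Estimate the discrete-time hazard and survival probability at each age.

    hazard(age)   = events at that age / participants still at risk
    survival(age) = product of (1 - hazard) up to and including that age
    """
    rows = []
    survival = 1.0
    for age in ages:
        at_risk = sum(a >= age for a in event_ages) + sum(a >= age for a in censored_ages)
        events = sum(a == age for a in event_ages)
        hazard = events / at_risk if at_risk else 0.0
        survival *= 1.0 - hazard
        rows.append((age, at_risk, events, hazard, survival))
    return rows


# Hypothetical sample of 10 students: age at dropout for 4 of them, and age
# at last observation for 6 who had not dropped out when last seen.
dropout_ages = [16, 16, 17, 18]
last_seen_ages = [16, 17, 18, 18, 18, 18]

table = life_table(dropout_ages, last_seen_ages, ages=[16, 17, 18])
for age, at_risk, events, hazard, survival in table:
    print(f"age {age}: at risk {at_risk}, events {events}, "
          f"hazard {hazard:.3f}, survival {survival:.3f}")
```

The hazard is the conditional probability of the event at a given age among those who have not yet experienced it, and the survival column accumulates the probability of remaining event-free.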

Advances have been made in representing qualitative developmental advances as well. For example, Collins and Cliff (1990) discussed a longitudinal extension to the Guttman scale for representing unitary, cumulative development. In 1997, Collins and colleagues (Collins, Graham, Rousculp, & Hansen, 1997) developed computer programs and analytic procedures for latent class analysis and latent transition analysis (LTA). LTA is useful for representing the unidirectional changes that characterize certain domains of behavior, such as stages of drug use or stages of arithmetic competence. LTA yields probabilities of making the transition from one level or stage to another more advanced stage and can test assumptions of lack of regressions to earlier levels or stages. Moreover, covariates can be included that explain individual differences in probabilities of stage transition.
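The kind of transition probabilities described above can be previewed with observed stage data. The sketch tabulates transitions between manifest stages in invented two-wave records. It is not latent transition analysis, which estimates analogous probabilities for latent stages while allowing for measurement error, and the helper name and data are hypothetical.

```python
from collections import Counter


def transition_proportions(sequences, n_stages):
    """Row-normalized proportions of moves from each stage to each stage."""
    counts = Counter()
    for sequence in sequences:
        for earlier, later in zip(sequence, sequence[1:]):
            counts[(earlier, later)] += 1
    matrix = []
    for earlier in range(n_stages):
        row_total = sum(counts[(earlier, later)] for later in range(n_stages))
        matrix.append([
            counts[(earlier, later)] / row_total if row_total else 0.0
            for later in range(n_stages)
        ])
    return matrix


# Hypothetical stage at wave 1 and wave 2 for eight participants (stages 0-2).
records = [(0, 0), (0, 1), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2), (2, 2)]
matrix = transition_proportions(records, n_stages=3)

# Unidirectional, cumulative development implies no moves to an earlier stage,
# that is, zeros everywhere below the diagonal.
regression_share = sum(matrix[a][b] for a in range(3) for b in range(a))

for row in matrix:
    print(row)
print("proportion of regressions:", regression_share)
```

In these invented records every entry below the diagonal is zero, which is the pattern expected when development in a domain is strictly cumulative.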

One common requirement of all of the preceding new methods of analysis is the need for large sample sizes. This is perhaps the single largest stumbling block to widespread, confident use of these methods, as the standards in the field—given the temporal and monetary expenses associated with longitudinal studies—are for sample sizes that are not optimal for the application of sophisticated methods of analysis. With the elegant methods of analysis that have been and are being developed, the field of developmental psychology will be well equipped to understand growth, stability, and decline across the life span in unprecedented ways if a solid commitment is made to collection of adequate measurements on samples of adequate size.

## Summary

The research methods used in developmental psychology are undergoing tremendous change, abetted by the invasion of the statistical methodologists. Continuing advances in the design of studies, the construction of measures and their proper scoring, and the methods used to analyze data promise exciting advances in the substantive understanding of the growth and development of individuals across the life span.

### References:

- Baltes, P. B., & Nesselroade, J. R. (1979). History and rationale of longitudinal research. In J. R. Nesselroade & P. B. Baltes (Eds.), Longitudinal research in the study of behavior and development (pp. 1-39). New York: Academic Press.
- Bayley, N. (1956). Individual patterns of development. Child Development, 27, 45-74.
- Bayley, N., & Jones, H. E. (1937). Environmental correlates of mental and motor development: A cumulative study from infancy to six years. Child Development, 8, 329-341.
- Bell, R. Q. (1953). Convergence: An accelerated longitudinal approach. Child Development, 24, 145-152.
- Birren, J. E. (1965). Age changes in speed of behavior: Its central nature and physiological correlates. In A. T. Welford & J. E. Birren (Eds.), Behavior, aging, and the nervous system (pp. 191-216). Springfield, IL: Charles C. Thomas.
- Bryk, A. S., & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147-158.
- Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.
- Collins, L. M., & Cliff, N. (1990). Using the longitudinal Guttman simplex as a basis for measuring growth. Psychological Bulletin, 108, 128-134.
- Collins, L. M., Graham, J. W., Rousculp, S. S., & Hansen, W. B. (1997). Heavy caffeine use and the beginning of the substance use onset process: An illustration of latent transition analysis. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 79-99). Washington, DC: American Psychological Association.
- Mason, W. M., & Fienberg, S. E. (Eds.). (1985). Cohort analysis in social research: Beyond the identification problem. New York: Springer-Verlag.
- Myerson, J., Hale, S., Wagstaff, D., Poon, L. W., & Smith, G. A. (1990). The information-loss model: A mathematical theory of age-related cognitive slowing. Psychological Review, 97, 475-487.
- Nesselroade, J. R., & Baltes, P. B. (1974). Adolescent personality development and historical change: 1970-1972. Monographs of the Society for Research in Child Development, 39(1, Serial No. 154).
- Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.
- Schaie, K. W. (1965). A general model for the study of developmental problems. Psychological Bulletin, 64, 92-107.
- Schroots, J. J. F., & Birren, J. E. (1990). Concepts of time and aging in science. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (3rd ed., pp. 45-64). San Diego: Academic Press.
- Widaman, K. F. (1991). Qualitative transitions amid quantitative development: A challenge for measuring and representing change. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 204-217). Washington, DC: American Psychological Association.
- Widaman, K. F., & Reise, S. P. (1997). Exploring the measurement invariance of psychological instruments: Applications in the substance use domain. In K. J. Bryant, M. Windle, & S. G. West (Eds.), The science of prevention: Methodological advances from alcohol and substance abuse research (pp. 281-324). Washington, DC: American Psychological Association.
- Willett, J. B., & Singer, J. D. (1997). Using discrete-time survival analysis to study event occurrence across the life course. In I. H. Gotlib & B. Wheaton (Eds.), Stress and adversity over the life course: Trajectories and turning points (pp. 273-294). New York: Cambridge University Press.
- Wohlwill, J. F. (1973). The study of behavioral development. New York: Academic Press.
