Experimental Designs

The validity of inferences stemming from empirical research in industrial and organizational psychology and allied disciplines is a function of a number of factors, including research design. Research design has to do with the plan, structure, or blueprint for a study. The literature indicates that among the components of such a plan are (a) the experimental design type (i.e., randomized experiment, quasi-experiment, and nonexperiment); (b) the study setting (e.g., created for the purpose of doing research); (c) the numbers and types of study participants; (d) the way in which the variables considered by the study are operationally defined; and (e) the techniques that will be used to analyze the data produced by the study. The focus here is on the randomized experiment. It differs in important ways from both nonexperimental and quasi-experimental design types.

Prior to considering the nature of randomized experiments, we consider several issues concerning the validity of inferences stemming from empirical research. Note that the same issues apply not only to randomized experiments but also to quasi-experiments and nonexperiments.

Factors Affecting the Validity of Research-Based Inferences

The overall correctness of inferences (e.g., research-based conclusions, recommendations for practice) stemming from a study is a function of its design and the manner in which it is actually conducted. Four facets of validity are critical: construct validity, statistical conclusion validity, internal validity, and external validity, as noted by Thomas Cook and Donald Campbell in 1979, and by W. R. Shadish, Cook, and Campbell in 2002.

Construct Validity

Construct validity is a function of the degree of correspondence between the constructs dealt with by a study and their realizations. It has to do with not only the operational definitions of variables (e.g., manipulations, measures), but also the empirical realizations of other features of a study (e.g., types of participants, research settings). Construct validity inferences are threatened by a number of factors, including inadequate preoperational definitions of constructs, study procedures that lead participants to guess a study’s hypotheses and behave in ways that confirm them, operational definitions that underrepresent focal constructs, and a lack of correspondence between the type of participants in a study and the way the participants are labeled by a researcher.

Statistical Conclusion Validity

Statistical conclusion validity has to do with the correctness of inferences about relations between variables that stem from the results of statistical tests. Among the factors that threaten this facet of validity are testing statistical hypotheses with data that violate relevant assumptions (e.g., homogeneity of variance), implementing treatments unreliably within study conditions, conducting research in settings having random irrelevancies (e.g., fluctuations in noise, temperature, illumination), and sampling too few units (e.g., participants) to provide for adequate statistical power in hypothesis testing.
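The relation between sample size and statistical power can be illustrated with a short Python sketch. This is a simplified analytic approximation, assuming a two-sided, two-sample z-test with equal group sizes and a standardized effect size (Cohen's d); the effect size and sample sizes used here are hypothetical.

```python
from statistics import NormalDist

def two_group_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test
    with equal group sizes and standardized effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality parameter: d * sqrt(n / 2) for equal groups
    ncp = effect_size_d * (n_per_group / 2) ** 0.5
    # Probability the test statistic falls in either rejection region
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

# A medium hypothetical effect (d = 0.5) at several sample sizes
for n in (20, 64, 100):
    print(f"n per group = {n:>3}: power ≈ {two_group_power(0.5, n):.2f}")
```

Sampling too few units leaves power low (a real effect is likely to be missed), which is one of the threats to statistical conclusion validity noted above.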

Internal Validity

Internal validity is the degree to which inferences about causal connections between variables are correct. Among the factors that detract from the validity of such inferences are history, maturation, instrumentation, testing, selection, and mortality.

External Validity

External validity deals with the correctness of inferences about the generalizability of a study’s findings to and across populations of settings, participants, time periods, and so forth. External validity is threatened by such factors as interactions between settings and treatments, interactions between history and treatments, and interactions between selection and treatments.

Defining Attributes of Randomized Experiments

Randomized experiments have four major characteristics. Taken together, these serve to differentiate randomized experiments from both quasi-experiments and nonexperiments.

Manipulation of Variables

In randomized experiments, the researcher manipulates (as opposed to measures) the values of the study’s independent variables (e.g., X1, X2, …, Xk). In theory, there is no limit on either (a) the number of variables that might be manipulated in any given study or (b) the number of levels of each of the manipulated variables. In practice, however, experiments have relatively small numbers of conditions, that is, unique combinations of independent variables and levels of such variables. One reason for this limitation is that the greater the number of conditions, the more difficult it is to conduct the experiment. Another reason is that statistical power considerations militate against experiments having large numbers of conditions. Yet another reason is that the theories or models that guide research often consider only a limited set of the assumed causes of a study’s dependent variables. Nevertheless, a requirement of a randomized experiment is that there be at least two levels of one or more independent variables. However, contrary to what many appear to believe, there is no requirement that there be a control group, including a nontreatment control group. For example, a study dealing with the effects of variations in job design on job satisfaction could contrast relatively low and high worker autonomy conditions. There would be no need for an autonomy control condition. However, the manipulated levels of worker autonomy would have to differ enough to produce changes in measured levels of the study’s dependent variable (i.e., job satisfaction).

Randomized experiments are often of the factorial variety. In factorial experiments, there are two or more independent variables and each such variable has at least two levels. For example, the just described job design study could be modified to make it factorial by adding a second independent variable, such as task variety. A major advantage of a factorial study is that it allows for testing both the main and interactive effects of independent variables on a study’s dependent variable(s).
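The logic of a factorial design can be sketched in Python. The example below simulates a hypothetical 2 × 2 study crossing worker autonomy with task variety and estimates main and interaction effects from the cell means; the true effect sizes, noise level, and sample size are all assumed for illustration.

```python
import random

random.seed(42)

def simulate_satisfaction(autonomy, variety, n=50):
    """Simulate satisfaction scores for one cell of a hypothetical
    2 x 2 design; the 'true' model below is assumed, not empirical."""
    mean = 3.0 + 0.8 * autonomy + 0.5 * variety + 0.3 * autonomy * variety
    return [random.gauss(mean, 1.0) for _ in range(n)]

# One cell per unique combination of levels (0 = low, 1 = high)
cells = {(a, v): simulate_satisfaction(a, v) for a in (0, 1) for v in (0, 1)}
cell_mean = {k: sum(xs) / len(xs) for k, xs in cells.items()}

# Effect estimates computed directly from the four cell means
main_autonomy = ((cell_mean[1, 0] + cell_mean[1, 1])
                 - (cell_mean[0, 0] + cell_mean[0, 1])) / 2
main_variety = ((cell_mean[0, 1] + cell_mean[1, 1])
                - (cell_mean[0, 0] + cell_mean[1, 0])) / 2
interaction = ((cell_mean[1, 1] - cell_mean[1, 0])
               - (cell_mean[0, 1] - cell_mean[0, 0]))

print(f"main effect of autonomy  ≈ {main_autonomy:.2f}")
print(f"main effect of variety   ≈ {main_variety:.2f}")
print(f"autonomy x variety       ≈ {interaction:.2f}")
```

The same four cells that a factorial layout requires thus yield estimates of both main effects and their interaction, which is the advantage noted above.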

One of the important reasons for manipulating the levels of independent variables is that doing so ensures that assumed causes occur before assumed effects. This temporal precedence is vital to the internal validity of research.

Random Assignment of Units to Conditions

In randomized experiments, research units (e.g., individuals, groups) are randomly assigned to conditions. This randomness is critical because it serves to greatly enhance internal validity. The major reason for this enhancement is that when a sufficiently large number of units have been randomly assigned to study conditions, these conditions will be highly alike with respect to any and all variables prior to the time independent variables are manipulated. As a consequence, the posttreatment levels of dependent variables across conditions will be a function of the study’s manipulations, as opposed to initial differences in such variables across the same conditions.
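A minimal Python sketch makes the equating property of random assignment concrete. With a sufficiently large pool, shuffling participants and dealing them into conditions leaves the groups closely matched on a preexisting characteristic; the participant pool and the age variable below are hypothetical.

```python
import random

random.seed(0)

# Hypothetical pool of participants with a preexisting characteristic (age)
participants = [{"id": i, "age": random.randint(20, 60)} for i in range(200)]

# Random assignment: shuffle the pool, then deal into k equal conditions
random.shuffle(participants)
k = 2
conditions = [participants[i::k] for i in range(k)]

means = []
for idx, group in enumerate(conditions):
    mean_age = sum(p["age"] for p in group) / len(group)
    means.append(mean_age)
    print(f"condition {idx}: n = {len(group)}, mean age = {mean_age:.1f}")
```

Because assignment ignores every attribute of the units, the conditions are (in expectation) equivalent on measured and unmeasured variables alike before any manipulation occurs.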

Measurement of Dependent Variables

The effects of the study’s manipulations are assessed through measures of dependent variables. In the case of research involving human participants, the measures can focus on such outcomes as attitudes, beliefs, intentions, psychological states, and behaviors. Among the types of measures that might be used are observations of behaviors, questionnaires, ability tests, and physiological measures.

In many experiments it is valuable to assess the degree to which the manipulations have had desired effects on study participants. Such assessments use measures known as manipulation checks. For example, in the job design study described previously, a researcher could use questionnaires to assess differences in participants’ beliefs about the amount of discretion they had over the way the job was performed in the relatively low and high autonomy conditions. Manipulation checks are important in interpreting the results of a study, especially when its manipulations fail to produce expected changes in the values of dependent variables. In such cases, manipulation check data can be used for internal analyses, as noted by Elliot Aronson, Phoebe C. Ellsworth, J. Merrill Carlsmith, and Marti H. Gonzales in 1990.

Control Over Extraneous or Confounding Variables

Randomized experiments that are well designed and properly conducted provide for high levels of control over any and all factors that might influence the values of dependent variables other than the manipulations. There are several strategies for achieving such control. One is to make experimental conditions equivalent to one another in terms of all factors other than the manipulations to which participants are exposed. Another is to ensure that in the course of participating in the experiment, participants are not differentially exposed to extraneous influences (e.g., environmental events).

Research Settings versus Experimental Design Types

It is important to recognize the distinction between experimental design types and research settings. Design types include randomized experiments, quasi-experiments, and nonexperiments. All of these can be used in a variety of settings. Typically, research settings have been characterized as being of either the laboratory or field variety. However, the distinction between the laboratory and the field is not always clear, as observed by, among others, J. P. Campbell in 1986 and E. F. Stone-Romero in 2002. Thus, a better distinction involves a contrast between settings that are either (a) created for the specific purpose of doing research (special purpose setting) or (b) created for purposes other than research (non-special purpose setting). In general, special purpose settings exist for relatively short time periods, and they cease to exist when the study for which they were created has been completed. In addition, they are designed to ensure that independent variables can be manipulated effectively. As such, the literature indicates that they may have fewer features or elements than would be found in non-special purpose settings. Nevertheless, Elliot Aronson and colleagues wrote in 1990 that it is (a) vital that settings have experimental realism and (b) desirable that they have some degree of mundane realism.

For example, a researcher interested in studying the effects of variations in task characteristics on task satisfaction could create a special purpose setting in either a university laboratory or a nonuniversity facility (e.g., a building in an industrial park). For such a study, it would be important to have tasks that were meaningful enough to have desired degrees of impact on research participants, thus ensuring experimental realism. However, it would not be important to have other elements that would be found in an actual organization (e.g., incentive systems, fringe benefits). Nevertheless, the greater the extent to which such elements were part of the study’s setting, the greater would be the mundane realism of the study.

The task characteristics study described previously also could be conducted in a non-special purpose setting (e.g., an actual work organization). However, it would typically be much more difficult to conduct the study in such a setting. One reason for this difficulty is that in most such settings it is very difficult to effect changes in existing organizational arrangements (e.g., physical layout of facilities, assignment of workers to jobs, pay and fringe benefits provided to workers). Thus, experiments in non-special purpose settings typically have lower levels of control over extraneous or confounding variables than do experiments performed in special purpose settings, and internal validity is more suspect in settings of the former variety than the latter. However, studies in non-special purpose settings typically have higher levels of mundane realism. Often, this serves to bolster their external validity.


  1. Aronson, E., Ellsworth, P. C., Carlsmith, J. M., & Gonzales, M. H. (1990). Methods of research in social psychology (2nd ed.). New York: McGraw-Hill.
  2. Berkowitz, L., & Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. American Psychologist, 37, 245-257.
  3. Campbell, J. P. (1986). Labs, fields, and straw issues. In E. A. Locke (Ed.), Generalizing from laboratory to field settings: Research findings from industrial-organizational psychology, organizational behavior, and human resource management (pp. 269-279). Lexington, MA: Lexington Books.
  4. Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.
  5. Fromkin, H. L., & Streufert, S. (1976). Laboratory experimentation. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 415-465). Chicago: Rand McNally.
  6. Kerlinger, F. (1986). Foundations of behavioral research (3rd ed.). New York: Holt, Rinehart & Winston.
  7. Runkel, P. J., & McGrath, J. E. (1972). Research on human behavior: A systematic guide to method. New York: Holt, Rinehart & Winston.
  8. Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
  9. Stone-Romero, E. F. (2002). The relative validity and usefulness of various empirical research designs. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 77-98). Malden, MA: Blackwell.