Work Samples

Work samples, in the strictest sense, are hands-on performance tests or simulations of the job, which are used to estimate current or predict future performance on similar tasks. Uses of work samples include the following:

  • Selection: Work samples can be used to decide which applicants to hire. This is the most typical use of work samples.
  • Performance measurement and evaluation: Work samples are sometimes used to estimate an individual’s current level of job performance when other measures are unavailable. This is discussed at greater length in a following section.
  • Vocational assessment of disabled workers: Work samples are commonly used to determine whether applicants with disabilities can perform the duties required on different types of jobs. These types of work samples are used to provide career counseling and vocational guidance to disabled workers.
  • Trainability measure: Work samples are sometimes used after a short training session to decide whether the person should be selected to continue in a lengthier training program. In these cases the work sample is intended to measure trainability, that is, to predict how successful the person will be in the training program.
  • Training evaluation: Work samples are commonly used at the completion of a training program to determine whether the training was effective at improving performance.

The defining feature of work samples is physical replication of the critical tasks performed on the job; however, many other selection tools that do not replicate the work environment can also be considered work samples. Consequently, certain types of simulations fit more squarely under the work sample label than others. In the broadest sense of the term, any tests assessing specific skills, knowledge, or aptitudes that are critical for performance on the job in question may, in some cases, be characterized as work samples.

Work Sample Characteristics

All work samples rest on two ideas: that the test samples behaviors rather than measuring traitlike constructs, and that the sampled behaviors are similar to those elicited on the job. Therefore, work samples are job specific. Although one work sample could be used to predict performance in the same job at multiple organizations, the same test may no longer qualify as a work sample if used to predict performance in a different type of job. To illustrate: A test measuring speed and accuracy of identifying number transcription errors could serve as a work sample for data entry clerks. However, the same test is not a work sample for a quality control position in a candy factory, even if the test is a valid predictor of performance for both jobs. In the first instance, the test measures behavior similar to that required on the job (checking numbers); in the second instance, the test is likely an indicator of the construct of attention to detail. Therefore, researchers cannot determine whether a test is a work sample without knowledge of the job to be predicted.

Fidelity to the Job

Work samples can range from high fidelity (an exact duplication of job tasks) to low fidelity (having a measurement format that differs from the job tasks). Examples of high-fidelity work samples from a broad range of jobs include flight simulators, fire hose drag and ladder climb tests, blueprint reading tests, typing and filing tests, dental carving tests, sewing tests, tests of microscope use, driving tests, assessment center simulations (e.g., in-basket tests, leaderless group discussions, business games, subordinate simulations), police report writing tests, and computer programming tests.

Low-fidelity work samples fall into one of two main categories:

  1. Physical ability tests such as manual dexterity tests, optical exams, and strength tests
  2. Paper-and-pencil tests such as job knowledge measures (e.g., farming knowledge test), situational judgment tests, and job-specific skills or aptitude tests (e.g., math tests)

As such, some researchers have used the term work sample to refer to a variety of selection tools commonly used by industrial/organizational (I/O) psychologists.

Work Sample Disadvantages

Although work samples are often useful tools, there are three occasions when their usefulness may be limited:

Cost of Work Samples

The first disadvantage concerns the cost, and hence the utility, of the tool. Development and maintenance of a simulation can be expensive. High-fidelity work simulations must be tailored to tasks performed on the job, and personnel must be trained to administer and score the work sample. In some cases the expense of a work sample may outweigh its potential incremental validity over other tests. For example, when hiring a carpenter it might not be cost-effective to ask 30 applicants to actually build a cabinet. The expenditure of time and materials for a high-fidelity cabinet-building simulation may be excessive. Instead, low-fidelity simulations of the job such as situational judgment tests or job knowledge measures may be more practical. However, the cost of a high-fidelity simulation can vary greatly depending on the complexity of the simulation. For example, with low-complexity repetitive jobs in manufacturing such as assembling computer chips, a high-fidelity work sample that involves soldering wire connections may be quite cost-efficient.
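
The cost tradeoff described above can be made concrete with a rough utility calculation. The sketch below uses the Brogden-Cronbach-Gleser utility formula, a standard tool in personnel selection that this entry does not itself discuss. The function name and every figure (validities, dollar value of performance differences, per-applicant costs) are hypothetical and chosen only for illustration.

```python
def selection_utility(n_hired, n_applicants, validity, sd_dollars,
                      mean_z_hired, cost_per_applicant, tenure_years=1.0):
    """Estimated net dollar gain of hiring with a test instead of at random.

    Brogden-Cronbach-Gleser form:
        gain = n_hired * tenure * validity * SD_y * mean z-score of those hired
        cost = n_applicants * cost per applicant tested
    """
    gain = n_hired * tenure_years * validity * sd_dollars * mean_z_hired
    cost = n_applicants * cost_per_applicant
    return gain - cost


# Hypothetical carpenter example: 30 applicants, 3 hires, one-year horizon.
# All numbers are assumptions for illustration, not values from the text.
common = dict(n_hired=3, n_applicants=30, sd_dollars=10_000, mean_z_hired=1.5)

high_fidelity = selection_utility(validity=0.50, cost_per_applicant=400, **common)
low_fidelity = selection_utility(validity=0.40, cost_per_applicant=20, **common)

print(f"High-fidelity cabinet build: {high_fidelity:,.0f}")
print(f"Low-fidelity knowledge test: {low_fidelity:,.0f}")
```

With these made-up inputs the cheaper test nets more (roughly 17,400 versus 10,500), because the extra validity of the high-fidelity simulation does not cover its testing cost. A much lower per-applicant cost, as might apply to a simple soldering task, can reverse the comparison.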

Work Samples as Measures of Current Performance

Work samples are sometimes used to estimate current employee job performance when other measures such as supervisor ratings are unavailable. In these instances, the work samples no longer serve as predictors but rather as proxy criterion measures for a variety of human resource (HR) practices including the following: validating other selection tools, evaluating training outcomes, assessing individual or workforce training needs, giving performance appraisals, or even making promotion and pay decisions. However, there are good reasons to be cautious about using work samples as criterion measures. Even work samples with point-to-point correspondence to the job can differ from day-to-day performance. Motivation during a work sample tends to be high, whereas typical levels of motivation in day-to-day performance can vary greatly across individuals. This means work samples measure what an individual can do (maximal performance) but not necessarily what an individual typically will do (typical performance). In other words, motivation on the job, which is a direct determinant of performance, is not necessarily measured in a work sample. Therefore, care should be taken in assuming work samples are accurate measures of current job performance.

Trainable Knowledge and Skills

Using work samples as a selection tool may not be feasible or appropriate when the work sample measures knowledge or skills that are easily trainable. In entry-level positions, applicants are expected to learn on the job. In addition, some applicants may have experience, making them appear to be better performers than those who do not. But if a small amount of training would change someone’s score on the work sample, many inexperienced but future high performers would be falsely rejected when using the work sample. This limits the usefulness of a work sample for two reasons. First, it lowers the criterion-related validity of the work sample. Second, members of protected classes may differ in their opportunities to gain experience in certain jobs (e.g., women in construction). If this is the case, the work sample would result in adverse impact, and awareness of the work sample may further have a chilling effect on those who would otherwise apply. Although adverse impact of work samples can often be defended on the basis of content validity, if the skills are easily trainable at low cost to the organization, the work sample could be subject to legal challenge.
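
One common way to screen for the kind of adverse impact described above is the four-fifths rule of thumb from the U.S. Uniform Guidelines on Employee Selection Procedures, which compares pass rates between groups. The entry does not mention this rule; the sketch below, including its function names and applicant counts, is illustrative only.

```python
def selection_rate(passed, applied):
    """Proportion of a group's applicants who pass the work sample."""
    if applied <= 0:
        raise ValueError("applied must be positive")
    return passed / applied


def impact_ratio(focal_passed, focal_applied, ref_passed, ref_applied):
    """Focal group's selection rate divided by the reference group's rate."""
    ref_rate = selection_rate(ref_passed, ref_applied)
    if ref_rate == 0:
        raise ValueError("reference group selection rate is zero")
    return selection_rate(focal_passed, focal_applied) / ref_rate


def flags_adverse_impact(ratio, threshold=0.8):
    """Four-fifths rule of thumb: a ratio below 0.8 is treated as a warning sign."""
    return ratio < threshold


# Hypothetical construction work sample (counts are made up):
# 4 of 40 women pass, 30 of 120 men pass.
ratio = impact_ratio(4, 40, 30, 120)
print(f"Impact ratio: {ratio:.2f}, flagged: {flags_adverse_impact(ratio)}")
```

A flagged ratio is only a starting point for review. As the paragraph above notes, whether the work sample remains defensible depends on content validity and on how easily the skills could be trained.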

Work Sample Advantages

Despite the disadvantages just mentioned, work samples that assess skills that applicants cannot easily acquire on the job can offer several advantages over other commonly used selection tools, including good legal defensibility, realistic job previews, and positive applicant reactions. Well-designed work samples have clear content overlap with the job; therefore they are usually legally defensible on the basis of content validity alone. This type of validity evidence tends to be well accepted by the courts. In addition, most work samples are also face valid; that is, they look like the job. Although face validity is not a legally defensible type of validity, it does lead to more positive reactions and perceived fairness by applicants when compared with other common selection tests. Work samples also offer a realistic preview of the job, allowing applicants to better judge their own qualifications for and interest in the position. As a result of the perceived fairness and the ability to assess their own performance, applicants may be less likely to quit shortly after hire. Together, content validity and face validity typically make work samples the least legally challenged and most legally defensible of all commonly used selection tools.

In addition to positive reactions and legal defensibility, research has also shown that work samples can have positive criterion-related validities that match or even exceed those of other selection tools. The meta-analytic estimate of work sample validity is .46 for predicting job performance and .42 for predicting training performance. Why do they predict so well? One explanation is that work samples are based on the tenet that past performance is the best predictor of future performance. However, proper work sample design plays a central role in whether a work sample measures up to that tenet. Critical design features include job analysis emphasizing behaviors and tasks, content validity (i.e., bandwidth and fidelity to the job), rater training to increase accuracy and reduce bias (e.g., leniency, severity, and halo), standardization of administration and scoring, assessment of interrater reliability, and emphasis on rating of behaviors. The presence of these features in a work sample increases the likelihood of, but does not guarantee, significant criterion-related validity. Moreover, because some jobs are easier to simulate than others, work samples may better predict performance in jobs with clearly defined and short-duration tasks that do not change over time (e.g., clerical or manufacturing jobs) than in jobs with less structured and longer-duration tasks that do change over time (e.g., project managers and engineers).
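
Criterion-related validity is usually reported as a correlation between test scores and a criterion such as later job performance ratings. As a minimal sketch, assuming a plain Pearson correlation with none of the corrections (for example, for range restriction or criterion unreliability) that meta-analyses often apply, and using made-up scores:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: work sample scores at hire and later supervisor ratings.
work_sample_scores = [62, 75, 81, 58, 90, 70]
supervisor_ratings = [3.1, 3.8, 3.6, 2.9, 4.5, 3.4]

validity = pearson_r(work_sample_scores, supervisor_ratings)
print(f"Observed validity coefficient: {validity:.2f}")
```

The same function applied to two raters' scores for the same set of applicants gives a simple index of interrater reliability, one of the design features listed above.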

Work Samples: Past, Present, and Future

Published research on work samples began as early as the 1930s; however, research interest in work samples has slowed over the last 30 years. Instead, research has turned to more specific types of work samples, including situational judgment tests and assessment centers. But despite the lull in academic research, actual use of all types of work samples has continued in applied settings. Technological innovations continue to make simulations less expensive and more realistic. Virtual reality, voice recognition software, and computerized scoring are just a few of the new technologies incorporated in work sample design. Yet research on work samples has not kept pace with these changes. Questions about whether this new technology can improve work sample predictive validity remain unanswered. Because technology limits our ability to create complex simulations, the utility of work samples in the future is unknown. Nevertheless, one thing is certain: work samples are as much a tool of the future as they are of the past.


