
Educational Psychology

Student Assessment

Educational psychologists have been major players in the measurement of student performance, virtually defining a national achievement curriculum. The keystone is the standardized test. Alternatives have appeared under labels like authentic or performance-based assessment, along with packages for placement in special programs (learning handicapped, emotionally disturbed, gifted, English-language learners).

Linking these assessments are criteria and methods to establish validity and reliability and an overarching system of constructs and standards. Validity assures that an assessment measures a well-defined construct. For example, a reading test should test “reading.” Reliability refers to the trustworthiness of an instrument. Groups of judges rating student compositions must agree among themselves.

Several tensions trouble the field of student assessment, reflecting the importance of schooling for the individual and the society. Most significant is locus of control, pitting the classroom teacher against more distant authorities (the school district, the state, or even the federal government). In developed countries, centralized testing often determines admission to secondary and postsecondary schooling. Local control of U.S. schools is a long-standing tradition. This tradition has been challenged, however, as states and the federal government provide increased funding for public education and concomitant demands for accountability. Some teachers have resisted pressure to “teach to the test,” offering alternative methods of their own devising.

Multiple-Choice Methods

The standardized achievement test is, without doubt, the most important creation of educational psychologists. It affects most individuals throughout their lives. From kindergartners’ school readiness to examinations for entry to graduate programs, individuals are judged by marks on an answer sheet.

Test development begins with construct definition, typically in terms of behavioral objectives: for example, identify the topic sentence in a paragraph, or calculate the sum of four 2-digit numbers in column format. Writing and revising items is the next step. The item stem poses the question, and the choices provide answers. One is correct and the others reflect degrees of wrongness. Plausible alternatives increase item difficulty. Scripted instructions determine test administration, including time allocations, scoring information, and interpretation. Publishers conduct extensive trial runs, assuring users of test reliability and providing normative data like averages and percentiles. They also offer scoring services. The teacher does little more than distribute booklets, read instructions, and package booklets for shipment.

Test development is only part of a larger enterprise. Test theorists and publishers rely on psychometric methods to transform scores from “percentage correct” to normative indicators like grade-level equivalent, percentile, and normal curve equivalent. These indicators provide test users with general measures that compare individuals with a larger population. Classical psychometrics began with the normal “bell-shaped” curve but now employs a wide range of techniques, including factor analysis, item-response theory, and generalizability design and analysis.
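As a minimal sketch of this transformation (using a hypothetical norm group, not any publisher’s actual scaling tables), a raw score can be converted to a percentile and a normal curve equivalent, where NCE is conventionally defined as 50 + 21.06z:

```python
from statistics import NormalDist

def normative_indicators(raw_score, mean, sd):
    """Convert a raw score to a z-score, a percentile rank,
    and a normal curve equivalent (NCE = 50 + 21.06 * z)."""
    z = (raw_score - mean) / sd
    percentile = NormalDist().cdf(z) * 100  # percent of norm group at or below
    nce = 50 + 21.06 * z
    return z, percentile, nce

# Hypothetical norm group: mean 60, SD 10 on a 100-item test.
z, pct, nce = normative_indicators(72, mean=60, sd=10)
print(f"z = {z:.2f}, percentile = {pct:.0f}, NCE = {nce:.1f}")
```

The function name and norm-group figures are illustrative assumptions; real scaling also involves smoothing and equating across test forms.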

Criterion-referenced methods appeared in the 1950s as an alternative to normative indicators; the focus is on whether students meet absolute, predefined standards. Standard setting begins with professional judgment about performance levels, or what constitutes adequate and exceptional achievement. Tests remain the same, but scores are interpreted differently.

Standardized tests serve various purposes in schools. They compare students and schools. For example, students with high test scores are admitted to prestigious universities or may be identified in third grade as gifted. Parents search out schools with high achievement scores. School improvement and program effectiveness are gauged by standardized measures; a few points up or down can leave educators celebrating or depressed.

Tests influence other facets of schooling. In the elementary grades, teachers monitor reading and mathematics achievement by curriculum-embedded, end-of-unit tests that mirror standardized instruments. Test batteries accompany high school and college textbooks, allowing instructors the convenience of cut-and-paste examinations, and publishers provide do-it-yourself manuals for constructing tests.

Performance-Based Assessment

Standardized tests have had critics from the outset. Critics charge that the tests measure low-level skills and that score differences tied to socioeconomic status, ethnicity, and language background signify test bias. Not until the 1970s did alternatives emerge. Movements like whole language, hands-on math, and discovery-based science militated against standardization and externally mandated tests, stressing instead the teacher’s role in adapting instruction to student needs, and the validity of portfolios, exhibitions, and projects.

Performance is the distinctive feature that sets these methods apart from multiple-choice tests. The techniques span a wide range. At one end are on-demand writing tests: students have an hour or less to write a composition on a predetermined prompt, with no resources, no questions, no chance to revise. At the other extreme are free-form portfolios, collections assembled over weeks or months to demonstrate learning. Individual students decide what to put in the folder and may even judge the quality of the collection.

How are performance samples evaluated? Olympic events like diving and gymnastics serve as metaphors. Judges confer about the characteristics of a quality performance and then evaluate each participant on rating scales or rubrics for specific performances (analytic ratings) and for overall quality (holistic ratings). Psychometric techniques apply to some of these judgments; interrater agreement provides an index of consistency, for instance. Performance-based methods possess considerable face validity; students must directly “do” what they have learned, rather than simply select a correct answer. Trustworthiness is more problematic. Although raters can learn to make consistent judgments, students may look very different depending on the task.
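One common index of interrater consistency is Cohen’s kappa, which corrects raw agreement for chance; the article names no specific statistic, so this sketch, with hypothetical judges’ ratings, is only one way such an index might be computed:

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters: observed agreement
    adjusted for the agreement expected by chance."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical holistic scores (1-4 rubric) from two judges on ten portfolios.
judge1 = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
judge2 = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]
print(f"kappa = {cohen_kappa(judge1, judge2):.2f}")
```

With these made-up ratings, observed agreement is 0.80 and chance agreement 0.28, giving kappa ≈ 0.72; values near 1 indicate judges who agree far beyond chance.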

Performance-based methods sprang from practice rather than policy, from classrooms rather than state houses, from teachers rather than publishers. They require human judgment and are expensive. Nonetheless, substantial efforts are underway to adapt these methods for large-scale assessment. On-demand writing tests are now commonplace. Several states complement multiple-choice tests with projects and portfolios, and Vermont relies entirely on these approaches. The demand for high standards provides continuing impetus for the use of performance assessment, the argument being that there is no substitute for demonstrating competence in complex and demanding tasks. For teachers, the connection to classroom practice is compelling, as is the opportunity to gauge student interest and motivation.

Current assessment practice varies from the primary grades through graduate school. Young students are just learning the school game, and standardized tests reflect early home preparation more than individual potential; performance assessments, therefore, are more appropriate. From the late elementary grades through entrance to postsecondary education, multiple-choice tests reach a peak. Afterward, performance samples, such as application letters, thesis papers, and dissertations, become critical.

Assessment for Categorical Placement

This topic does not fit under the previous headings but has become increasingly important because of government funding of categorical programs like special education. Regulations govern assessment practices, but psychologists play important roles in setting local policy and actual implementation. Government funding for disadvantaged students depends on family characteristics like poverty more than on achievement. Assessment is important as part of the debate about program effectiveness and accountability, that is, whether the investment is justified by student learning.

Categorical programs depend heavily on assessment for selection of students, determination of appropriate services, and exit to regular education. For these assessments, professionals (often psychologists) employ regulated (and expensive) clinical methods, combining teacher recommendations, standardized instruments, interviews and observations, and family consultations.

Teacher Assessment

Only in recent decades has the evaluation of teachers emerged as a significant research topic. Assessment methods vary within levels of teacher development: admission to preservice programs, initial licensure, and induction leading to tenure. The trend is to use standardized procedures for entry-level decisions (e.g., admission to training programs) and performance-based methods for professional advancement decisions (e.g., tenure).

Because of concerns about applicant quality, college students planning to enter teaching must now demonstrate basic skills in many states. The multiple-choice tests resemble those given to high school students, with the same advantages and limitations. High failure rates by underrepresented minorities mean that many potential teacher candidates are denied access to the field. The tests have been challenged as biased and unrelated to teaching potential; the counterargument is that every teacher should possess a minimum level of competence.

Following preservice preparation and during the first few years of service, teachers are in turn licensed and then inducted into tenured positions. During these steps, which most states regulate heavily, candidates undergo serious and sustained evaluation. Prior to 1990, the National Teacher Examination (NTE), a multiple-choice test covering teaching practices and content knowledge, often served for licensure. The NTE was criticized as lacking validity because it did not assess “real teaching.” In the late 1980s, Educational Testing Service introduced Praxis, a combination of computer-based tests of basic skills, paper-and-pencil exercises of subject-matter knowledge, and performance-based observations. Praxis has greater face validity and appears more closely linked to practice.

Professional preparation in teaching is “thin” compared with other fields. You can track the progress of doctors, nurses, lawyers, and accountants by the certificates on their office walls. Once a teacher has acquired tenure, however, opportunities for professional development are scarce and go unrecognized. In 1987, the National Board for Professional Teaching Standards was formed to develop and promote methods for assessing excellent teaching. Teachers desiring to move beyond initial licensure can now apply for an intensive experience composed of ten performance exercises; the teacher prepares six at the local school, and four are administered during a one-day session in an assessment center. The classroom exercises include instructional videotapes and student work samples, which the candidate must analyze and interpret. At the assessment center, the candidate reviews prescribed lesson materials and designs sample lessons. Panels of expert teachers rate each portfolio and award certificates of accomplishment. The standards are high, and pass rates have been modest. Some states now give certificated teachers pay incentives, but the movement has yet to catch on.

Two final issues warrant brief mention. The first is reliance on student achievement as an indicator of teaching effectiveness. Teacher associations like the National Education Association and the American Federation of Teachers oppose this policy, arguing that student scores reflect many factors the teacher cannot control. States increasingly hold schools responsible for achievement standards. Although the focus is the school, teachers share incentive payments for exceptional school-wide performance and must deal with the consequences of low scores.

The second issue centers on teacher knowledge of assessment procedures. Externally mandated tests receive most attention, but teachers also rely on their own observations and classroom assessments to judge student learning. How trustworthy are teacher judgments? How knowledgeable are teachers about standardized tests? Surveys show that teachers receive little preparation in assessment concepts and methods and typically rely on intuition and prepackaged methods. Some educators have proposed the concept of “assessment as inquiry” to support classroom-based methods like portfolios and exhibitions, but with little effect on practice thus far.

Administrator Assessment

Teacher evaluation has not captured the same attention as student assessment, but even less attention has been given to the assessment of principals and superintendents. One might think that school leaders should be required to demonstrate their knowledge and skill, both to enter their positions and as part of continuing professional development. In fact, work in this area is sparse, with few contributions by psychologists. The research foundations are limited but are emerging around leadership concepts and practical needs.

Administrators typically attend more to budgets and personnel matters than to teaching and student learning, except when schools stand out as exceptional or in dire straits. Research suggests that effective schools are correlated with strong administrative leadership; unfortunately, less is known about how to assess or support leadership. The criterion for effectiveness has typically been standardized student performance. Analogous to an assembly-line model, the administrator’s task is to increase the output. Newer models stress human relations and organizational integrity, but much remains to be done.

What Has Endured and What Is Valuable?

Standardized multiple-choice tests will most likely remain the primary indicators of student achievement. Performance-based methods for large-scale accountability, a closer link between classroom assessment and local reporting of student achievement, and clinical strategies like the best of those found in categorical programs all offer alternative assessment models for the future. The new methods have stimulated public debate about the outcomes of schooling and about the trustworthiness of methods for judging the quality of educational programs. Equity issues are a significant element in these debates. Assessment data show that U.S. schools are doing reasonably well for students in affluent neighborhoods but are failing families in the inner cities and poor rural areas. Indicators can serve to blame victims or to guide improvements. We have much yet to learn about methods for supporting the second strategy.

Bibliography

  1. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: Author.
  2. American Federation of Teachers, National Council on Measurement in Education, & National Education Association. (1990). Standards for teacher competence in educational assessment of students. Educational Measurement: Issues and Practice, 9(4), 30–32.
  3. Berliner, D. C., & Calfee, R. C. (Eds.). (1996). Handbook of educational psychology. New York: Macmillan. Part 2 of the Handbook includes chapters on individual differences among students, emphasizing a broad span of assessment concepts and practices in the achievement domain, along with motivation, attitudes, and aptitudes, ranging from preschool through adulthood. Chapter 23 describes methods for teacher evaluation from selection through licensing and induction and on to professional certification, including descriptions of NBPTS and Praxis.
  4. Bloom, B. S., Hastings, J. T., & Madaus, G. F. (1971). Handbook of formative and summative evaluation of student learning. New York: McGraw-Hill. A classic presentation of a broad range of testing and assessment methods based on behavioral principles that undergirded the design of standardized tests, as well as many classroom and textbook assessments from the 1960s up through the present time.
  5. Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage. Describes in readable prose (except for a few technical asides) the concepts and methods underlying standardized tests, along with techniques for addressing problems of group bias.
  6. Candoli, I. C., Cullen, K., & Stufflebeam, D. L. (Eds.). (1997). Superintendent performance evaluation: Current practice and directions for improvement. Boston, MA: Kluwer. Part of a series that uses the Personnel evaluation standards as a foundation.
  7. Glaser, R., & Linn, R. (1997). Assessment in transition. Stanford, CA: National Academy of Education. The focus of this paperback is the National Assessment of Educational Progress, the “nation’s report card.” But the book also covers a broad range of issues in the assessment of student achievement in nontechnical language and sets the stage for discussions of state and national policy about how to find out how students are doing in our schools.
  8. Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
  9. Joint Committee on Standards for Educational Evaluation. (1981). Standards for evaluations of educational programs, projects, and materials. New York: McGraw-Hill. Several organizations have established standards for educational assessment practices. Implementation of the standards is voluntary in most instances, but the quality of the recommendations is uniformly high.
  10. Linn, R. L. (Ed.). (1989). Educational measurement (3rd ed.). New York: Macmillan. The technical foundations for measuring student achievement, based largely on multiple-choice tests. Although the techniques have broader applications, most of the examples assume “right-wrong” answers. The handbook covers validity and reliability and methods for scaling achievement, along with special chapters on cognitive psychology and measurement, computers and testing, and practical applications of test scores.
  11. Mitchell, J. V., Jr., Wise, S. L., & Plake, B. S. (Eds.). Assessment of teaching: Purposes, practices, and implications for the profession. Hillsdale, NJ: Erlbaum. Describes a wide range of methods for assessing teaching knowledge and practice for selection and tenure decisions at the local level, grounded in the concept that improving education depends on improving teaching.
  12. Nettles, M. T., & Nettles, A. L. (Eds.). (1995). Equity and excellence in educational testing and assessment. Boston, MA: Kluwer.
  13. Office of Technology Assessment. (1992). Testing in American schools: Asking the right questions (OTA-SET-519). Washington, DC: U.S. Government Printing Office. An up-to-date history of standardized testing.
  14. Phye, G. D. (Ed.). (1997). Handbook of classroom assessment: Learning, adjustment, and achievement. San Diego, CA: Academic Press.
  15. Richardson, V. (Ed.). Handbook of research on teaching (4th ed.). New York: Macmillan. This series offers an important historical perspective on evaluation of teachers and teachers’ evaluations of students. The first edition discusses various methods for studying teaching but does not connect these with evaluation per se. The second edition contains a chapter on assessment of teacher competence as well as a chapter on observation as a method for teacher evaluation. The third edition includes a chapter on the “measurement of teaching,” which describes relations between teacher activities and student performance on standardized tests.
  16. Shinkfield, A. J., & Stufflebeam, D. L. (Eds.). (1995). Teacher evaluation: Guide to effective practice. Boston, MA: Kluwer Academic Publishers. Offers a review of current research along with practical suggestions.
  17. Stiggins, R. J. (1994). Student-centered classroom assessment. New York: Merrill.
  18. Wiggins, G. P. (1993). Assessing student performance. San Francisco: Jossey-Bass.
