Descriptive Statistics

Measurement provides a means of quantifying phenomena of interest. In many measurement contexts, researchers are interested solely in describing the data efficiently. Descriptive statistics are the indexes through which such data summarization is accomplished. Unlike inferential statistics, which are used to generalize from a sample to a larger population, descriptive statistics characterize only the data at hand.

For example, consider a situation in which a teacher has recently administered an examination. Fundamentally, the teacher has constructed the test in the hope that those with more knowledge of the course material will perform better (receive higher scores) than those with less knowledge of the course material. After scoring the examinations, the teacher records each student’s grade in the instructor’s grade book, which is organized alphabetically. Although these data can be easily used to identify the performance of any individual student, the alphabetic organization does not provide for easy interpretation of scores as a whole.

One way to use descriptive statistics to make sense of these data would be to construct a frequency table in which scores are rank ordered from largest to smallest along with the number of individuals receiving each score. This information can also be presented graphically in the form of a frequency distribution in which test score is plotted along the x-axis and frequency is plotted along the y-axis. An examination of such representations of these data can quickly reveal the general shape (e.g., normal, bimodal, skewed) of the distribution of observed scores.
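As a rough illustration of the frequency table described above, the following Python sketch tallies a small set of scores and prints them from largest to smallest. The scores and the helper name `frequency_table` are hypothetical and chosen only for the example; the row of asterisks is a crude text stand-in for a plotted frequency distribution.

```python
from collections import Counter

# Hypothetical exam scores, in the order they appear in the grade book.
scores = [85, 72, 90, 85, 78, 90, 85, 66, 78, 85]


def frequency_table(values):
    """Return (score, count) pairs ordered from largest to smallest score."""
    counts = Counter(values)
    return sorted(counts.items(), reverse=True)


table = frequency_table(scores)
for score, count in table:
    # One asterisk per student gives a simple text frequency plot.
    print(f"{score:>3}  {count:>2}  {'*' * count}")
```

Even with only ten scores, the ordered table makes it easy to see that 85 is the most common score and that a single low score of 66 sits apart from the rest.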

Although graphic representations of data are useful, interpretation of such representations is largely qualitative. Thus, quantitative descriptions are often used to understand the characteristics of a set of data. Frequency distributions are typically described quantitatively along two dimensions: central tendency and variability. Central tendency refers to the one score that best captures the center of the observed distribution of scores and can be described by three alternative indexes: the mode (the most frequently occurring score), the median (the middle score when scores are rank ordered), and the mean (the arithmetic average). Importantly, these three values coincide when the observed distribution is symmetric and unimodal, as is the case with normally distributed data, and they diverge as the distribution becomes skewed. As a result, the choice of which statistic to interpret should be based on the scale of measurement and the ultimate use of the data.
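The three indexes of central tendency can be computed with Python's standard `statistics` module, as in the minimal sketch below. Both data sets are hypothetical; the first is asymmetric and the second is symmetric and unimodal, so the sketch also shows when the three values agree.

```python
import statistics

# Hypothetical exam scores (asymmetric: one low score pulls the mean down).
scores = [85, 72, 90, 85, 78, 90, 85, 66, 78, 85]

modes = statistics.multimode(scores)   # list of the most frequent score(s)
median = statistics.median(scores)     # middle value of the ordered scores
mean = statistics.mean(scores)         # arithmetic average

# A small symmetric, unimodal data set (also hypothetical).
symmetric = [1, 2, 3, 3, 3, 4, 5]
sym_modes = statistics.multimode(symmetric)
sym_median = statistics.median(symmetric)
sym_mean = statistics.mean(symmetric)

print(modes, median, mean)
print(sym_modes, sym_median, sym_mean)
```

For the exam scores the mode and median are both 85 while the mean is 81.4, reflecting the pull of the lowest scores. For the symmetric data all three indexes equal 3. `multimode` is used rather than `mode` so that a bimodal distribution would report both peaks.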

Similar to central tendency, variability may be assessed through several statistics, including the range, variance, and standard deviation. The range of a set of scores is the difference between the highest and lowest scores. The variance quantifies the spread of scores as the average of the squared deviations of scores from the mean. The standard deviation is computed as the square root of the variance; it expresses that spread in the original units of measurement and can be interpreted as an index of the variability of individual scores around the mean.
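The three indexes of variability can be sketched in a few lines of Python. The scores are hypothetical, and the sketch assumes the purely descriptive form of the variance, in which the sum of squared deviations is divided by the number of scores (the sample form used for inference divides by one fewer).

```python
import math

# Hypothetical exam scores.
scores = [85, 72, 90, 85, 78, 90, 85, 66, 78, 85]


def score_range(values):
    """Difference between the highest and lowest scores."""
    return max(values) - min(values)


def variance(values):
    """Average squared deviation from the mean (descriptive form, divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def standard_deviation(values):
    """Square root of the variance, in the original score units."""
    return math.sqrt(variance(values))


print(score_range(scores), variance(scores), standard_deviation(scores))
```

For these scores the range is 24, the variance is approximately 54.84, and the standard deviation is approximately 7.4 score points.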

Other descriptive statistics that can be used to describe a set of data quantitatively are skew and kurtosis. Skew indicates the extent to which a distribution is asymmetric, with scores clustering toward one end of the scale and a longer tail extending toward the other. Positively skewed data are those in which most scores fall at the lower end of the test score scale, with the tail extending toward higher scores; negatively skewed data are those in which most scores fall at the upper end, with the tail extending toward lower scores. Kurtosis essentially describes the heaviness of the tails of a frequency distribution. Platykurtic data are those for which the tails are lighter (shorter) than would be expected had the data been normal, and leptokurtic data are those for which the tails are heavier (longer) than would be expected with normally distributed data.
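One common way to quantify skew and kurtosis is through the central moments of the data. The sketch below assumes the simple moment-based definitions (the third standardized moment for skew, and the fourth standardized moment minus 3 for excess kurtosis, so that a normal distribution scores 0); other software may apply small-sample corrections and report slightly different values. The function names and data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (value - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Third standardized moment; negative means a tail toward low scores."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; negative means platykurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical exam scores: most are high, with a tail toward low scores.
scores = [85, 72, 90, 85, 78, 90, 85, 66, 78, 85]
# Evenly spread values: symmetric, with lighter tails than a normal curve.
flat = [1, 2, 3, 4, 5]

print(skewness(scores), excess_kurtosis(scores))
print(skewness(flat), excess_kurtosis(flat))
```

The exam scores yield a negative skew value, consistent with high scores occurring more frequently. The evenly spread values yield a skew of 0 and a negative excess kurtosis, marking them as symmetric and platykurtic.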

Descriptive statistics are not confined to use with univariate data. In many cases, for example, there is interest in the extent to which variability in one measure is associated with variability in a second measure. As with univariate data, descriptive statistics can be used to evaluate this question both graphically and quantitatively. Graphically, the relationship can be depicted in a joint distribution table or in a scatter plot in which each individual's pair of scores is plotted as a point in two-dimensional space. Because interpretation of these graphic depictions is largely subjective, the correlation coefficient is typically computed as an index that quantifies the strength and direction of the observed relationship.
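As an illustration, the sketch below computes one widely used correlation coefficient, the Pearson product-moment correlation, which assumes an approximately linear relationship between the two measures. The paired data (hours studied and exam score for five students) and the function name `pearson_r` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations for five students.
hours_studied = [2, 4, 5, 7, 8]
exam_scores = [66, 72, 85, 78, 90]

r = pearson_r(hours_studied, exam_scores)
print(round(r, 2))
```

The coefficient ranges from -1 to 1. For these data it is about 0.84, indicating that students who studied longer tended to earn higher scores.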
