The field of statistics involves methods for describing and analyzing data and for making decisions or inferences about phenomena represented by the data. Methods in the first category are referred to as descriptive statistics and are used to summarize and organize data in an effective and meaningful way. Methods in the second category are called inferential statistics and are used to make decisions or inferences by interpreting data patterns.
Frequency Distribution
Frequency distributions are used to examine the pattern of response to each of the independent and dependent variables under investigation. A frequency distribution of a single variable, sometimes referred to as a univariate frequency distribution, lists the number of observations in each category of that variable.
Percentage distributions are a special type of frequency distribution that show what percentage of cases in a distribution occur in each category of a variable.
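A minimal Python sketch of both kinds of distribution, assuming a hypothetical list of responses to one nominal variable. The helper names `frequency_distribution` and `percentage_distribution` are illustrative, not from the text.

```python
from collections import Counter

def frequency_distribution(observations):
    """Count how many observations fall in each category."""
    return Counter(observations)

def percentage_distribution(observations):
    """Express each category's frequency as a percentage of all cases."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical responses to a single nominal variable.
responses = ["agree", "disagree", "agree", "neutral", "agree",
             "disagree", "agree", "neutral", "agree", "agree"]

freq = frequency_distribution(responses)
pct = percentage_distribution(responses)
for category, count in freq.most_common():
    print(f"{category:<10} {count:>3} {pct[category]:6.1f}%")
```

The percentages always sum to 100, which makes distributions with different numbers of cases directly comparable.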
Using Graphs to Describe Distributions
Graphs give researchers an alternative way to display the information organized in frequency distributions. Three of the graphs researchers use most often are the pie chart, the bar chart, and the histogram. Both the pie chart and the bar chart can present data measured at the nominal and ordinal levels; the histogram is used for data measured at the interval or ratio level.
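Before a histogram can be drawn, interval or ratio values are usually grouped into class intervals. The sketch below does that grouping and prints a crude text histogram; the ages, the starting point of 20, and the interval width of 10 are all hypothetical choices, and `histogram_counts` is an illustrative helper.

```python
def histogram_counts(values, low, width, n_bins):
    """Group interval/ratio values into equal-width class intervals.

    Bin i covers [low + i*width, low + (i+1)*width); values outside
    the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical ages (whole years) of 12 respondents.
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 55, 58, 61]
counts = histogram_counts(ages, low=20, width=10, n_bins=5)
for i, c in enumerate(counts):
    start = 20 + i * 10
    print(f"{start}-{start + 9}: {'*' * c}")
```

Unlike the categories of a bar chart, these class intervals are ordered and contiguous, which is why histogram bars are drawn touching one another.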
Measures of Central Tendency
Measures of central tendency are statistical procedures that reflect a "typical" or "average" characteristic of a frequency distribution. The three most commonly employed are the mode, the median, and the arithmetic mean.
The mode is the category or observation that appears most frequently in a distribution. The median is a positional measure that divides the distribution into two equal parts: when the observations are arranged in order, it is the value at the middle position, so that half of the observations fall below it and half above it. When the number of observations is even, the median is conventionally taken as the average of the two middle values.
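A short Python sketch of the mode and median for a hypothetical set of scores. The hand-written `median` helper is illustrative; it is checked against the standard library's `statistics.median`, and `statistics.multimode` is used for the mode because a distribution can have more than one.

```python
import statistics

def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [3, 7, 7, 2, 9, 7, 4, 5]            # hypothetical data
print(statistics.multimode(scores))          # every most frequent value
print(median(scores), statistics.median(scores))
```

With these scores the median is 6, whereas the point halfway between the smallest (2) and largest (9) values is 5.5, so the median is defined by position in the ordered data, not by the extreme values.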
Percentiles are measures of location, indicating the point below which a given percentage of the values in a distribution fall. The median is a special case of a percentile: the 50th percentile.
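Several conventions exist for computing percentiles. The sketch below assumes one common choice, linear interpolation at position (n - 1) * p / 100 in the sorted data; other methods can give slightly different values for small samples. The `percentile` helper is hypothetical.

```python
def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between closest ranks.

    Uses position (n - 1) * p / 100 in the sorted data; this is one of
    several conventions in use.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100
    lower = int(pos)
    frac = pos - lower
    if lower + 1 < len(ordered):
        return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])
    return ordered[lower]

scores = [2, 3, 4, 5, 7, 7, 7, 9]               # hypothetical data
print(percentile(scores, 50))                   # the 50th percentile is the median
print(percentile(scores, 25), percentile(scores, 75))
```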
The arithmetic mean is the measure of central tendency used most frequently. It is defined as the sum of all observations divided by the number of observations.
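The definition translates directly into code. A minimal sketch with hypothetical scores; `arithmetic_mean` is an illustrative name.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

scores = [3, 7, 7, 2, 9, 7, 4, 5]   # hypothetical data; sum is 44
print(arithmetic_mean(scores))
```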
Basic Measures of Dispersion
Measures of central tendency identify the most representative value of a distribution, but a complete description of any distribution also requires measuring the extent of dispersion about this central value. Several measures of dispersion are discussed here: the measure of qualitative variation, the range, the mean deviation, the variance, the standard deviation, and the coefficient of variation. The measure of qualitative variation, suited to nominal variables, reflects the number of differences among the categories of a distribution relative to the maximum number of differences possible; it is based on the number of categories and their respective frequencies.
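The text does not give a formula, so the sketch below assumes the common index of qualitative variation (IQV): the number of pairs of cases falling in different categories, divided by the largest such number possible for the same N and number of categories K. Algebraically this is K(N² − Σf²) / (N²(K − 1)). The data and the helper name are hypothetical.

```python
from collections import Counter

def index_of_qualitative_variation(observations, k):
    """Observed differences divided by the maximum possible differences.

    k is the number of categories of the variable, including empty ones.
    Returns 0 when all cases share one category and 1 when cases are
    spread evenly across all k categories.
    """
    if k < 2:
        raise ValueError("need at least two categories")
    freqs = list(Counter(observations).values())
    n = sum(freqs)
    # Pairs of cases that fall in different categories.
    observed = (n ** 2 - sum(f ** 2 for f in freqs)) / 2
    # Number of such pairs if the n cases were spread evenly over k categories.
    maximum = (k - 1) * n ** 2 / (2 * k)
    return observed / maximum

# Hypothetical responses on a nominal variable with k = 4 possible categories.
even = ["Protestant", "Catholic", "Jewish", "None"] * 2
uneven = ["Protestant"] * 5 + ["Catholic"] * 2 + ["None"]
print(index_of_qualitative_variation(even, 4))
print(index_of_qualitative_variation(uneven, 4))
```

Passing `k` explicitly matters: a category with no cases never appears in the counts but still raises the maximum number of possible differences.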
The range measures the distance between the highest and lowest values of the distribution. The interquartile range is the difference between the upper and lower quartiles (the 75th and 25th percentiles). It measures the spread of the middle half of the distribution and is therefore less affected by extreme observations than the range.
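A sketch comparing the two measures on hypothetical data, before and after one extreme value is introduced. It relies on the standard library's `statistics.quantiles` with `method="inclusive"`; that is an assumed quartile convention, and other conventions may give slightly different quartiles for small samples.

```python
import statistics

def value_range(values):
    """Distance between the highest and lowest values."""
    return max(values) - min(values)

def interquartile_range(values):
    """Upper quartile minus lower quartile (inclusive quantile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

scores = [2, 3, 4, 5, 7, 7, 7, 9]           # hypothetical data
with_outlier = [2, 3, 4, 5, 7, 7, 7, 90]    # the 9 replaced by an extreme value
print(value_range(scores), interquartile_range(scores))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

The single extreme value inflates the range from 7 to 88 but leaves the interquartile range unchanged.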
Measures of Dispersion Based on the Mean
The simplest way to obtain a measure of deviation would be to average the deviations from the arithmetic mean, but the deviations from the mean always sum to zero, so their average is always zero as well. One remedy is to ignore the signs and average the absolute deviations, which gives the mean deviation. Another is to square each deviation, which leads to the variance and the standard deviation, the measure of dispersion most commonly applied to interval-level data. The standard deviation has several advantages: it is more stable from sample to sample, its mathematical properties allow the researcher to obtain the standard deviation for two or more groups combined, and these properties make it a useful measure in more advanced statistical work, especially in statistical inference.
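A quick check of why raw deviations cannot serve as a measure of dispersion, and how taking absolute values fixes the problem. The data and helper names are hypothetical.

```python
def deviations_from_mean(values):
    """Signed distance of each observation from the arithmetic mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean."""
    devs = deviations_from_mean(values)
    return sum(abs(d) for d in devs) / len(devs)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data; mean is 5
devs = deviations_from_mean(scores)
print(sum(devs))                    # zero, apart from possible floating-point rounding
print(mean_deviation(scores))
```

With general floating-point data the sum of deviations may come out as a tiny nonzero number such as 1e-16 because of rounding; mathematically it is exactly zero.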
Computing the variance and standard deviation is similar to computing the mean deviation, except that instead of taking the absolute values of the deviations, we square them, sum the squares, and then divide by the total number of observations. The variance is thus the average squared deviation of values from the mean; the standard deviation is the square root of the variance.
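The steps just described (square each deviation, sum the squares, divide by the number of observations, take the square root) can be sketched in Python as follows, using hypothetical scores and illustrative helper names.

```python
import math

def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def population_sd(values):
    """Square root of the variance."""
    return math.sqrt(population_variance(values))

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data; squared deviations sum to 32
print(population_variance(scores), population_sd(scores))
```

Dividing by N follows the description above; Python's `statistics.pvariance` and `statistics.pstdev` use the same convention, while `statistics.variance` and `statistics.stdev` divide by N - 1, which is the usual choice when estimating from a sample.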
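The coefficient of variation reflects relative variation: it measures the variation in a distribution relative to the mean of that distribution. It is computed as the standard deviation divided by the mean, often multiplied by 100 to express it as a percentage.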
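A sketch showing why relative variation matters: two hypothetical groups with the same standard deviation but very different means. It uses the population standard deviation (`statistics.pstdev`) and expresses the result as a percentage; both are assumptions about the convention, and the helper name is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, as a percentage.

    Only meaningful for ratio-level data with a positive mean.
    """
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("coefficient of variation requires a positive mean")
    return 100 * statistics.pstdev(values) / mean

group_a = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical: mean 5, SD 2
group_b = [x + 45 for x in group_a]         # same SD, mean shifted to 50
print(coefficient_of_variation(group_a), coefficient_of_variation(group_b))
```

The absolute spread is identical in both groups, but relative to its mean the first group varies ten times as much.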
Types of Frequency Distributions
Frequency distributions can take different shapes. In a symmetrical, unimodal distribution, the mean coincides with the median and the mode. In skewed distributions, these measures diverge. In a negatively skewed distribution, the mean is pulled toward the lower scores; in a positively skewed distribution, the mean is pulled toward the higher scores.
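The direction in which the mean is pulled can be checked on hypothetical data. The `skew_direction` helper below is a rough rule of thumb that compares the mean with the median; it is not a formal skewness coefficient and can mislead for some distributions.

```python
import statistics

def skew_direction(values):
    """Heuristic: mean above median suggests positive skew, below suggests negative."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "no skew detected"

# Hypothetical incomes (thousands): a few very high values form a right tail.
incomes = [22, 25, 25, 25, 28, 30, 33, 35, 40, 120, 250]
# Hypothetical exam scores: one very low score forms a left tail.
exam = [10, 60, 70, 72, 75, 75, 78, 80]

print(statistics.mode(incomes), statistics.median(incomes), statistics.fmean(incomes))
print(skew_direction(incomes), skew_direction(exam))
```

For the incomes, mode (25) < median (30) < mean (about 57.5): the two extreme incomes drag the mean toward the high scores.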
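One type of symmetrical distribution is the normal curve, in which a fixed proportion of cases falls between the mean and any distance from the mean measured in standard deviation units. For example, about 68% of cases fall within one standard deviation of the mean and about 95% within two.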
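These fixed proportions can be computed from the standard normal curve with Python's `statistics.NormalDist`. The helper `proportion_between_mean_and` is an illustrative name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_between_mean_and(z):
    """Proportion of cases between the mean and z standard deviations from it."""
    return abs(standard_normal.cdf(z) - 0.5)

for z in (1, 2, 3):
    # The curve is symmetric, so double the one-sided area for "within ±z SD".
    within = 2 * proportion_between_mean_and(z)
    print(f"within ±{z} SD: {within:.4f}")
```

Because the curve is symmetric, the proportion between the mean and -z equals the proportion between the mean and +z.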
Standard scores (Z scores) express the distance between a specific observation and the mean in standard deviation units: Z equals the observation minus the mean, divided by the standard deviation.
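A minimal sketch converting hypothetical scores to standard scores. It assumes the population standard deviation (dividing by N, consistent with the variance described earlier); `standard_scores` is an illustrative helper.

```python
import statistics

def standard_scores(values):
    """Convert each observation to z = (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # population SD, dividing by N
    return [(x - mean) / sd for x in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, SD 2
print(standard_scores(scores))
```

A score of 9 becomes z = 2.0, meaning it lies two standard deviations above the mean; the resulting set of Z scores always has a mean of 0 and a standard deviation of 1.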