MAIN POINTS

The Concept of Relationship

A basic form of analysis in the social sciences involves the examination of relationships: whether certain categories or values of one variable are more or less likely to occur together with certain values or categories of a second variable.

The first step in examining a relationship between two variables is the construction of a bivariate table. A bivariate table is one in which two variables have been cross classified; such tables are used in the examination and presentation of relationships between variables. A generally useful way of summarizing a bivariate table and comparing its univariate distributions to assess relationship is by expressing its frequencies as percentages. Percentaging tables is appropriate whenever the variables are nominal, but the use of percentages is predominant even when the variables being analyzed are ordinal or interval.
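The percentaging step can be sketched in Python. The table below is a hypothetical cross-classification (the variable names and frequencies are illustrative, not from the text); frequencies are converted to percentages within each category of the independent variable so that the distributions of the dependent variable can be compared across categories.

```python
# Hypothetical bivariate table: vote choice (rows, dependent) cross-
# classified by education level (columns, independent). Illustrative
# frequencies only.
table = {
    "Candidate A": {"Low": 40, "Medium": 30, "High": 10},
    "Candidate B": {"Low": 20, "Medium": 30, "High": 50},
}

cols = ["Low", "Medium", "High"]

# Total frequency within each category of the independent variable.
col_totals = {c: sum(table[r][c] for r in table) for c in cols}

# Percentage within each column; each column now sums to 100,
# making the univariate distributions directly comparable.
percentages = {
    r: {c: 100 * table[r][c] / col_totals[c] for c in cols}
    for r in table
}
```

Comparing the percentage distribution of the dependent variable across categories of the independent variable is what reveals whether a relationship is present.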

There are various statistical techniques that allow researchers to assess the extent to which two variables are associated by a single summarizing measure. Such measures of relationship, often referred to as correlation coefficients, reflect the strength and direction of association between the variables and the degree to which one variable can be predicted from the other. The notion of prediction is inherent in the concept of covariation. When two variables covary, it is possible to use one to predict the other; when they do not, information about one will not enable the prediction of the other.

The degree to which two variables are related may be assessed in terms of the extent to which knowing values of one variable (the independent variable) allows researchers to increase the accuracy with which the other (dependent) variable can be predicted, compared with predictions made without knowing values of the independent variable. This is known as the principle of proportional reduction of error, and a number of commonly used measures of association are based upon this principle.
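The principle of proportional reduction of error can be expressed as a simple ratio. A minimal sketch (the function name and the error counts are illustrative assumptions, not from the text):

```python
def proportional_reduction_of_error(e1, e2):
    """PRE = (E1 - E2) / E1, where E1 is the number of prediction
    errors made without knowing the independent variable and E2 is
    the number made when its values are known."""
    return (e1 - e2) / e1

# Illustrative numbers: 40 errors when predicting from the dependent
# variable's distribution alone, 25 errors when the independent
# variable is used -> a 37.5% reduction in prediction error.
pre = proportional_reduction_of_error(40, 25)
```

A PRE measure of 0 means the independent variable does not improve prediction at all; a value of 1 means prediction becomes error-free.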

Nominal Measures of Relationship

Measures of association must be selected on the basis of the level of measurement reflected in the data under analysis. Lambda, or the Guttman coefficient of predictability, is an asymmetrical coefficient, as it reflects relationships between variables in one direction only.

Lambda has a limitation: whenever the modal frequencies of all categories of the independent variable are concentrated in the same category of the dependent variable, lambda equals zero even though the variables may in fact be related.
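This limitation can be demonstrated with a short computation. The sketch below (function name and frequencies are illustrative) computes lambda for predicting the row variable from the column variable; in the example table every column's mode falls in the same row, so lambda is zero even though the column percentage distributions differ.

```python
def guttman_lambda(table):
    """Lambda for predicting the row (dependent) variable from the
    column (independent) variable. `table` is a list of rows of
    frequencies. Lambda = (sum of within-column modal frequencies
    minus the overall modal row total) / (N minus the overall modal
    row total) -- a PRE measure based on modal-category prediction."""
    n = sum(sum(row) for row in table)
    col_modes = [max(col) for col in zip(*table)]   # best guess per column
    overall_mode = max(sum(row) for row in table)   # best guess overall
    return (sum(col_modes) - overall_mode) / (n - overall_mode)

# Illustrative case: 66.7% vs. 52.9% in the first row across columns,
# so the variables are related, yet both column modes lie in row 0
# and lambda = 0.
degenerate = [[60, 45],
              [30, 40]]
```

When this situation arises, lambda understates the relationship, and another measure should be consulted.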

Ordinal Measures of Relationship

Several measures of relationship for ordinal data are based on the concepts of same-order pairs, different-order (or inverse-order) pairs, and tied pairs.

Gamma is a measure of the preponderance of same-order or different-order pairs among nontied pairs in a bivariate table of ordinal data.
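The pair-counting logic behind gamma can be sketched directly from paired ordinal scores (the function name and data are illustrative):

```python
from itertools import combinations

def gamma(x, y):
    """Goodman-Kruskal gamma from paired ordinal scores.
    Counts same-order (concordant) and different-order (discordant)
    pairs; pairs tied on either variable are ignored."""
    ns = nd = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            ns += 1          # same-order pair
        elif s < 0:
            nd += 1          # different-order pair
    return (ns - nd) / (ns + nd)
```

Gamma ranges from -1 (all nontied pairs are different-order) through 0 (same-order and different-order pairs balance) to +1 (all nontied pairs are same-order).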

Kendall's tau-b takes into account pairs tied on either variable, thereby correcting a deficiency of gamma. However, tau-b does not have as clear-cut an interpretation as gamma.
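The correction tau-b applies can be seen in its denominator, which retains pairs tied on one variable only. A minimal sketch with illustrative data:

```python
from itertools import combinations
from math import sqrt

def tau_b(x, y):
    """Kendall's tau-b: same numerator as gamma (same-order minus
    different-order pairs), but the denominator also counts pairs
    tied on one variable only, so ties reduce the coefficient."""
    ns = nd = tx = ty = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue         # tied on both variables: excluded
        elif dx == 0:
            tx += 1          # tied on x only
        elif dy == 0:
            ty += 1          # tied on y only
        elif dx * dy > 0:
            ns += 1          # same-order pair
        else:
            nd += 1          # different-order pair
    return (ns - nd) / sqrt((ns + nd + tx) * (ns + nd + ty))
```

With no ties, tau-b and gamma coincide; with ties, tau-b is smaller in absolute value than gamma computed on the same data.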

Interval Measures of Relationship

Measures of relationship for interval data are more precise than those for lower levels of measurement, because they are based on actual numeric values rather than on membership in categories.

The relationship between two interval variables is frequently examined and expressed by the use of linear regression. The task of regression is to find some algebraic expression by which to represent the functional relationship between the variables. The equation Y = a + bX is a linear regression equation, meaning the function describing the relation between the independent variable (X) and the dependent variable (Y) is a straight line. The regression equation, however, is only a prediction rule; thus, there are discrepancies between actual observations and the predicted ones. The goal is to construct the equation so that these deviations, or errors of prediction, are at a minimum. If a specific criterion is adopted in determining a and b of the linear equation, it is possible to create a function that minimizes the variance around the regression line. This is the criterion of least squares, which minimizes the sum of the squared differences between the observed Y's and the Y's predicted with the regression equation.
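The least-squares estimates of a and b have a closed form, which can be sketched as follows (function name and data points are illustrative):

```python
def least_squares(x, y):
    """Least-squares estimates of a (intercept) and b (slope) for
    Y = a + bX: the slope is the sum of cross-products of deviations
    from the means divided by the sum of squared deviations of X,
    which minimizes the sum of squared prediction errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Points lying exactly on Y = 2 + 3X are recovered with zero error.
a, b = least_squares([0, 1, 2, 3], [2, 5, 8, 11])
```

With real data the observations do not fall exactly on the line, and the fitted equation is the one whose squared prediction errors sum to less than those of any other straight line.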

Pearson's product-moment correlation coefficient (r) provides a single statistic that describes the strength and direction of the relationship between two interval variables. The square of r can be interpreted as the proportional reduction in prediction error afforded by the relationship between two variables. It is therefore comparable to several other measures discussed in this chapter.
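Pearson's r can be computed from the same deviation sums used in regression. A minimal sketch (function name and data are illustrative):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient: the sum of
    cross-products of deviations from the means, divided by the
    product of the square roots of the two sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cross / (sx * sy)

# A perfect positive linear relationship yields r = 1, and
# r**2 = 1: the relationship removes all prediction error.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```

Squaring r gives the proportion of the variance in Y accounted for by X, which is the PRE interpretation mentioned above.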