Sampling is a technique utilized by researchers to make inferences about a large body of data (population) on the basis of a smaller body of sample data. A particular value of the population is called a parameter; its counterpart in the sample is termed a statistic.
The major objective of sampling theory is to provide accurate estimates of unknown parameters from sample statistics that can be easily calculated. To estimate unknown parameters accurately from known statistics, three major problems must be dealt with effectively: 1) the definition of the population, 2) the sample design, and 3) the size of the sample.
Population
A population is the "aggregate of all cases that conform to some designated set of specifications." A sampling unit is a single member of the population under investigation. A sample may be drawn from an infinite or from a finite population. Sampling designed to produce information about particular characteristics of a finite population is called survey sampling and is typical of social research.
Once the population has been defined, a sample that adequately represents the population may be drawn. The actual procedures involve selecting a sample from a complete list of sampling units. The list of the sampling units used to select the sample is called a sampling frame. Typical problems in sampling frames include 1) incomplete frames (some units are missing), 2) clusters of elements (groups of units are listed rather than individual units), and 3) blank foreign elements (some units on the list should not be in the population).
Sample Designs
The essential requirement of any sample is that it be as representative as possible of the population from which it is drawn.
In modern sampling theory, a basic distinction is made between probability and nonprobability sampling. The distinguishing characteristic of probability sampling is that one can specify for each sampling unit of the population the probability that it will be included in the sample. In nonprobability sampling, there is no way of specifying the probability that each unit has of being included in the sample, and there is no assurance that every unit has some chance of being included.
Social scientists do employ nonprobability samples, even though accurate estimates of the population's parameters can be obtained only through probability samples. Nonprobability samples are used when convenience and economy are demanded, when a sampling population cannot be precisely defined, and when a list of the sampling population is unavailable. Three major designs of nonprobability samples have been used by social scientists: convenience samples (the researcher selects whatever sampling units are conveniently available); purposive samples (the researcher selects sampling units that, in his or her judgment, are representative of the population); and quota samples (the researcher selects a sample that is as similar as possible to the sampling population).
Probability sample designs include simple random sampling, systematic sampling, stratified sampling, and cluster sampling.
Simple random sampling is the basic probability sampling design and is incorporated into all the more elaborate probability designs. This technique involves numbering all population elements and then selecting enough random numbers to compile a sample of the desired size; it is simple in principle but inconvenient to implement with large populations.
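The numbering-and-drawing procedure can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name simple_random_sample, the household labels, and the sizes (1,000 units, sample of 50) are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    # Number the elements 0..N-1, then pick n distinct random numbers.
    chosen_numbers = rng.sample(range(len(population)), n)
    return [population[i] for i in chosen_numbers]

# Hypothetical sampling frame of 1,000 numbered households.
households = [f"household-{i:04d}" for i in range(1000)]
sample = simple_random_sample(households, 50, seed=42)
print(len(sample), "units drawn, for example:", sample[:3])
```

The seed argument is there only to make a draw repeatable; omitting it gives a fresh random sample each time.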
Systematic sampling consists of selecting every Kth sampling unit of the population after the first sampling unit is selected at random from the first K sampling units. Systematic sampling is more convenient than simple random sampling, especially when a population is very large or when large samples are to be selected.
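A short Python sketch of this rule follows, assuming the sampling frame is an ordered list. The function name systematic_sample and the frame of 100 units are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th unit after a random start among the first k units."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    rng = random.Random(seed)
    start = rng.randrange(k)   # random position within the first k units
    return frame[start::k]     # then every k-th unit from there on

# Hypothetical frame of 100 units; k = 10 yields a sample of 10.
frame = list(range(100))
picked = systematic_sample(frame, 10, seed=1)
print(picked)
```

Only one random number is needed regardless of sample size, which is the source of the convenience noted above.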
Stratified sampling is used primarily to ensure that different groups of a population are adequately represented in the sample, so that the level of accuracy in estimating parameters is increased. It also reduces the cost of execution considerably. Sampling from the different strata can be either proportional (the number of elements selected from each stratum is proportional to that stratum's representation in the population) or disproportional (sometimes chosen in order to yield sufficient numbers in a stratum to allow intensive analysis of that particular stratum).
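Proportional allocation can be illustrated with a small Python sketch. The function name, the urban/rural strata, and the counts are assumptions chosen so the arithmetic is easy to follow. A simple random sample is drawn within each stratum, and rounding can make the total differ slightly from the requested size when the proportions do not divide evenly.

```python
import random
from collections import defaultdict

def proportional_stratified_sample(units, stratum_of, n, seed=None):
    """Draw a simple random sample within each stratum, sized by its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    total = len(units)
    sample = []
    for members in strata.values():
        # Stratum allocation: n times the stratum's share of the population.
        n_h = round(n * len(members) / total)
        sample.extend(rng.sample(members, min(n_h, len(members))))
    return sample

# Hypothetical population: 600 urban and 400 rural units.
units = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = proportional_stratified_sample(units, lambda u: u[0], 50, seed=7)
urban = sum(1 for stratum, _ in sample if stratum == "urban")
rural = len(sample) - urban
print(urban, rural)  # 30 urban and 20 rural, matching the 60/40 split
```

A disproportional design would replace the allocation line with fixed or otherwise chosen stratum sizes, and estimates for the whole population would then need to be weighted accordingly.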
Cluster sampling is frequently used in large-scale studies because it is the least expensive sample design. It involves first selecting larger groupings, called clusters, and then selecting the sampling units from within the chosen clusters. The clusters are selected by a simple random sample or a stratified sample. The choice of clusters depends upon the research objectives and the resources available for the study.
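The two stages can be sketched as follows, assuming the clusters are available as a mapping from cluster name to its member units. The function name, the city-block clusters, and all sizes are hypothetical. Both stages use simple random selection here, which is only one of the options mentioned above.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        members = clusters[name]
        result[name] = rng.sample(members, min(n_per_cluster, len(members)))
    return result

# Hypothetical frame: 20 city blocks, each listing 30 households.
blocks = {
    f"block-{b:02d}": [f"block-{b:02d}/house-{h:02d}" for h in range(30)]
    for b in range(20)
}
selected = two_stage_cluster_sample(blocks, n_clusters=5, n_per_cluster=10, seed=3)
for block, houses in selected.items():
    print(block, len(houses))
```

Fieldwork is confined to the five chosen blocks, which is where the cost saving comes from.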
Sample Size
A sample is any subset of sampling units from a population. A subset is any combination of sampling units that does not include the entire set of sampling units that has been defined as the population. The idea of standard error is central to sampling theory and to the understanding of how to determine the size of a sample. When the values of a statistic (such as the mean) from an infinite number of independently selected samples are placed in a distribution, the resulting distribution is called the sampling distribution of that statistic (for means, the sampling distribution of the mean), and its standard deviation is the standard error. The mean of all the sample values in a sampling distribution is an unbiased estimate of the population value, and the standard error allows the researcher to determine the probability that a given sample estimate is close to the actual population value.
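A small simulation can make the sampling distribution concrete. The sketch below is illustrative only: the population (10,000 values with mean near 50 and standard deviation near 10), the sample size of 100, and the 2,000 repetitions standing in for "an infinite number" are all assumptions. Samples are drawn with replacement so that observations are independent, in which case the standard error of the mean is the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 100          # size of each sample
repeats = 2_000  # a large, finite stand-in for infinitely many samples

# Each entry is the mean of one independently selected sample.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)   # should sit near pop_mean
empirical_se = statistics.pstdev(sample_means)   # SD of the sample means
theoretical_se = pop_sd / math.sqrt(n)           # sigma / sqrt(n)

print(f"population mean     {pop_mean:.2f}")
print(f"mean of sample means {mean_of_means:.2f}")
print(f"empirical SE        {empirical_se:.3f}")
print(f"sigma / sqrt(n)     {theoretical_se:.3f}")
```

Because the standard error shrinks with the square root of n, quadrupling the sample size only halves it, which is why the standard error is central to decisions about sample size.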
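Assuming that a distribution of sample statistics is approximately normal, researchers can use the properties of the normal curve, together with the standard error, to place a confidence interval around their sample statistic. For example, about 95 percent of sample means fall within 1.96 standard errors of the population mean.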
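As a hedged illustration, the following Python sketch builds such an interval for a sample mean under the normal approximation. The function name and the ten income figures are hypothetical, and the standard error is estimated from the sample itself. With a sample this small, a t-based interval would usually be preferred, so the normal version is shown only because it matches the description above.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Two-sided critical value of the standard normal curve (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = stdev(values) / math.sqrt(len(values))
    center = fmean(values)
    return center - z * standard_error, center + z * standard_error

# Hypothetical sample: ten incomes in thousands.
incomes = [31, 42, 38, 45, 29, 50, 36, 41, 39, 44]
low95, high95 = mean_confidence_interval(incomes)
low99, high99 = mean_confidence_interval(incomes, confidence=0.99)
print(f"95% interval: {low95:.2f} to {high95:.2f}")
print(f"99% interval: {low99:.2f} to {high99:.2f}")
```

Asking for more confidence widens the interval, while a larger sample narrows it by reducing the standard error.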
Nonsampling Errors
Sampling theory is concerned with the error introduced by the sampling procedure. In a perfect design, this error is minimized for an individual sample. The error in estimates refers to what is expected in the long run if a particular set of procedures is followed. However, even if the sampling error is minimized, there are other sources of error. In survey research, the most pervasive is nonresponse error: observations that are not carried out because respondents refuse to answer or are not at home, because forms are lost, and so on. Nonresponse can introduce a substantial bias into the findings.
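The size of the bias that nonresponse can cause is easy to see with invented figures. In the sketch below, the function names, group sizes, values, and response rates are all hypothetical. Half of a drawn sample holds an opinion (coded 1) and responds at a rate of 90 percent, while the other half does not hold it (coded 0) and responds at only 50 percent.

```python
def true_mean(groups):
    """Mean over everyone in the drawn sample; groups are (count, value, response_rate)."""
    total = sum(count for count, _, _ in groups)
    return sum(count * value for count, value, _ in groups) / total

def respondent_mean(groups):
    """Mean over only those who actually respond."""
    responded = sum(count * rate for count, _, rate in groups)
    return sum(count * rate * value for count, value, rate in groups) / responded

# Hypothetical sample of 1,000: (count, value, response rate).
groups = [
    (500, 1.0, 0.9),  # holds the opinion, responds readily
    (500, 0.0, 0.5),  # does not hold it, responds less often
]
print(f"true share in the sample:  {true_mean(groups):.3f}")
print(f"share among respondents:   {respondent_mean(groups):.3f}")
```

Here the sample itself is perfectly balanced, yet the respondents overstate the opinion (450 of 700 responses) because willingness to respond is related to the answer. No increase in sample size corrects this kind of error.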