MAIN POINTS

Coding Schemes

The number assigned to an observation is called a code. Systems of categories used to classify responses or acts are referred to as coding schemes. The main purpose of coding is to simplify the handling of many individual responses by classifying them into a smaller number of groups, each including responses that are similar in content.
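To make the idea concrete, here is a minimal Python sketch of coding open-ended answers into a smaller number of groups. The survey question, the responses, the category labels, and the residual code 9 for "Other" are all hypothetical and chosen only for illustration. It is not a scheme taken from the text.

```python
from collections import Counter

# Hypothetical coding scheme for the open-ended question
# "What is the most important problem facing the country?"
# Several distinct answers share one code because they are similar in content.
CODING_SCHEME = {
    "unemployment": 1,  # economy
    "inflation": 1,     # economy
    "crime": 2,         # public safety
    "drugs": 2,         # public safety
    "pollution": 3,     # environment
}
CODE_LABELS = {1: "Economy", 2: "Public safety", 3: "Environment", 9: "Other"}
OTHER = 9  # hypothetical residual code so that every answer receives some code


def code_response(response: str) -> int:
    """Return the numeric code for a response, or OTHER if it is not listed."""
    return CODING_SCHEME.get(response.strip().lower(), OTHER)


responses = ["Unemployment", "crime", "inflation", "Pollution", "taxes", "drugs"]
codes = [code_response(r) for r in responses]
counts = Counter(codes)

for code in sorted(counts):
    print(f"{code} {CODE_LABELS[code]}: {counts[code]}")
```

Six individual answers are reduced to four groups. The residual "Other" code is one simple way to keep a scheme exhaustive when unanticipated answers appear.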

The initial rule of coding is that the numbers assigned must make intuitive sense. Once this requirement is satisfied, linkage to theory, mutual exclusivity, exhaustiveness, and detail must be factored into coding decisions. Linkage to theory assumes that the researcher has some idea, from the literature, of what types of responses to expect from a respondent. Theory can be used to construct response categories before the instrument is administered. Deductive coding can then be used: the respondents or those who administer the instrument classify the responses into preestablished categories, as is the case with closed-ended questions. When a study is exploratory or when there is little theory informing the researcher about which responses to expect, inductive coding may be appropriate.
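Mutual exclusivity (each response fits exactly one category) and exhaustiveness (every possible response fits some category) can be checked mechanically when categories are numeric ranges. The sketch below assumes hypothetical age brackets written as half-open intervals [low, high), so that a boundary age such as 30 falls into exactly one bracket. The bracket values and the 18 to 119 age range are illustrative assumptions.

```python
# Hypothetical age categories: code -> half-open interval [low, high).
AGE_CATEGORIES = {
    1: (18, 30),
    2: (30, 45),
    3: (45, 65),
    4: (65, 120),
}


def matching_codes(age, categories):
    """Return every code whose interval contains the given age."""
    return [code for code, (low, high) in categories.items() if low <= age < high]


def check_scheme(categories, possible_values):
    """Return (gaps, overlaps).

    gaps: values that match no category (scheme is not exhaustive).
    overlaps: values that match more than one category (not mutually exclusive).
    """
    gaps, overlaps = [], []
    for value in possible_values:
        matches = matching_codes(value, categories)
        if not matches:
            gaps.append(value)
        elif len(matches) > 1:
            overlaps.append(value)
    return gaps, overlaps


# Every age from 18 to 119 falls into exactly one bracket.
print(check_scheme(AGE_CATEGORIES, range(18, 120)))  # ([], [])

# A deliberately flawed scheme: brackets 2 and 3 overlap, and ages 65+ are uncovered.
FLAWED = {1: (18, 30), 2: (30, 45), 3: (40, 65)}
print(check_scheme(FLAWED, range(18, 70)))
# ([65, 66, 67, 68, 69], [40, 41, 42, 43, 44])
```

The flawed scheme shows both failures at once: ages 40 to 44 could be coded two ways, and ages 65 and over have no code at all.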

Codebook Construction

Once a coding scheme has been developed for each of the variables used in a research project, this information should be compiled in a codebook. Once the codebook is constructed, the data need to be coded, that is, transferred into the form in which they will be stored and analyzed. Studies with a well-constructed codebook experience fewer reliability problems. Coding reliability is further increased by keeping coding schemes simple and by training coders thoroughly.

Coding devices include transfer sheets, edge coding, and direct data entry. Transfer sheets were paper representations of keypunch cards: coders recorded the data in the columns specified by the codebook, and keypunchers then transferred the data to the cards. Today, coders use spreadsheet forms that organize cases in the rows and the values of the variables across the columns, and data entry personnel can quickly input the data into the spreadsheet. Because transfer sheets require the data to be handled multiple times, they increase the risk of miscoding. Edge coding eliminates the need for transfer sheets by recording questionnaire information directly in spaces along the outside edge of the instrument. Reliability is enhanced because the coders' eyes do not have to leave the instrument and they do not have to keep close track of column positions. There are two forms of direct data entry: coding from a questionnaire and coding during telephone interviewing. Computer-assisted telephone interviewing (CATI) and computer-assisted personal interviewing (CAPI) are highly sophisticated systems that greatly reduce miscoding.

Editing and cleaning the data are important steps in data processing that should always precede analysis of the collected information. Data editing occurs both during and after the coding phase. Data cleaning is the proofreading of the data to catch and correct errors and inconsistent codes.
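A codebook and a cleaning pass can be sketched together in Python. This is a minimal illustration under stated assumptions: the variables, column positions, valid codes (1 = yes/male, 2 = no/female, 9 or 99 = missing), the cases, and the consistency rule are all hypothetical. The sketch flags two kinds of problems: "wild" codes that the codebook does not allow, and combinations of valid codes that contradict each other.

```python
# Hypothetical codebook: variable -> column position, label, and set of valid codes.
CODEBOOK = {
    "sex":      {"column": 1, "label": "Respondent's sex",       "valid": {1, 2, 9}},
    "employed": {"column": 2, "label": "Currently employed",     "valid": {1, 2, 9}},
    "hours":    {"column": 3, "label": "Hours worked last week", "valid": set(range(0, 81)) | {99}},
}

# Hypothetical coded data: one row per case, one field per variable.
cases = [
    {"id": 101, "sex": 1, "employed": 1, "hours": 40},
    {"id": 102, "sex": 3, "employed": 2, "hours": 0},   # 3 is not a valid code for sex
    {"id": 103, "sex": 2, "employed": 2, "hours": 35},  # not employed, yet reports hours
]


def wild_codes(case):
    """Return the variables whose value is not listed as valid in the codebook."""
    return [var for var, spec in CODEBOOK.items() if case[var] not in spec["valid"]]


def inconsistencies(case):
    """Apply a hypothetical consistency rule across variables."""
    problems = []
    # Someone coded as not employed (2) should report 0 hours or missing (99).
    if case["employed"] == 2 and case["hours"] not in (0, 99):
        problems.append("not employed but reports hours worked")
    return problems


for case in cases:
    for var in wild_codes(case):
        print(f"case {case['id']}: wild code {case[var]} for {var}")
    for msg in inconsistencies(case):
        print(f"case {case['id']}: {msg}")
# case 102: wild code 3 for sex
# case 103: not employed but reports hours worked
```

Keeping the valid codes in the codebook structure, rather than scattering them through the checks, means the same document that guides coders also drives the cleaning step.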

Computing in Social Science Research

Data preparation and analysis are now typically handled by statistical software and other workflow programs. The text mainly highlights the benefits of SPSS, but other widely used programs include SAS and Stata. However, these programs are less accessible because they are not free. The most notable free program is R, from the R Project for Statistical Computing. Finally, one of the more interesting developments in recent years has been the rise of Massive Open Online Courses (MOOCs): online courses, in many cases free, taught by professors and scholars at universities and colleges around the world.