The second type of data cleaning is more complicated. The logical structure of the data may place special limits on the responses of certain respondents. Contingency cleaning is the process of checking that only those cases that should have data on a particular variable do in fact have such data. For example, a questionnaire may ask for the number of children that women have had. All female respondents, then, should have a response coded (or a special code for failure to answer), but no male respondent should have an answer recorded (or should have only a special code indicating that the question is inappropriate to him). If a given male respondent is coded as having borne three children, either an error has been made and should be corrected or your study is about to become more famous than you ever dreamed.
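To make the idea concrete, here is a minimal sketch of a contingency check in Python. It is an illustration only: the field names (`id`, `sex`, `children_borne`), the special codes 97 and 99, and the function name are assumptions made for this example, not codes drawn from any particular study.

```python
# Hypothetical special codes, chosen only for this illustration.
NOT_APPLICABLE = 97   # the question is inappropriate to this respondent
NO_ANSWER = 99        # the respondent should have answered but did not


def contingency_errors(records):
    """Return the ids of cases whose 'children_borne' code breaks the contingency rule."""
    errors = []
    for rec in records:
        value = rec["children_borne"]
        if rec["sex"] == "F":
            # Women need either a real count or the no-answer code.
            ok = value is not None and value != NOT_APPLICABLE
        else:
            # Men should have no entry, or only the not-applicable code.
            ok = value is None or value == NOT_APPLICABLE
        if not ok:
            errors.append(rec["id"])
    return errors


sample = [
    {"id": 1, "sex": "F", "children_borne": 2},
    {"id": 2, "sex": "M", "children_borne": NOT_APPLICABLE},
    {"id": 3, "sex": "M", "children_borne": 3},          # a man coded as having borne children
    {"id": 4, "sex": "F", "children_borne": NO_ANSWER},
    {"id": 5, "sex": "F", "children_borne": None},       # a woman with nothing coded at all
]

print(contingency_errors(sample))  # [3, 5]
```

The check only flags suspect cases. Deciding whether case 3 is a coding error in the sex variable or in the children variable still requires going back to the original questionnaire.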
Although data cleaning is an essential step in data processing, you can safely avoid it in certain cases. Perhaps you’ll feel you can exclude the very few errors that appear in a given item if the exclusion of those cases will not significantly affect your results. Or some inappropriate contingency responses may be safely ignored. If some men have been accorded motherhood status in the coding process, you can limit your analysis of this variable to women. However, you should not use these comments as rationalizations for sloppy research. “Dirty” data will almost always produce misleading research findings.
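Limiting the analysis to women can be sketched in a few lines of Python. As before, the field names, the special codes 97 and 99, and the function name are hypothetical choices made for this example.

```python
# Hypothetical special codes, chosen only for this illustration.
NOT_APPLICABLE = 97
NO_ANSWER = 99


def mean_children_among_women(records):
    """Average number of children, using only women who gave a real count."""
    counts = [
        rec["children_borne"]
        for rec in records
        if rec["sex"] == "F"
        and rec["children_borne"] not in (None, NOT_APPLICABLE, NO_ANSWER)
    ]
    # With no usable cases there is no mean to report.
    return sum(counts) / len(counts) if counts else None


survey = [
    {"sex": "F", "children_borne": 2},
    {"sex": "F", "children_borne": 0},
    {"sex": "M", "children_borne": 3},           # miscoded, but never enters the analysis
    {"sex": "F", "children_borne": NO_ANSWER},   # excluded: no real count
]

print(mean_children_among_women(survey))  # 1.0
```

The miscoded male case drops out because of the filter on sex, so it cannot distort this particular result. It remains in the data file, however, and would still matter for any analysis that does not apply the same filter.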