American Journal of Epidemiology
Review
A critical look at methods for handling missing covariates in epidemiologic regression analyses.
Epidemiologic studies often encounter missing covariate values. While simple methods such as stratification on missing-data status, conditional-mean imputation, and complete-subject analysis are commonly employed to handle this problem, several studies have shown that these methods can be biased under reasonable circumstances. The authors review these results in the context of logistic regression and present simulation experiments that show the limitations of these methods. ⋯ While these methods are superior to simple methods, they are not commonly used in epidemiology, no doubt because of their complexity and the lack of packaged software to apply them. The authors contrast the results of multiple imputation with those of simple methods in the analysis of a case-control study of endometrial cancer, and they find a meaningful difference in the results for age at menarche. In general, the authors recommend that epidemiologists avoid the missing-indicator method and use more sophisticated methods whenever a large proportion of the data are missing.
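The bias of simple methods can be illustrated with a small simulation. The Python sketch below is not the authors' simulation design; it assumes a hypothetical setup in which a confounder Z (correlation 0.7 with a continuous exposure X) is missing completely at random for about half of the subjects, and the true log-odds ratio for X is 0.5. It compares the X coefficient from complete-subject analysis, conditional-mean imputation, and the missing-indicator method. The function names (`simulate`, `fit_logistic`, `compare_methods`) and all parameter values are illustrative assumptions, and the logistic fit is a plain Newton-Raphson routine written out so that only the standard library is needed.

```python
import math
import random

TRUE_BETA_X = 0.5  # hypothetical true log-odds ratio for exposure X


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, ys, max_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Each row must already contain a leading 1.0 for the intercept.
    """
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for x, y in zip(rows, ys):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (y - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def simulate(n=10000, p_missing=0.5, seed=1):
    """Assumed design: confounder Z correlated with X, missing completely at random."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = 0.7 * z + math.sqrt(1.0 - 0.7 ** 2) * rng.gauss(0.0, 1.0)
        eta = -1.0 + TRUE_BETA_X * x + 1.0 * z
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        z_obs = None if rng.random() < p_missing else z
        data.append((x, z_obs, y))
    return data


def compare_methods(data):
    """Return the estimated coefficient of X under three simple missing-data methods."""
    cc = [(x, z, y) for x, z, y in data if z is not None]
    ys_all = [y for _, _, y in data]

    # Complete-subject analysis: drop every subject whose Z is missing.
    b_cc = fit_logistic([[1.0, x, z] for x, z, _ in cc], [y for _, _, y in cc])

    # Conditional-mean imputation: fill missing Z with its least-squares
    # prediction from X, estimated on the complete subjects.
    mx = sum(x for x, _, _ in cc) / len(cc)
    mz = sum(z for _, z, _ in cc) / len(cc)
    slope = (sum((x - mx) * (z - mz) for x, z, _ in cc)
             / sum((x - mx) ** 2 for x, _, _ in cc))
    rows_cm = [[1.0, x, z if z is not None else mz + slope * (x - mx)]
               for x, z, _ in data]
    b_cm = fit_logistic(rows_cm, ys_all)

    # Missing-indicator method: set missing Z to 0 and add an indicator M.
    rows_mi = [[1.0, x, 0.0 if z is None else z, 1.0 if z is None else 0.0]
               for x, z, _ in data]
    b_mi = fit_logistic(rows_mi, ys_all)

    return {"complete_subject": b_cc[1], "conditional_mean": b_cm[1],
            "missing_indicator": b_mi[1]}
```

Calling `compare_methods(simulate())` returns one X-coefficient estimate per method. Under these assumed settings one would expect the complete-subject estimate to lie near 0.5 and the missing-indicator estimate to be pulled upward, because for subjects with Z missing the single shared X coefficient also absorbs the unadjusted confounding by Z. Other missingness mechanisms or parameter values may change which methods are biased and by how much; multiple imputation is not shown here.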