International journal of medical informatics
The proliferation and use of electronic medical records (EMRs) in the clinical setting now provide a rich source of clinical data that can be leveraged to support research on patient outcomes, comparative effectiveness, and health systems. Once the large volume and variety of data that robust clinical EMRs provide has been aggregated, the suitability of the data for research purposes must be addressed. The purpose of this paper is therefore two-fold. First, we present a stepwise framework capable of guiding initial data quality assessment when matching multiple data sources, regardless of context or application. Then, we demonstrate a use case, an initial analysis of a longitudinal repository of electronic health record data, that illustrates the first four steps of the framework, and we report the results. ⋯ The proposed six-step data quality assessment framework is useful in establishing the metadata for a longitudinal data repository that can be replicated by other studies. Practical issues remain to be addressed in data quality assessment, the most pressing being the need to establish, through testing and application, data quality metrics for benchmarking acceptable levels of EMR data inclusiveness.