• Plos One · Jan 2016

    Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis.

    • Shang-Ming Zhou, Fabiola Fernandez-Gutierrez, Jonathan Kennedy, Roxanne Cooksey, Mark Atkinson, Spiros Denaxas, Stefan Siebert, William G Dixon, Terence W O'Neill, Ernest Choy, Cathie Sudlow, UK Biobank Follow-up and Outcomes Group, and Sinead Brophy.
    • Institute of Life Science, College of Medicine, Swansea University, Swansea, United Kingdom.
    • Plos One. 2016 Jan 1; 11 (5): e0154515.

    Objectives1) To use data-driven method to examine clinical codes (risk factors) of a medical condition in primary care electronic health records (EHRs) that can accurately predict a diagnosis of the condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis using primary care EHRs.MethodsThis study linked routine primary and secondary care EHRs in Wales, UK. A machine learning based scheme was used to identify patients with rheumatoid arthritis from primary care EHRs via the following steps: i) selection of variables by comparing relative frequencies of Read codes in the primary care dataset associated with disease case compared to non-disease control (disease/non-disease based on the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method, iii) induction of decision rules from decision tree model. The proposed method was then extensively validated on an independent dataset, and compared for performance with two existing deterministic algorithms for RA which had been developed using expert clinical knowledge.ResultsPrimary care EHRs were available for 2,238,360 patients over the age of 16 and of these 20,667 were also linked in the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record were discovered more frequently in those with versus those without RA. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified 8 predictors related to diagnostic codes for RA, medication codes, such as those for disease modifying anti-rheumatic drugs, and absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the expert clinical knowledge based methods.ConclusionData-driven scheme, such as ensemble machine learning methods, has the potential of identifying the most informative predictors in a cost-effective and rapid way to accurately and reliably classify rheumatoid arthritis or other complex medical conditions in primary care EHRs.

      Pubmed     Free full text   Copy Citation     Plaintext  

      Add institutional full text...

    Notes

     
    Knowledge, pearl, summary or comment to share?
    300 characters remaining
    help        
    You can also include formatting, links, images and footnotes in your notes
    • Simple formatting can be added to notes, such as *italics*, _underline_ or **bold**.
    • Superscript can be denoted by <sup>text</sup> and subscript <sub>text</sub>.
    • Numbered or bulleted lists can be created using either numbered lines 1. 2. 3., hyphens - or asterisks *.
    • Links can be included with: [my link to pubmed](http://pubmed.com)
    • Images can be included with: ![alt text](https://bestmedicaljournal.com/study_graph.jpg "Image Title Text")
    • For footnotes use [^1](This is a footnote.) inline.
    • Or use an inline reference [^1] to refer to a longer footnote elseweher in the document [^1]: This is a long footnote..

    hide…

Want more great medical articles?

Keep up to date with a free trial of metajournal, personalized for your practice.
1,694,794 articles already indexed!

We guarantee your privacy. Your email address will not be shared.