Pol. Arch. Med. Wewn. · May 2024
Practical use case of natural language processing for observational clinical research data retrieval from electronic health records: AssistMED project.
- Cezary Maciejewski, Krzysztof Ozierański, Mikołaj Basza, Adam Barwiołek, Michalina Ciurla, Aleksandra Bożym, Maciej J Krajsman, Piotr Lodziński, Grzegorz Opolski, Marcin Grabowski, Andrzej Cacko, and Paweł Balsam.
- First Department of Cardiology, Medical University of Warsaw, Warszawa, Poland
- Pol. Arch. Med. Wewn. 2024 May 28; 134 (5).
Introduction: Electronic health records (EHRs) contain data valuable for clinical research. However, they are in textual format and require manual encoding to databases, which is a lengthy and costly process. Natural language processing (NLP) is a computational technique that allows for text analysis.

Objectives: Our study aimed to demonstrate a practical use case of NLP for characterizing a large retrospective study cohort and to compare it with human retrieval.

Patients and methods: Anonymized discharge documentation of 10 314 patients from a tertiary care cardiology department was analyzed for inclusion in the CRAFT registry (Multicenter Experience in Atrial Fibrillation Patients Treated with Oral Anticoagulants; NCT02987062). Extensive clinical characteristics regarding concomitant diseases, medications, daily drug dosages, and echocardiography were collected manually and through NLP.

Results: There were 3030 and 3029 patients identified by the human and NLP‑based approaches, respectively, reflecting 99.93% accuracy of NLP in detecting atrial fibrillation (AF). Comprehensive baseline patient characterization by NLP was faster than human analysis (3 h and 15 min vs 71 h and 12 min). The calculated CHA2DS2‑VASc and HAS‑BLED scores based on both methods did not differ (human vs NLP; median [interquartile range], 3 [2-5] vs 3 [2-5]; P = 0.74 and 1 [1-2] vs 1 [1-2]; P = 0.63, respectively). For most data, almost perfect agreement between NLP- and human-retrieved characteristics was found; daily dosage identification was the least accurate NLP feature. Similar conclusions on cohort characteristics would be drawn from either method; however, daily dosage detection for some drug groups would require additional human validation in the NLP‑based cohort.

Conclusions: NLP utilization in EHRs may accelerate data acquisition and provide accurate information for retrospective studies.
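The Results compare the two retrieval methods by time spent and by agreement on each characteristic. The abstract does not name the agreement statistic, so the sketch below assumes Cohen's kappa, for which "almost perfect" is the conventional label for values above 0.80. The function name `cohen_kappa` and the two label lists are hypothetical illustrations, not study data; only the two durations come from the abstract.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (1 = characteristic present); NOT study data.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
nlp = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
kappa = cohen_kappa(human, nlp)

# Durations reported in the abstract, converted to minutes.
nlp_minutes = 3 * 60 + 15      # 3 h 15 min
human_minutes = 71 * 60 + 12   # 71 h 12 min
speedup = human_minutes / nlp_minutes

print(f"kappa on toy labels: {kappa:.2f}")
print(f"NLP vs human time: {nlp_minutes} min vs {human_minutes} min "
      f"(about {speedup:.1f} times faster)")
```

On the reported durations, the NLP pipeline took 195 minutes against 4272 minutes for manual review, roughly a 22-fold reduction in time.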