• Am J Emerg Med · Dec 2024

    Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses.

    • B Arslan, C Nuhoglu, M O Satici, and E Altinbilek.
    • Department of Emergency Medicine, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey. Electronic address: dr.banuarslan@gmail.com.
    • Am J Emerg Med. 2024 Dec 19; 89: 174181174-181.

    BackgroundThe number of emergency department (ED) visits has been on steady increase globally. Artificial Intelligence (AI) technologies, including Large Language Model (LLMs)-based generative AI models, have shown promise in improving triage accuracy. This study evaluates the performance of ChatGPT and Copilot in triage at a high-volume urban hospital, hypothesizing that these tools can match trained physicians' accuracy and reduce human bias amidst ED crowding challenges.MethodsThis single-center, prospective observational study was conducted in an urban ED over one week. Adult patients were enrolled through random 24-h intervals. Exclusions included minors, trauma cases, and incomplete data. Triage nurses assessed patients while an emergency medicine (EM) physician documented clinical vignettes and assigned emergency severity index (ESI) levels. These vignettes were then introduced to ChatGPT and Copilot for comparison with the triage nurse's decision.ResultsThe overall triage accuracy was 65.2 % for nurses, 66.5 % for ChatGPT, and 61.8 % for Copilot, with no significant difference (p = 0.000). Moderate agreement was observed between the EM physician and ChatGPT, triage nurses, and Copilot (Cohen's Kappa = 0.537, 0.477, and 0.472, respectively). In recognizing high-acuity patients, ChatGPT and Copilot outperformed triage nurses (87.8 % and 85.7 % versus 32.7 %, respectively). Compared to ChatGPT and Copilot, nurses significantly under-triaged patients (p < 0.05). The analysis of predictive performance for ChatGPT, Copilot, and triage nurses demonstrated varying discrimination abilities across ESI levels, all of which were statistically significant (p < 0.05). ChatGPT and Copilot exhibited consistent accuracy across age, gender, and admission time, whereas triage nurses were more likely to mistriage patients under 45 years old.ConclusionChatGPT and Copilot outperform traditional nurse triage in identifying high-acuity patients, but real-time ED capacity data is crucial to prevent overcrowding and ensure high-quality of emergency care.Copyright © 2024 Elsevier Inc. All rights reserved.

      Pubmed     Copy Citation     Plaintext  

      Add institutional full text...

    Notes

     
    Knowledge, pearl, summary or comment to share?
    300 characters remaining
    help        
    You can also include formatting, links, images and footnotes in your notes
    • Simple formatting can be added to notes, such as *italics*, _underline_ or **bold**.
    • Superscript can be denoted by <sup>text</sup> and subscript <sub>text</sub>.
    • Numbered or bulleted lists can be created using either numbered lines 1. 2. 3., hyphens - or asterisks *.
    • Links can be included with: [my link to pubmed](http://pubmed.com)
    • Images can be included with: ![alt text](https://bestmedicaljournal.com/study_graph.jpg "Image Title Text")
    • For footnotes use [^1](This is a footnote.) inline.
    • Or use an inline reference [^1] to refer to a longer footnote elseweher in the document [^1]: This is a long footnote..

    hide…

Want more great medical articles?

Keep up to date with a free trial of metajournal, personalized for your practice.
1,694,794 articles already indexed!

We guarantee your privacy. Your email address will not be shared.