Repeatability, reproducibility, and diagnostic accuracy of a

CJEM · Jan 2024

Repeatability, reproducibility, and diagnostic accuracy of a commercial large language model (ChatGPT) to perform emergency department triage using the Canadian triage and acuity scale.

The release of the ChatGPT prototype to the public in November 2022 drastically reduced the barrier to using artificial intelligence by allowing easy access to a large language model with only a simple web interface. One situation where ChatGPT could be useful is in triaging patients arriving to the emergency department. This study aimed to address the research problem: "can emergency physicians use ChatGPT to accurately triage patients using the Canadian Triage and Acuity Scale (CTAS)?". ⋯ This study suggests that the current ChatGPT large language model is not sufficient for emergency physicians to triage simulated patients using the Canadian Triage and Acuity Scale due to poor repeatability and accuracy. Medical practitioners should be aware that while ChatGPT can be a valuable tool, it may lack consistency and may frequently provide false information.

keep reading… mark as read…
- Jeffrey Michael Franc, Lenard Cheng, Alexander Hart, Ryan Hata, and Atilla Hertelendy.
- Department of Emergency Medicine, University of Alberta, Edmonton, AB, Canada. jeffrey.franc@ualberta.ca.
- CJEM. 2024 Jan 1; 26 (1): 404640-46.
PurposeThe release of the ChatGPT prototype to the public in November 2022 drastically reduced the barrier to using artificial intelligence by allowing easy access to a large language model with only a simple web interface. One situation where ChatGPT could be useful is in triaging patients arriving to the emergency department. This study aimed to address the research problem: "can emergency physicians use ChatGPT to accurately triage patients using the Canadian Triage and Acuity Scale (CTAS)?".MethodsSix unique prompts were developed independently by five emergency physicians. An automated script was used to query ChatGPT with each of the 6 prompts combined with 61 validated and previously published patient vignettes. Thirty repetitions of each combination were performed for a total of 10,980 simulated triages.ResultsIn 99.6% of 10,980 queries, a CTAS score was returned. However, there was considerable variations in results. Repeatability (use of the same prompt repeatedly) was responsible for 21.0% of overall variation. Reproducibility (use of different prompts) was responsible for 4.0% of overall variation. Overall accuracy of ChatGPT to triage simulated patients was 47.5% with a 13.7% under-triage rate and a 38.7% over-triage rate. More extensively detailed text given as a prompt was associated with greater reproducibility, but minimal increase in accuracy.ConclusionsThis study suggests that the current ChatGPT large language model is not sufficient for emergency physicians to triage simulated patients using the Canadian Triage and Acuity Scale due to poor repeatability and accuracy. Medical practitioners should be aware that while ChatGPT can be a valuable tool, it may lack consistency and may frequently provide false information.© 2024. The Author(s), under exclusive licence to Canadian Association of Emergency Physicians (CAEP)/ Association Canadienne de Médecine d'Urgence (ACMU).

Pubmed Copy Citation Plaintext

Add institutional full text...
Notes
Knowledge, pearl, summary or comment to share?

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes

Simple formatting can be added to notes, such as *italics*, _underline_ or **bold**.

Superscript can be denoted by <sup>text</sup> and subscript <sub>text</sub>.

Numbered or bulleted lists can be created using either numbered lines 1. 2. 3., hyphens - or asterisks *.

Links can be included with: [my link to pubmed](http://pubmed.com)

Images can be included with: ![alt text](https://bestmedicaljournal.com/study_graph.jpg "Image Title Text")

For footnotes use [^1](This is a footnote.) inline.

Or use an inline reference [^1] to refer to a longer footnote elseweher in the document [^1]: This is a long footnote..
hide…

Repeatability, reproducibility, and diagnostic accuracy of a commercial large language model (ChatGPT) to perform emergency department triage using the Canadian triage and acuity scale.

Notes

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes

Want more great medical articles?

Keep up to date with a free trial of metajournal, personalized for your practice.
1,704,841 articles already indexed!

Repeatability, reproducibility, and diagnostic accuracy of a commercial large language model (ChatGPT) to perform emergency department triage using the Canadian triage and acuity scale.

Notes

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes

Want more great medical articles?

Keep up to date with a free trial of metajournal, personalized for your practice.1,704,841 articles already indexed!

Keep up to date with a free trial of metajournal, personalized for your practice.
1,704,841 articles already indexed!