J. Thorac. Cardiovasc. Surg. · Nov 2024
How Soon Will Surgeons Become Mere Technicians? Chatbot Performance in Managing Clinical Scenarios.
- Darren S Bryan, Joseph J Platz, Keith S Naunheim, Mark K Ferguson, and Research in Artificial Intelligence Development for Surgery (RAIDS) Working Group.
- Department of Surgery, University of Chicago, Chicago, IL. Electronic address: dbryan@uchicago.edu.
- J. Thorac. Cardiovasc. Surg. 2024 Nov 11.
Objective: Chatbot use has developed a presence in medicine and surgery and has been proposed to help guide clinical decision making. However, the accuracy of information provided by artificial intelligence (AI) platforms has been questioned. We evaluated the performance of 4 popular chatbots on a board-style examination and compared results with a group of board-certified thoracic surgeons.

Methods: Clinical scenarios were developed within domains based on the ABTS Qualifying Exam. Each scenario included three stems written with the Key Feature methodology related to diagnosis, evaluation, and treatment. Ten scenarios were presented to ChatGPT-4, Bard (now Gemini), Perplexity, and Claude 2, as well as randomly selected ABTS-certified surgeons. The maximum possible score was 3 points per scenario. Critical failures were identified during exam development; if they occurred in any of the 3 stems the entire question received a score of 0. The Mann-Whitney U test was used to compare surgeon and chatbot scores.

Results: Examinations were completed by 21 surgeons, the majority of whom (14; 66%) practiced in academic or university settings. The median score per scenario for chatbots was 1.06 compared to 1.88 for surgeons (difference 0.66, p=0.019). Surgeon median scores were better than chatbot median scores for all except two scenarios. Chatbot answers were significantly more likely to be deemed critical failures compared to those provided by surgeons (median 0.50 per chatbot/scenario vs. 0.19 per surgeon/scenario; p=0.016).

Conclusions: Four popular chatbots performed at a significantly lower level than board-certified surgeons. Implementation of AI should be undertaken with caution in clinical decision making.

Copyright © 2024. Published by Elsevier Inc.
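
The scoring rule described in the Methods can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract states a 3-point maximum per scenario and a zero score on any critical failure, while the one-point-per-stem breakdown, the function name `score_scenario`, and its parameters are assumptions made for the example.

```python
from typing import Sequence


def score_scenario(stem_correct: Sequence[bool],
                   critical_failure: Sequence[bool]) -> int:
    """Score one scenario made of three stems (hypothetical helper).

    Assumption: each correctly answered stem is worth one point,
    giving the stated maximum of 3 points per scenario.
    """
    if len(stem_correct) != 3 or len(critical_failure) != 3:
        raise ValueError("each scenario is expected to have exactly three stems")
    # A critical failure in any stem zeroes the entire scenario.
    if any(critical_failure):
        return 0
    return sum(1 for ok in stem_correct if ok)
```

Under these assumptions, a response with two correct stems and no critical failure scores 2, while a response with all three stems marked correct but one flagged as a critical failure scores 0.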