Performance of ChatGPT on NASS Clinical Guidelines for the

Spine · May 2024

Performance of ChatGPT on NASS Clinical Guidelines for the Diagnosis and Treatment of Low Back Pain: A Comparison Study.

Comparative analysis. ⋯ ChatGPT was able to produce a sufficient clinical guideline recommendation for low back pain, with overall improvements if initially prompted. However, it tended to wrongly suggest evidence and often failed to mention, especially postprompt, when there is not enough evidence to adequately give an accurate recommendation.

explore further… or not…
- Nancy Shrestha, Zekun Shen, Bashar Zaidat, Akiro H Duey, Justin E Tang, Wasil Ahmed, Timothy Hoang, Mateo Restrepo Mejia, Rami Rajjoub, Jonathan S Markowitz, Jun S Kim, and Samuel K Cho.
- Chicago Medical School at Rosalind Franklin University, North Chicago, IL.
- Spine. 2024 May 1; 49 (9): 640651640-651.
Study DesignComparative analysis.ObjectiveTo evaluate Chat Generative Pre-trained Transformer (ChatGPT's) ability to predict appropriate clinical recommendations based on the most recent clinical guidelines for the diagnosis and treatment of low back pain.BackgroundLow back pain is a very common and often debilitating condition that affects many people globally. ChatGPT is an artificial intelligence model that may be able to generate recommendations for low back pain.Materials And MethodsUsing the North American Spine Society Evidence-Based Clinical Guidelines as the gold standard, 82 clinical questions relating to low back pain were entered into ChatGPT (GPT-3.5) independently. For each question, we recorded ChatGPT's answer, then used a point-answer system-the point being the guideline recommendation and the answer being ChatGPT's response-and asked ChatGPT if the point was mentioned in the answer to assess for accuracy. This response accuracy was repeated with one caveat-a prior prompt is given in ChatGPT to answer as an experienced orthopedic surgeon-for each question by guideline category. A two-sample proportion z test was used to assess any differences between the preprompt and postprompt scenarios with alpha=0.05.ResultsChatGPT's response was accurate 65% (72% postprompt, P =0.41) for guidelines with clinical recommendations, 46% (58% postprompt, P =0.11) for guidelines with insufficient or conflicting data, and 49% (16% postprompt, P =0.003*) for guidelines with no adequate study to address the clinical question. For guidelines with insufficient or conflicting data, 44% (25% postprompt, P =0.01*) of ChatGPT responses wrongly suggested that sufficient evidence existed.ConclusionChatGPT was able to produce a sufficient clinical guideline recommendation for low back pain, with overall improvements if initially prompted. However, it tended to wrongly suggest evidence and often failed to mention, especially postprompt, when there is not enough evidence to adequately give an accurate recommendation.Copyright © 2024 Wolters Kluwer Health, Inc. All rights reserved.

Pubmed Copy Citation Plaintext

Add institutional full text...
Notes
Knowledge, pearl, summary or comment to share?

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes

Simple formatting can be added to notes, such as *italics*, _underline_ or **bold**.

Superscript can be denoted by <sup>text</sup> and subscript <sub>text</sub>.

Numbered or bulleted lists can be created using either numbered lines 1. 2. 3., hyphens - or asterisks *.

Links can be included with: [my link to pubmed](http://pubmed.com)

Images can be included with: ![alt text](https://bestmedicaljournal.com/study_graph.jpg "Image Title Text")

For footnotes use [^1](This is a footnote.) inline.

Or use an inline reference [^1] to refer to a longer footnote elseweher in the document [^1]: This is a long footnote..
hide…

Performance of ChatGPT on NASS Clinical Guidelines for the Diagnosis and Treatment of Low Back Pain: A Comparison Study.

Notes

300 characters remaining

help

You can also include formatting, links, images and footnotes in your notes