Internal Medicine Practice

Evaluating AI Chatbot Performance Against Physicians in Providing Patient Responses

Article Impact Level: HIGH
Data Quality: STRONG
Summary of JAMA Internal Medicine, 183(6), 589. https://doi.org/10.1001/jamainternmed.2023.1838
Dr. John W. Ayers et al.

Points

  • A cross-sectional study evaluated quality and empathy by comparing AI chatbot (ChatGPT) responses to physician responses for 195 patient questions from Reddit’s r/AskDocs.
  • The chatbot’s responses were significantly longer (mean 211 words) than physicians’ (mean 52 words) and were preferred in 78.6% of evaluations.
  • Chatbot responses were rated higher in quality, with 78.5% judged as good or very good, compared to 22.1% for physician responses.
  • Empathy ratings showed chatbot responses were deemed empathetic or very empathetic 45.1% of the time versus 4.6% for physicians.
  • The study suggests that AI chatbots like ChatGPT could enhance patient care by drafting high-quality, empathetic responses for clinicians to review, potentially reducing workload and improving patient interactions.

Summary

In this cross-sectional study, researchers assessed how well an artificial intelligence (AI) chatbot, specifically ChatGPT, could produce high-quality, empathetic responses to patient questions compared with responses written by physicians. The evaluation used 195 patient questions randomly drawn from Reddit’s r/AskDocs, each of which a verified physician had already answered. A fresh chatbot session generated a response to each question in December 2022, and both the physicians’ and the chatbot’s responses were anonymized, randomly ordered, and scored by a team of licensed healthcare professionals. The evaluators chose which response was better and rated the quality of information and the empathy of each response on 1-to-5 scales, with ratings of 4 or 5 counted as “good or very good” and “empathetic or very empathetic,” respectively.

The findings showed a strong preference for the chatbot’s responses: evaluators preferred the chatbot’s reply in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations (195 questions, each evaluated in triplicate). Chatbot responses were also significantly longer (mean [IQR], 211 [168-245] words) than physician responses (52 [17-62] words; t = 25.4; P < .001). Moreover, chatbot responses were judged to be of higher quality: 78.5% (95% CI, 72.3%-84.1%) were rated good or very good, compared with just 22.1% (95% CI, 16.4%-28.2%) of physician responses, a 3.6 times higher prevalence of high-quality responses from the chatbot.

Empathy ratings likewise favored the chatbot: 45.1% (95% CI, 38.5%-51.8%) of chatbot responses were rated empathetic or very empathetic, versus only 4.6% (95% CI, 2.1%-7.7%) of physician responses, a 9.8 times higher prevalence of empathetic responses. These results suggest that AI chatbots like ChatGPT could augment patient care by drafting responses for clinicians to review or edit, potentially reducing workload and improving the quality of patient interactions in healthcare settings. Further studies, including randomized trials, are warranted to explore the broader use of AI in clinical communications and its effects on clinician efficiency and patient satisfaction.
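For readers who want to verify the fold-differences, a minimal arithmetic check divides the chatbot’s dichotomized proportion by the physicians’; these crude prevalence ratios match the figures reported above:

$$
\mathrm{PR}_{\text{quality}} = \frac{78.5\%}{22.1\%} \approx 3.6,
\qquad
\mathrm{PR}_{\text{empathy}} = \frac{45.1\%}{4.6\%} \approx 9.8
$$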

Link to the article: https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2804309


References

Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Internal Medicine, 183(6), 589. https://doi.org/10.1001/jamainternmed.2023.1838

About the author

Hippocrates Briefs Team