Article Impact Level: HIGH Data Quality: STRONG Summary of Annals of Internal Medicine, ANNALS-24-03933. https://doi.org/10.7326/ANNALS-24-03933 Dr. Natansh D. Modi et al.
Points
- Researchers evaluated five foundational large language models to assess their vulnerability to malicious instructions designed to create convincing health disinformation chatbots.
- When prompted with 100 health queries, the manipulated models generated disinformation in 88% of responses, showing a significant failure of existing safeguards.
- Four of the five chatbots produced disinformation for every query, while Anthropic’s Claude 3.5 Sonnet model demonstrated more robust resistance to the instructions.
- The artificially generated disinformation included dangerous myths about vaccine safety, HIV transmission, and cancer-curing diets, all presented with an authoritative and scientific tone.
- The study concluded that LLM platforms are highly susceptible to misuse, underscoring an urgent need for stronger screening safeguards to protect public health.
Summary
A recent study evaluated the vulnerability of five foundational large language models (LLMs) to malicious system-level instructions designed to create health disinformation chatbots. The application programming interfaces (APIs) for OpenAI’s GPT-4o, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-90B Vision, and xAI’s Grok Beta were systematically instructed to generate incorrect, yet authoritative and scientifically toned, responses to health queries. To test the efficacy of these malicious instructions, each customized chatbot was prompted with 10 distinct health questions in duplicate for 100 queries across all models.
The analysis revealed that 88 of the 100 queries (88%) resulted in health disinformation. Four of the five models—GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta—exhibited a 100% disinformation rate, incorrectly answering all 20 queries posed to each. In contrast, Claude 3.5 Sonnet demonstrated more robust safeguards, generating disinformation in only 40% (8 of 20) of its responses. The fabricated content included clinically harmful claims, such as linking vaccines to autism, asserting that HIV is airborne, and promoting myths about depression and cancer-curing diets.
An exploratory analysis of the public OpenAI GPT Store further highlighted this vulnerability, identifying three customized GPTs that appeared tuned to disseminate misinformation; these produced health disinformation in 97% of their responses to test questions. The study, published in Annals of Internal Medicine, concludes that current LLM APIs and associated platforms remain highly susceptible to exploitation for creating and distributing convincing health disinformation. These findings underscore an urgent need to develop and implement more rigorous output screening and protective safeguards to mitigate potential public health risks.
Link to the article: https://www.acpjournals.org/doi/10.7326/ANNALS-24-03933
References Modi, N. D., Menz, B. D., Awaty, A. A., Alex, C. A., Logan, J. M., McKinnon, R. A., Rowland, A., Bacchi, S., Gradon, K., Sorich, M. J., & Hopkins, A. M. (2025). Assessing the system-instruction vulnerabilities of large language models to malicious conversion into health disinformation chatbots. Annals of Internal Medicine, ANNALS-24-03933. https://doi.org/10.7326/ANNALS-24-03933
