Internal Medicine Practice

RAG Enhances Local LLM Performance for Radiology Contrast Media Consultations

Article Impact Level: HIGH
Data Quality: STRONG
Summary of Npj Digital Medicine, 8(1), 395. https://doi.org/10.1038/s41746-025-01802-z
Dr. Akihiko Wada et al.

Points

  • Researchers addressed the challenge of making rapid, private clinical decisions by using retrieval-augmented generation to significantly improve locally deployed AI for radiology contrast media consultations.
  • The team tested their RAG-enhanced model against its baseline and three leading cloud-based AIs across 100 simulated consultations involving common iodinated contrast media clinical scenarios.
  • The enhanced model eliminated dangerous clinical hallucinations, cutting their rate from 8% to zero, and delivered responses in a mean of 2.6 seconds, faster than the 4.9–7.3 seconds of the cloud-based systems.
  • This system operates by retrieving information from a curated knowledge base of medical guidelines, ensuring accuracy while running efficiently on standard hospital computers without requiring costly hardware.
  • This approach paves the way for deploying safer, expert-level AI decision support across medical fields, advancing care that prioritizes both clinical excellence and patient privacy.

Summary

A recent study evaluated the efficacy of retrieval-augmented generation (RAG) in enhancing a locally deployed large language model (LLM) for radiology consultations involving contrast media. The investigation compared a RAG-enhanced Llama 3.2-11B model against its baseline version and three cloud-based models (GPT-4o mini, Gemini 2.0 Flash, and Claude 3.5 Haiku) across 100 synthetic clinical cases. A blinded radiologist ranked the model outputs for each case, while automated LLM-based judges assessed metrics including accuracy, safety, and latency. The study aimed to determine if RAG could improve a local model’s clinical utility while preserving the data privacy benefits of on-premise deployment.
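The retrieval step described above can be sketched in a few lines. The snippet below is an illustrative toy, not the authors' implementation: the knowledge-base entries are hypothetical stand-ins for curated contrast media guidelines, and the word-overlap ranking is a stand-in for whatever retriever the study actually used.

```python
import re

# Hypothetical curated knowledge base of contrast media guideline snippets
# (illustrative text only, not taken from the study's materials).
KNOWLEDGE_BASE = [
    "Iodinated contrast media: screen for prior allergic-like reactions and "
    "consider premedication for patients with previous moderate reactions.",
    "Check eGFR before administering iodinated contrast; a low eGFR warrants "
    "a risk-benefit review for contrast-induced kidney injury.",
    "Metformin may generally be continued when renal function is adequate; "
    "hold and recheck renal function if acute kidney injury is suspected.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank snippets by word overlap with the query (stand-in for vector search)."""
    q_words = tokenize(query)
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & tokenize(doc)),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Ground the local model's answer in the retrieved guideline text."""
    context = "\n".join(retrieve(query))
    return f"Use only this guideline context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Can metformin be continued after contrast?"))
```

In the study's design, the prompt assembled this way is passed to the locally deployed Llama 3.2-11B model, so patient data never leaves the hospital network.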

The results demonstrated a significant improvement in the local model’s performance with the implementation of RAG. Critically, RAG eliminated clinical hallucinations, reducing their incidence from 8% in the baseline model to 0% (χ²₍Yates₎ = 6.38, p = 0.012). The RAG-enhanced model also improved its mean rank by 1.3 relative to the baseline (Z = –4.82, p < 0.001). Furthermore, it maintained a substantial speed advantage, with a mean response time of 2.6 seconds, compared to the 4.9–7.3 second latency of the cloud-based models.
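The reported test statistic can be reproduced from the hallucination counts, assuming each model was scored on the same 100 cases (8/100 for the baseline versus 0/100 with RAG) and the standard continuity-corrected chi-square for a 2×2 table:

```python
# Reproduce the Yates-corrected chi-square for the hallucination counts:
# baseline model 8/100 cases with hallucinations, RAG model 0/100.
def yates_chi2(a: int, b: int, c: int, d: int) -> float:
    """Continuity-corrected chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Baseline: 8 hallucinations, 92 clean; RAG: 0 hallucinations, 100 clean.
chi2 = yates_chi2(8, 92, 0, 100)
print(round(chi2, 2))  # → 6.38, matching the value reported in the study
```

The agreement with the published χ² = 6.38 confirms the 100-cases-per-arm reading of the comparison.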

Although the RAG model showed marked improvements, the radiologist’s evaluation indicated that performance gaps persist when compared to leading cloud models, with GPT-4o mini receiving a higher overall ranking. In conclusion, RAG provides a clinically meaningful enhancement to on-premise LLMs, drastically improving safety and efficiency for time-sensitive consultations. This makes it a viable pathway for deploying advanced AI decision support in clinical settings without compromising patient data security or requiring specialized hardware.

Link to the article: https://www.nature.com/articles/s41746-025-01802-z


References

Wada, A., Tanaka, Y., Nishizawa, M., Yamamoto, A., Akashi, T., Hagiwara, A., Hayakawa, Y., Kikuta, J., Shimoji, K., Sano, K., Kamagata, K., Nakanishi, A., & Aoki, S. (2025). Retrieval-augmented generation elevates local LLM quality in radiology contrast media consultation. Npj Digital Medicine, 8(1), 395. https://doi.org/10.1038/s41746-025-01802-z

About the author

Hippocrates Briefs Team