Diagnostic Accuracy of Host Gene Expression Combined with Large Language Models

Article Impact Level: HIGH
Data Quality: STRONG
Summary of Nature Communications https://doi.org/10.1038/s41467-025-66218-5
Dr. Hoang Van Phan et al.

Points

Researchers combined the pulmonary transcriptomic biomarker FABP4 with GPT-4 analysis of medical records to diagnose lower respiratory tract infections in critically ill adults.
The integrated diagnostic model achieved an accuracy of 96 percent in an independent validation cohort, significantly outperforming the 72 percent accuracy of the admitting clinical teams.
Study data suggest that implementing this combined classifier at the time of admission could reduce inappropriate antibiotic administration by more than 80 percent.
The artificial intelligence model prioritized radiology reports during analysis, whereas human physicians tended to focus more heavily on clinical notes when making diagnostic decisions.
This method offers a rapid and effective strategy to distinguish between infectious and non-infectious causes of respiratory failure without requiring complex bioinformatics expertise.

Summary

This study developed a diagnostic method for lower respiratory tract infections (LRTI) in critically ill adults by integrating the pulmonary transcriptomic biomarker FABP4 with electronic medical record analysis via the Generative Pre-trained Transformer 4 (GPT-4) model. Diagnosing LRTI is difficult because non-infectious causes of respiratory failure can mimic it, which often leads to antibiotic overuse. The researchers assessed a cohort including 98 pre-pandemic patients and 59 pandemic-era patients to validate the combined classifier against standard clinical assessment and individual diagnostic modalities.
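One way to picture the integration step is a logistic model that takes two inputs per patient: a FABP4 expression value and a GPT-4-derived LRTI score. The sketch below is illustrative only. The function name, the coefficients, and the input scales are hypothetical and are not taken from the study, which this summary does not describe at that level of detail.

```python
import math

def combined_lrti_probability(fabp4_expr, llm_score,
                              w_fabp4=-1.0, w_llm=2.0, bias=0.0):
    """Combine two inputs into one probability with a logistic function.

    All coefficients are hypothetical placeholders (sign and magnitude
    included); in practice they would be fit on a training cohort.
    """
    z = bias + w_fabp4 * fabp4_expr + w_llm * llm_score
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical patient: normalized FABP4 expression 0.5, LLM score 1.0
p = combined_lrti_probability(fabp4_expr=0.5, llm_score=1.0)
print(f"Estimated LRTI probability: {p:.2f}")
```

A two-input model of this kind is small enough to apply without a bioinformatics pipeline, which is in keeping with the practicality the summary emphasizes.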

The combined classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.93 ± 0.08 and an accuracy of 84% in the initial cohort. This significantly outperformed FABP4 expression alone (AUC 0.84 ± 0.11) and GPT-4 analysis alone (AUC 0.83 ± 0.07). In an independent validation cohort, the integrated model demonstrated superior performance with an AUC of 0.98 ± 0.04 and 96% diagnostic accuracy. By contrast, the admitting medical team’s diagnoses yielded an accuracy of only 72%. Qualitative analysis revealed that the AI model prioritized radiology reports, whereas physicians focused on clinical notes.
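For readers less familiar with the two metrics reported here, the following minimal sketch shows how AUC and accuracy can be computed from classifier scores. The labels and scores are synthetic toy values, not study data, and the 0.5 decision threshold is an assumption made for illustration.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        (1.0 if p > n else 0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Synthetic toy data: 1 = LRTI, 0 = non-infectious respiratory failure
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(f"AUC: {auc(labels, scores):.3f}")
print(f"Accuracy: {accuracy(labels, scores):.3f}")
```

AUC summarizes ranking quality across all possible thresholds, while accuracy depends on the single threshold chosen, which is why the two figures can differ for the same model.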

The study estimates that implementing this diagnostic model at admission could have reduced inappropriate antibiotic administration by over 80%. These findings suggest that pairing host gene expression biomarkers with large language model processing of clinical data offers a rapid, highly accurate mechanism for distinguishing infectious from non-infectious respiratory failure in the intensive care unit.

Link to the article: https://www.nature.com/articles/s41467-025-66218-5

References

Phan, H. V., Spottiswoode, N., Lydon, E. C., Chu, V. T., Cuesta, A., Kazberouk, A. D., Richmond, N. L., Deosthale, P., Calfee, C. S., & Langelier, C. R. (2025). Integrating a host biomarker with a large language model for diagnosis of lower respiratory tract infection. Nature Communications, 16(1), 10882. https://doi.org/10.1038/s41467-025-66218-5