Article Impact Level: HIGH
Data Quality: STRONG
Summary of Nature Health, https://doi.org/10.1038/s44360-025-00048-z, Dr. Lana Tikhomirov et al.
Points
- A comprehensive scoping review of medical literature from the last decade identified significant inconsistencies in how silent trials are conducted for testing artificial intelligence in clinical healthcare settings.
- Researchers analyzed 75 studies and found that most focused on technical metrics, such as the area under the curve (AUC), while failing to detail how models perform during actual clinical interactions.
- The study revealed that the lack of formal international guidelines for these non-interventional trials risks exposing both patients and clinicians to potentially harmful automated medical advice.
- Project CANAIRI is currently working to develop standardized guidance to ensure that health settings are properly equipped to evaluate AI tools in a beneficial and highly reproducible manner.
- Establishing silent trials as a mandatory phase of adoption would help identify unpredictable model behaviors and ensure that the technology works effectively across different patient populations and local system infrastructures.
Summary
This research evaluated the current landscape of “silent trials”—the prospective, non-interventional testing of artificial intelligence (AI) models within clinical settings where outputs do not influence patient care. A global scoping review of literature from 2015 to 2025 identified 891 relevant articles, with 75 meeting the rigorous inclusion criteria for final analysis. The study aimed to characterize existing practices and identify gaps in the translational phase between in silico development and formal clinical implementation.
The analysis revealed significant heterogeneity in the terminology, rationale, and reported outcomes of silent evaluations. While a majority of the 75 included papers focused on technical AI metrics such as the area under the curve (AUC), there was a notable lack of standardization regarding clinical validation: far fewer studies reported verifying AI outputs against in situ clinical ground truths. The review also identified a critical deficit in the assessment of sociotechnical components, including stakeholder engagement and human-computer interaction, which are essential for successful real-world integration.
The findings highlight an urgent need for formal international guidelines to govern the conduct of silent trials in healthcare. Researchers noted that AI models often fail when transitioned to local settings due to unpredictable site-specific variables. By establishing comprehensive protocols through initiatives like Project CANAIRI, the medical community can ensure that silent trials serve as a mandatory, low-risk validation phase. Addressing these evaluative gaps is essential to mitigate the risk of providing harmful clinical advice and to ensure that AI tools are both beneficial and safe for bedside application.
Link to the article: https://www.nature.com/articles/s44360-025-00048-z
References
Tikhomirov, L., Semmler, C., Prizant, N., Bhasin, S., Kenyon, G., van der Vegt, A., Erdman, L., Kurian, N. C., Thompson, H., Palmer, L. J., Mohamud, A., Gichoya, J. W., Soremekun, S., Sendak, M. P., Anderson, J. A., Pfohl, S. R., Stedman, I., Ehrmann, D., Verspoor, K., … McCradden, M. D. (2026). A scoping review of silent trials for medical artificial intelligence. Nature Health, 1–23. https://doi.org/10.1038/s44360-025-00048-z
