Article Impact Level: HIGH
Data Quality: STRONG
Summary of Npj Digital Medicine https://doi.org/10.1038/s41746-026-02337-7
Dr. Salim Yakdan et al.
Points
- Researchers developed AI models to screen for cervical spondylotic myelopathy by analyzing electronic health records of over two million people to identify early patterns of spinal cord dysfunction.
- The study demonstrated that specialized machine learning systems can predict incident diagnoses up to thirty months in advance, which could allow earlier clinical intervention and improved patient outcomes.
- While large-scale foundation models performed well on internal data, they generalized less well than smaller models tailored with expert clinical insight and domain knowledge.
- External validation across different healthcare systems confirmed that domain-informed models remain more robust and trustworthy for identifying high-risk patients than purely data-driven, out-of-the-box systems.
- This screening approach provides substantial lead time to treat a progressive condition that, in older adults, is often recognized only after irreversible neurological damage has occurred.
Summary
This study evaluated how well different machine learning architectures predict incident cervical spondylotic myelopathy (CSM) from electronic health record (EHR) data. Because CSM is the primary cause of spinal cord dysfunction in older adults and often remains undiagnosed for years, the investigators sought to identify at-risk patients while intervention is still clinically actionable. The study used data on more than 2 million patients from the Merative MarketScan database and an institutional EHR to train and validate seven distinct models, ranging from domain-informed, clinically guided architectures to large-scale pretrained foundation models.
The findings demonstrated that AI-based screening could predict CSM up to 30 months before clinical diagnosis. While large foundation models achieved superior performance during internal validation on heterogeneous data, they generalized poorly across different healthcare systems. In contrast, smaller, clinically tailored models, which prioritize domain-specific variables and clinical insight, performed more consistently and robustly during external validation. Interestingly, mid-scale transformer models such as CoreBEHRT and CEHRBERT underperformed both the foundation models and the simpler clinical models across all evaluated time horizons.
These results suggest that embedding clinical knowledge into AI solutions is essential for developing trustworthy diagnostic tools for complex neurological conditions. By flagging high-risk medical histories early, these models provide a lead time of up to 30 months for surgical or therapeutic intervention, potentially preventing permanent spinal cord damage. The study highlights that while data-driven foundation models offer rich representations of patient history, domain-informed architectures remain more reliable for broad clinical implementation across diverse institutional settings. This finding underscores the continued need for surgeon-scientist input in the evolution of digital medicine.
Link to the article: https://www.nature.com/articles/s41746-026-02337-7
References
Yakdan, S., Warner, B., Ghogawala, Z., Ray, W. Z., Bydon, M., Steinmetz, M. P., Griffey, R. T., Foraker, R., Wilcox, A., Lu, C., & Greenberg, J. K. (2026). Clinically-guided models or foundation models? Predicting cervical spondylotic myelopathy from electronic health records. Npj Digital Medicine, 9(1), 153. https://doi.org/10.1038/s41746-026-02337-7
