JAIT 2026 Vol.17(5): 1007-1014
doi: 10.12720/jait.17.5.1007-1014

Integrating Whole Blood Gene Expression and Clinical Metadata into a Machine Learning Pipeline for Predictive Coronary Artery Disease Diagnosis

Bilgin Demir 1, Djansel Bukovec 2, and Zhilbert Tafa 3,4,*
1. Computer Engineering, Faculty of Engineering, International Balkan University, Skopje, North Macedonia
2. Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
3. Department of Computer Science and Engineering, University for Business and Technology, Prishtina, Kosovo
4. Faculty of Computer Engineering, International Balkan University, Skopje, North Macedonia
Email: bilgin.demir@ibu.edu.mk (B.D.); dzhansel.bukovec@students.finki.ukim.mk (D.B.); tafaul@t-com.me (Z.T.)

*Corresponding author

Manuscript received December 20, 2025; revised January 24, 2026; accepted March 2, 2026; published May 25, 2026.

Abstract—Timely and accurate identification of Coronary Artery Disease (CAD) remains a central challenge in modern preventive cardiology. Traditional diagnostic methods overlook the molecular alterations that drive disease onset and progression. This study proposes a multimodal machine learning framework that integrates whole-blood gene expression profiles with clinical variables to support molecularly informed CAD diagnosis. The framework prioritizes robustness and interpretability over maximal predictive performance. Models trained on combined microarray and clinical data achieved the highest internal discrimination (AUC ≈ 0.76). Confounding analyses identified age as the dominant predictor, whereas gene expression features contributed an independent and biologically meaningful signal. The final selected gene set was enriched in immune and inflammatory pathways relevant to CAD pathophysiology. External validation showed a decrease in performance, consistent with expected cross-platform domain shifts. The study underscores both the potential and the challenges of transcriptomic CAD prediction and highlights the importance of rigorous validation and careful interpretation in high-dimensional biomedical modeling.
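The abstract gives no implementation details. The following is therefore only a minimal, dependency-free sketch of the general idea of combining the two modalities, under stated assumptions. It assumes early fusion (appending each subject's clinical variables to its expression vector), z-scoring of every feature, an L2-penalized logistic regression fitted by gradient descent, and ROC AUC computed as the Mann-Whitney statistic. The function names (combine_features, standardize, fit_logistic, predict_proba, auc), the hyperparameters, and the toy data are hypothetical. They are not the authors' pipeline, and the sketch does not reproduce the reported AUC ≈ 0.76.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def combine_features(expression: Sequence[Sequence[float]],
                     clinical: Sequence[Sequence[float]]) -> Matrix:
    """Early fusion (assumed): append each sample's clinical variables to its expression vector."""
    if len(expression) != len(clinical):
        raise ValueError("expression and clinical rows must describe the same samples")
    return [list(e) + list(c) for e, c in zip(expression, clinical)]


def standardize(X: Matrix) -> Matrix:
    """Z-score every column so expression and clinical features share one scale."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # A constant column has SD 0; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X: Matrix, w: List[float], b: float) -> List[float]:
    """Predicted probability of the positive class (CAD) for each sample."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def fit_logistic(X: Matrix, y: Sequence[int], lr: float = 0.1,
                 epochs: int = 500, l2: float = 0.01) -> Tuple[List[float], float]:
    """L2-penalized logistic regression fitted by full-batch gradient descent (hypothetical settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        probs = predict_proba(X, w, b)
        errs = [p - yi for p, yi in zip(probs, y)]
        grad_w = [sum(err * xi[j] for err, xi in zip(errs, X)) / n for j in range(d)]
        grad_b = sum(errs) / n
        # The penalty shrinks the weights only; the intercept is left unpenalized.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney probability that a random case outscores a
    random control; ties count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: one expression feature plus age for four subjects.
    expression = [[0.1], [0.2], [0.9], [1.0]]
    clinical = [[40.0], [45.0], [60.0], [65.0]]
    labels = [0, 0, 1, 1]  # 1 = CAD case, 0 = control
    X = standardize(combine_features(expression, clinical))
    w, b = fit_logistic(X, labels)
    print(auc(predict_proba(X, w, b), labels))  # in-sample AUC on separable toy data: 1.0
```

The demo fits and scores the same four samples, so its perfect AUC says nothing about real discrimination. A realistic evaluation would compute AUC on held-out folds and, as the abstract notes, on an external cohort, where cross-platform differences can lower it. The toy example also shows why confounding checks matter: age alone would separate these samples perfectly.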
 
Keywords—coronary artery disease, functional enrichment, gene expression, machine learning, logistic regression, statistical analysis

Cite: Bilgin Demir, Djansel Bukovec, and Zhilbert Tafa, "Integrating Whole Blood Gene Expression and Clinical Metadata into a Machine Learning Pipeline for Predictive Coronary Artery Disease Diagnosis," Journal of Advances in Information Technology, Vol. 17, No. 5, pp. 1007-1014, 2026. doi: 10.12720/jait.17.5.1007-1014

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
