JAIT 2025 Vol.16(5): 725-741
doi: 10.12720/jait.16.5.725-741

Deep Learning-Based Semantic Diagnostic Framework for Big EHR Corpora

Sarah Shafqat 1,*, Qaisar Javaid 2, Majed Al-Saeed 3, and Hafiz Farooq Ahmad 3,*
1. Ambient Cloud, Islamabad, Pakistan
2. Faculty of Computing and IT, International Islamic University, Islamabad, Pakistan
3. Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, Al-Ahsa, Saudi Arabia
Email: ceo@DreamzSoft.org (S.S.); qaisar@iiu.edu.pk (Q.J.); alsaeed@kfu.edu.sa (M.A.S.); hfahmad@kfu.edu.sa (H.F.A.)
*Corresponding author

Manuscript received January 6, 2025; revised January 13, 2025; accepted February 26, 2025; published May 22, 2025.

Abstract—Diagnoses that are mishandled or left untreated may lead to several other chronic illnesses, as in the case of Diabetes Mellitus (DM). Researchers have explored the endocrine domain for the diagnosis of DM and its comorbidities using big data cloud analytics. When a DM patient’s condition worsens, it can give rise to conditions such as breast cancer, arthritis, body pains, dementia, and foot ulcers resulting in amputation. The researchers initiated the design and development of unified medical corpora from a real-time big Electronic Health Records (EHR) dataset of endocrine patients. The corpora are standardized using standard international medical nomenclature: they use International Classification of Diseases (ICD-10-CM) diagnostic codes to label target diagnoses and comply with the Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR) standard. Corpuses within the corpora, with varying sizes and parameters, were experimented on using the Colab, Orange, and RapidMiner open-source cloud frameworks. The performance of different models was observed using deep learning heuristics. Three models were proposed: i) a Deep Neural Nets (DNN) Bi-directional Long Short-Term Memory (Bi-LSTM) Sequential Model; ii) Louvain Mani-Hierarchical Fold Learning (LMHFL) and its optimized model, Fast-LMHFL; and iii) a custom Deep Multinomial Distribution Learning (DMDL) model enabled for semantics and temporal variables. DNN Bi-LSTM shows the efficacy of TensorFlow and deep neural networks. LMHFL gave diagnostic visual inferences with a maximum accuracy of 0.727 and correlations of up to 0.952 between patients’ phenotypes with Laplace. DMDL was 100% accurate.
 
Keywords—big data healthcare analytics, semantics, Named Entity Records (NER), tabular text mining, International Classification of Diseases (ICD-10-CM), endocrine comorbidity diseases, diagnostics

Cite: Sarah Shafqat, Qaisar Javaid, Majed Al-Saeed, and Hafiz Farooq Ahmad, "Deep Learning-Based Semantic Diagnostic Framework for Big EHR Corpora," Journal of Advances in Information Technology, Vol. 16, No. 5, pp. 725-741, 2025. doi: 10.12720/jait.16.5.725-741

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
