JAIT 2025 Vol.16(10): 1459-1469
doi: 10.12720/jait.16.10.1459-1469

Exploring Multimodal Deep Learning: Comparing Pre-trained and Custom Models for COVID-19 Classification

Kazeem Oyebode 1,2, Ebenezer Esenogho 2,*, and Modisane Cameron 2
1. Department of Computing, School of Science and Technology, Pan-Atlantic University, Lagos, Nigeria
2. Centre for Artificial Intelligence and Multidisciplinary Innovation Studies, Department of Auditing, College of Accounting Science, University of South Africa, Pretoria, South Africa
Email: kazeemkz@gmail.com (K.O.); esenoe@unisa.ac.za (E.E.); modistc@unisa.ac.za (M.C.)
*Corresponding author

Manuscript received December 30, 2024; revised January 7, 2025; accepted February 17, 2025; published October 24, 2025.

Abstract—COVID-19, a respiratory illness that primarily attacks the human lungs, emerged in 2019 and quickly became a global health crisis. Its rapid transmission necessitated the creation of effective tools to aid in its classification. In this paper, we present a multimodal deep learning model that leverages X-ray, Computed Tomography (CT) scan, and cough signals to classify COVID-19 accurately. The paper’s objective is to meticulously compare the effectiveness of non-pre-trained and pre-trained versions of VGG-19, MobileNetV2, and ResNet across various multimodal and some unimodal models using cough sound, X-ray, and CT scan datasets. This comparison matters because it indicates which combinations of datasets could improve COVID-19 prediction. Findings show that while the pre-trained unimodal models for cough and X-ray outperform their non-pre-trained counterparts, the non-pre-trained CT scan model performs exceptionally well, suggesting that features learned by the pre-trained VGG-19 model fail to generalize effectively to CT scans. Remarkably, the non-pre-trained multimodal model achieves an F1-score of 0.9804, slightly outperforming its pre-trained counterpart at 0.98. While this research advances our understanding of transfer learning, it also highlights the prospect of determining which of the considered datasets, individually or in combination, could deliver an acceptable level of COVID-19 classification in a resource-constrained scenario.
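To illustrate the feature-level fusion idea behind such a multimodal model, the sketch below combines three modality embeddings (X-ray, CT scan, cough) by concatenation before a binary classification head. This is a minimal NumPy illustration, not the authors' implementation: the "encoders" are toy linear projections standing in for CNN backbones such as VGG-19, MobileNetV2, or ResNet, and all input and embedding sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB = 64  # per-modality embedding size (assumption, not from the paper)

def encode(x, w):
    """Toy 'encoder': linear projection + ReLU, standing in for a CNN backbone."""
    return np.maximum(x @ w, 0.0)

# Stand-ins for flattened modality inputs: X-ray, CT scan, cough spectrogram.
x_xray = rng.normal(size=(1, 512))
x_ct = rng.normal(size=(1, 512))
x_cough = rng.normal(size=(1, 256))

# Random "backbone" weights (in practice these would be trained or pre-trained).
w_xray = rng.normal(scale=0.05, size=(512, EMB))
w_ct = rng.normal(scale=0.05, size=(512, EMB))
w_cough = rng.normal(scale=0.05, size=(256, EMB))

# Feature-level fusion: concatenate the three modality embeddings.
fused = np.concatenate(
    [encode(x_xray, w_xray), encode(x_ct, w_ct), encode(x_cough, w_cough)],
    axis=1,
)  # shape (1, 3 * EMB)

# Binary COVID-19 classification head: linear layer + sigmoid.
w_head = rng.normal(scale=0.05, size=(fused.shape[1], 1))
prob = 1.0 / (1.0 + np.exp(-(fused @ w_head)))
print(fused.shape, prob[0, 0])
```

Swapping the random projections for pre-trained backbones (with frozen or fine-tuned weights) versus randomly initialized ones is exactly the comparison the paper carries out across its unimodal and multimodal configurations.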
 
Keywords—machine learning, audio signal processing, deep learning, image classification, multimodal systems, transfer learning, unimodal systems

Cite: Kazeem Oyebode, Ebenezer Esenogho, and Modisane Cameron, "Exploring Multimodal Deep Learning: Comparing Pre-trained and Custom Models for COVID-19 Classification," Journal of Advances in Information Technology, Vol. 16, No. 10, pp. 1459-1469, 2025. doi: 10.12720/jait.16.10.1459-1469

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).