JAIT 2025 Vol.16(11): 1604-1623
doi: 10.12720/jait.16.11.1604-1623

Towards Accurate SDG Research Categorization: A Hybrid Deep Learning Approach Using Scopus Metadata

Jalal Sadoon Hameed Al-Bayati 1,*, Furat Nidhal Tawfeeq 1, and Mohammed Al-Shammaa 2
1. Website Division, University of Baghdad, Baghdad, Iraq
2. Department of Computer Engineering, College of Engineering, University of Baghdad, Baghdad, Iraq
Email: Jalal.hameed@uobaghdad.edu.iq (J.S.H.A.B.); Furat@bccru.uobaghdad.edu.iq (F.N.T.); M.alshammaa@coeng.uobaghdad.edu.iq (M.A.S.)
*Corresponding author

Manuscript received May 9, 2025; revised July 3, 2025; accepted July 25, 2025; published November 21, 2025.

Abstract—Automatically classifying research papers according to the United Nations Sustainable Development Goals (SDGs) is challenging because of the complexity and variety of the language used in policy and academic documents. This study presents a complete deep learning pipeline that combines Bidirectional Long Short-Term Memory (BiLSTM) and Convolutional Neural Network (CNN) architectures. The pipeline uses both pre-trained and contextual word embeddings to strengthen semantic understanding. Its primary aim is to improve the interpretability and accuracy of SDG text classification, thereby enabling more effective policy monitoring and research evaluation. The approach begins with exhaustive preprocessing, including stemming, stopword removal, and techniques to address class imbalance. Documents are then represented with Global Vector (GloVe), Bidirectional Encoder Representations from Transformers (BERT), and FastText embeddings. The hybrid BiLSTM-CNN model is trained and evaluated on several benchmark datasets, including SDG-labeled corpora and relevant external datasets such as GoEmotions and Ohsumed, to provide a complete assessment of its generalizability. In addition, this study applies zero-shot prompt-based categorization with GPT-3.5/4 and Flan-T5. It also conducts comparative tests against leading models such as Robustly Optimized BERT Pretraining Approach (RoBERTa) and Decoding-enhanced BERT with Disentangled Attention (DeBERTa), providing a comprehensive benchmark against current approaches. Experimental results show that the proposed hybrid model achieves competitive performance owing to the contextual embeddings, which substantially improve classification accuracy. The study also explains the model's decision processes and improves transparency through interpretability techniques, including SHapley Additive exPlanations (SHAP) analysis and attention visualization.
These results emphasize the need to incorporate prompt engineering techniques alongside deep learning architectures for effective and interpretable SDG text categorization. This work offers a scalable and transparent solution for automating the evaluation of SDG research, with potential applications in broader policy analysis and scientific literature mining.
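The abstract names stemming, stopword removal, and class-imbalance handling as preprocessing steps but does not specify the tools used. The following is a minimal sketch, assuming a plain-Python setting, of how these steps could look. It includes:

- a hypothetical, abbreviated stopword list;
- a deliberately crude suffix-stripping stemmer standing in for an established one such as Porter;
- inverse-frequency class weights, which follow the same heuristic as scikit-learn's `class_weight="balanced"`.

The names `STOPWORDS`, `SUFFIXES`, `crude_stem`, `preprocess`, and `balanced_class_weights` are illustrative and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stopword list (a real pipeline would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "for", "to",
             "is", "are", "with", "by", "this", "that"}

# Ordered suffix rules for a crude stemmer; longer suffixes are tried first.
# This is an illustrative stand-in, not the Porter or Snowball algorithm.
SUFFIXES = ("ations", "ation", "ness", "ing", "ies", "es", "ed", "ly", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}
```

For example, `preprocess("Renewable energy policies in rural communities")` yields `["renewable", "energy", "polic", "rural", "communit"]`. The truncated stems show why production pipelines prefer an established stemmer. The class weights can then scale each label's contribution to the training loss, so that rare SDG labels are not drowned out by frequent ones.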
 
Keywords—text classification, Sustainable Development Goals (SDGs), deep learning, hybrid Bidirectional Long Short-Term Memory-Convolutional Neural Network (BiLSTM-CNN), Global Vector (GloVe) embeddings

Cite: Jalal Sadoon Hameed Al-Bayati, Furat Nidhal Tawfeeq, and Mohammed Al-Shammaa, "Towards Accurate SDG Research Categorization: A Hybrid Deep Learning Approach Using Scopus Metadata," Journal of Advances in Information Technology, Vol. 16, No. 11, pp. 1604-1623, 2025. doi: 10.12720/jait.16.11.1604-1623

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
