JAIT 2026 Vol.17(1): 171-189
doi: 10.12720/jait.17.1.171-189

A Deep Knowledge-Infused Adaptive Hybrid Network for Speech-Based Depression and Severity Detection

Raminder Kaur Nagra and Vikram Kulkarni *
Department of Information Technology, Mukesh Patel School of Technology Management & Engineering, SVKM’s NMIMS (Deemed to be University), Mumbai, India
Email: raminderkaur.nagra@nmims.edu (R.K.N.); vikram.kulkarni@nmims.edu (V.K.)
*Corresponding author

Manuscript received April 30, 2025; revised August 15, 2025; accepted August 18, 2025; published January 20, 2026.

Abstract—Depressive disorders are mental health conditions that negatively affect a person's interpersonal, social, and psychological well-being. If depression is not accurately diagnosed, it can lead to persistent suicidal thoughts and suicide attempts. Early identification of depression is therefore essential. Compared to other behavioral signals, speech is easily accessible, widely available, and a reliable indicator for depression screening. Speech-based depression detection is also simple and comparatively inexpensive, since it requires substantially less bandwidth and computing capability. Depression severity is usually measured using machine learning and deep learning algorithms. Although research in this field has improved significantly, many obstacles remain before depression diagnosis can be performed in practice. This research therefore implements a novel mechanism for detecting depression and its severity. First, the required speech signals are obtained from a benchmark dataset and pre-processed with filtering and spectral transformation models, which remove unwanted background noise and patterns from the original speech signals. Features are then extracted from the pre-processed signals using the Spatio-Temporal Attention-based Convolutional Autoencoder (STA-CAE). The resulting features are fed into the Adaptive Hybrid Network (AHNet), which identifies the presence of depression and its level of severity. For detection, AHNet combines deep networks such as the Deep Support Vector Machine (DSVM) and the Pyramid Dilated Temporal Convolutional Network (PDTCN). The Revised Uniform Attribute-based Sculptor Optimization Algorithm (RUA-SOA) is used to optimize AHNet performance. Lastly, the proposed framework is evaluated and compared with existing models to confirm its effectiveness.
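
The pre-processing stage described in the abstract (filtering followed by a spectral transformation) can be illustrated with a minimal sketch. The abstract does not name the specific filter or transform, so the choices here are assumptions made only for illustration: a first-order pre-emphasis filter and a Hann-windowed short-time Fourier magnitude spectrum. All function names and parameter values (`pre_emphasis`, `frame_signal`, `magnitude_spectrum`, `spectrogram`, `alpha`, `frame_len`, `hop`) are hypothetical and are not taken from the paper.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """Assumed filtering step: y[n] = x[n] - alpha * x[n-1] (first-order high-pass)."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames; an incomplete tail is dropped."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def magnitude_spectrum(frame):
    """Hann-windowed DFT magnitudes for bins 0..N/2.

    Uses a naive O(N^2) DFT for clarity; assumes the frame has at least 2 samples.
    """
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len=64, hop=32, alpha=0.97):
    """Assumed pipeline: pre-emphasis, framing, then per-frame magnitude spectrum."""
    filtered = pre_emphasis(signal, alpha)
    return [magnitude_spectrum(f) for f in frame_signal(filtered, frame_len, hop)]
```

In a pipeline of the kind the abstract outlines, a time-frequency representation like this would be the input to the feature extractor. A practical implementation would more likely rely on an FFT routine from a numerical library than on the naive DFT used here.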
 
Keywords—depression and severity detection; speech signals; adaptive hybrid network; feature extraction; knowledge-infused deep learning; mental health informatics

Cite: Raminder Kaur Nagra and Vikram Kulkarni, "A Deep Knowledge-Infused Adaptive Hybrid Network for Speech-Based Depression and Severity Detection," Journal of Advances in Information Technology, Vol. 17, No. 1, pp. 171-189, 2026. doi: 10.12720/jait.17.1.171-189

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
