Optimized Deep Neural Networks Audio Tagging Framework for Virtual Business Assistant

JAIT 2023 Vol.14(3): 550-558
doi: 10.12720/jait.14.3.550-558

Fatma Sh. El-metwally 1, Ali I. Eldesouky 1, Nahla B. Abdel-Hamid 1, and Sally M. Elghamrawy 2,3,*

1. Department of Computer Engineering and Control Systems, Faculty of Engineering, Mansoura University, Mansoura, Egypt
2. Department of Computer Engineering, MISR Higher Institute for Engineering and Technology, Mansoura, Egypt
3. Scientific Research Group in Egypt (SRGE), Egypt
*Correspondence: sally@mans.edu.eg, sally_elghamrawy@ieee.org (S.M.E.)

Manuscript received July 1, 2022; revised August 12, 2022; accepted November 11, 2022; published June 16, 2023.

Abstract—Virtual assistants have a major impact on business and on an organization's development. They can manage customer relations, handle incoming queries, and automatically reply to e-mails and phone calls. Audio signal processing has become increasingly popular since the development of virtual assistants, and advances in deep learning and audio signal processing have dramatically improved audio tagging. Audio Tagging (AT) is the task of eliciting descriptive labels from audio clips. This study proposes an Optimized Deep Neural Networks Audio Tagging Framework for Virtual Business Assistant to categorize and analyze audio tags. Audio tagging features are extracted from each input signal and fed into a neural network, which performs multi-label classification to predict the tags. Optimization techniques are used to improve the quality of the neural network's model fit. To test the efficiency of the framework, four experiments compared it with other approaches; the results indicate that the proposed framework is the more efficient. When the neural network was trained, Mel-Frequency Cepstral Coefficient (MFCC) features with Adamax achieved the best results, with 93% accuracy and a 0.17% loss. When evaluated on seven labels, the model averaged a precision of 0.952, a recall of 0.952, an F-score of 0.951, an accuracy of 0.983, and an equal error rate of 0.015 on the evaluation set, compared with the provided Detection and Classification of Acoustic Scenes and Events (DCASE) baseline, which achieved an accuracy of 72.5% and an equal error rate of 0.209.
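The evaluation measures named in the abstract (precision, recall, F-score, and equal error rate) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names `multilabel_prf` and `equal_error_rate` are hypothetical, averaging over labels (macro-averaging) is an assumption about how the reported averages were formed, and the equal error rate is approximated by a simple sweep over the observed scores.

```python
from typing import Sequence, Tuple


def multilabel_prf(y_true: Sequence[Sequence[int]],
                   y_pred: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F-score for multi-label tags.

    Each row is one audio clip; each column is one tag (1 = present).
    Averaging per label is an assumption, not taken from the paper.
    """
    n_labels = len(y_true[0])
    precisions, recalls, fscores = [], [], []
    for k in range(n_labels):
        pairs = [(t[k], p[k]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fscores.append(f1)
    return (sum(precisions) / n_labels,
            sum(recalls) / n_labels,
            sum(fscores) / n_labels)


def equal_error_rate(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate equal error rate for one tag.

    Sweeps every observed score as a decision threshold and returns the
    mean of the false acceptance and false rejection rates at the
    threshold where the two rates are closest.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative clip")
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)):
        far = sum(1 for s in neg if s >= thr) / len(neg)  # negatives accepted
        frr = sum(1 for s in pos if s < thr) / len(pos)   # positives rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In a multi-label setting such as the seven-label evaluation described above, `equal_error_rate` would be applied to each tag's scores separately and the per-tag values averaged; that aggregation step is likewise an assumption.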

Keywords—audio tagging, Deep Neural Networks (DNNs), optimizations, Detection and Classification of Acoustic Scenes and Events (DCASE)

Cite: Fatma Sh. El-metwally, Ali I. Eldesouky, Nahla B. Abdel-Hamid, and Sally M. Elghamrawy, "Optimized Deep Neural Networks Audio Tagging Framework for Virtual Business Assistant," Journal of Advances in Information Technology, Vol. 14, No. 3, pp. 550-558, 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.
