General Information
ISSN: 1798-2340 (Online)
Frequency: Monthly
DOI: 10.12720/jait
Indexing: ESCI (Web of Science), Scopus, CNKI, etc.
Acceptance Rate: 12%
APC: 1000 USD
Average Days to Accept: 87 days
Journal Metrics: Impact Factor 2023: 0.9; CiteScore 2023: 4.2 (57th percentile)
Volume 13, No. 4, August 2022
JAIT 2022 Vol.13(4): 393-397
doi: 10.12720/jait.13.4.393-397
Deep Learning System Based on the Separation of Audio Sources to Obtain the Transcription of a Conversation
Nahum Flores, Daniel Angeles, and Sebastian Tuesta
Faculty of Systems Engineering and Informatics, Universidad Nacional Mayor de San Marcos, Lima, Peru
Abstract
Podcasting has recently drawn attention as the fastest-growing media format, especially during the pandemic. This growth has highlighted the need to make podcasts accessible to diverse audiences, especially people with hearing disabilities. Because current transcription methods have been unsatisfactory, we present an alternative method that transcribes audio files into text by first separating the audio sources. The methodology includes building a public audio dataset of more than 15 h. The model was trained under three scenarios that varied the duration of the training data; the best scenario achieved a scale-invariant signal-to-noise ratio (SI-SNR) of 10.77 dB. To make podcasts easier to access, we release the source code of every component we developed.
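The abstract reports separation quality as a scale-invariant signal-to-noise ratio (SI-SNR). In its commonly used definition, the estimate ŝ and the reference s are mean-centered, ŝ is projected onto s to get s_target = (⟨ŝ, s⟩ / ‖s‖²) s, the residual e_noise = ŝ − s_target is treated as distortion, and the score is 10 log10(‖s_target‖² / ‖e_noise‖²) in dB. The following is a minimal pure-Python sketch of that standard definition, assuming mono signals of equal length; it is not the authors' evaluation code, and the function name `si_snr`, the `eps` guard, and the toy signals are illustrative assumptions.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB (illustrative sketch).

    `eps` is an assumed small constant that avoids division by zero
    and log of zero; it is not taken from the paper.
    """
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    # Mean-center both signals so constant offsets do not affect the score.
    mean_est = sum(estimate) / len(estimate)
    mean_tgt = sum(target) / len(target)
    est = [x - mean_est for x in estimate]
    tgt = [x - mean_tgt for x in target]

    # Project the estimate onto the reference: s_target = (<s_hat, s> / ||s||^2) * s
    dot = sum(e * t for e, t in zip(est, tgt))
    scale = dot / (sum(t * t for t in tgt) + eps)
    s_target = [scale * t for t in tgt]

    # Everything the projection does not explain counts as distortion.
    e_noise = [e - s for e, s in zip(est, s_target)]

    num = sum(s * s for s in s_target)
    den = sum(n * n for n in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


if __name__ == "__main__":
    reference = [1.0, -1.0, 1.0, -1.0]
    noisy = [1.1, -0.9, 0.9, -1.1]          # reference plus a small orthogonal error
    louder = [3.0 * x for x in noisy]       # same estimate, three times louder
    print(round(si_snr(noisy, reference), 2))   # prints 20.0
    print(round(si_snr(louder, reference), 2))  # prints 20.0
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the separated track leaves the score unchanged; only the part of the output that is not a scaled copy of the reference lowers it.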
Index Terms
public dataset, deep learning, audio source separation, speech-to-text
Cite: Nahum Flores, Daniel Angeles, and Sebastian Tuesta, "Deep Learning System Based on the Separation of Audio Sources to Obtain the Transcription of a Conversation," Journal of Advances in Information Technology, Vol. 13, No. 4, pp. 393-397, August 2022.
Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.