JAIT 2022 Vol.13(5): 398-412
doi: 10.12720/jait.13.5.398-412
Text to Speech Synthesis: A Systematic Review, Deep Learning Based Architecture and Future Research Direction
Fahima Khanam¹, Farha Akhter Munmun¹, Nadia Afrin Ritu¹, Aloke Kumar Saha², and Muhammad Firoz Mridha¹
1. Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
2. Department of Computer Science and Engineering, University of Asia Pacific, Dhaka, Bangladesh
Abstract: Text to Speech (TTS) synthesis is the process of converting natural language text into speech. Synthesized speech is generated from pieces of recorded speech, and a database is maintained to store this speech. The quality of a speech synthesizer's output is judged by its resemblance to human speech and by how easily it can be understood. In recent years, of the two main subfields of Artificial Intelligence (AI), machine learning and deep learning, deep learning has achieved great success in text-to-speech synthesis. This review introduces a taxonomy of the deep learning-based architectures and models commonly used in speech synthesis. The datasets used in TTS are also discussed, and some of the widely used metrics for evaluating the quality of synthesized speech are described. Finally, the paper concludes with the challenges and future directions of text-to-speech synthesis systems.
Index Terms: Text to Speech (TTS), deep learning, acoustic features, parametric synthesis, concatenative synthesis, text analysis
Cite: Fahima Khanam, Farha Akhter Munmun, Nadia Afrin Ritu, Aloke Kumar Saha, and Muhammad Firoz Mridha, "Text to Speech Synthesis: A Systematic Review, Deep Learning Based Architecture and Future Research Direction," Journal of Advances in Information Technology, Vol. 13, No. 5, pp. 398-412, October 2022.
Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.