JAIT 2024 Vol.15(2): 306-321
doi: 10.12720/jait.15.2.306-321

Assamese Dialect Identification Using Static and Dynamic Features from Vowel

Hem Chandra Das 1,2,* and Utpal Bhattacharjee 2
1. Department of Computer Science and Technology, Bodoland University, Assam, India
2. Department of Computer Science and Engineering, Rajiv Gandhi University, Arunachal Pradesh, India
Email: hemchandradas78@gmail.com (H.C.D.); utpal.bhattacharjee@rgu.ac.in (U.B.)
*Corresponding author

Manuscript received June 6, 2023; revised July 21, 2023; accepted September 27, 2023; published February 26, 2024.

Abstract—This paper introduces a novel method for identifying Assamese dialects by analyzing the acoustic and prosodic aspects of vowel sounds in speech signals. The distinctive characteristics of these dialects are captured through acoustic parameters such as formants (F1, F2, and F3) and prosodic features such as energy, fundamental frequency (F0), and duration. To evaluate this approach, a comprehensive vowel speech corpus is collected from native Assamese speakers representing four different dialectal regions. Frame-level statistical features are extracted from vowel sounds, while temporal dynamic features are obtained from steady-state vowel segments. The data are collected using a phonetically rich script to record both read and spontaneous speech from speakers of the four dialects. Various classification methods, including three decision tree-based classifiers, namely Random Forest (RF), Extreme Random Forest (ERF), and Extreme Gradient Boosting (XGB), are applied to distinguish the four dialects. The performance of each feature, whether static or dynamic, is evaluated individually. The study reveals that the identification of Assamese dialects is influenced by factors such as speech length, intensity, pitch, and formant frequencies. To assess the significance of these features in distinguishing dialects and to measure their combined impact on the identification system, single-factor Analysis of Variance (ANOVA) tests are conducted. Notably, when static features are combined with the ERF ensemble model, the overall accuracy of dialect identification reaches 77%. This research demonstrates the efficacy of using acoustic and prosodic features to classify Assamese dialects, shedding light on the subtle variations within them.
In summary, this paper provides a robust framework for Assamese dialect identification and contributes to our understanding of dialect discrimination, paving the way for more advanced dialect identification systems.
Keywords—Assamese dialect identification, formant frequencies, prosodic features, statistical features, dynamic features
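
The single-factor ANOVA test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `one_way_anova_f`, the group labels, and the F1 values below are all hypothetical and chosen only for illustration. The sketch computes the F statistic for one feature (for example, the first formant of a vowel) measured across several dialect groups, as the ratio of between-group to within-group mean squares.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.

    `groups` is a list of lists; each inner list holds the values of one
    feature (e.g. F1 in Hz) for one dialect group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups and n_total > k")

    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        # No within-group variation: F is undefined or unbounded.
        return float("inf"), df_between, df_within

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical F1 values (Hz) for four dialect groups; NOT data from the paper.
    f1_by_dialect = [
        [310.0, 325.0, 318.0, 330.0],  # hypothetical dialect A
        [345.0, 352.0, 340.0, 360.0],  # hypothetical dialect B
        [300.0, 295.0, 312.0, 305.0],  # hypothetical dialect C
        [370.0, 365.0, 380.0, 372.0],  # hypothetical dialect D
    ]
    f_stat, df_b, df_w = one_way_anova_f(f1_by_dialect)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the critical value for the given degrees of freedom suggests that the feature differs across dialect groups. In practice a statistics library would also supply the p-value; this sketch stops at the F statistic to stay dependency-free.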

Cite: Hem Chandra Das and Utpal Bhattacharjee, "Assamese Dialect Identification Using Static and Dynamic Features from Vowel," Journal of Advances in Information Technology, Vol. 15, No. 2, pp. 306-321, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.