JAIT 2024 Vol.15(1): 40-48
doi: 10.12720/jait.15.1.40-48

Ensemble of Multimodal Deep Learning Models for Violin Bowing Techniques Classification

Zain Muhammed 1, Nagamanoj Karunakaran 2,*, Pranamya P. Bhat 1, and Arti Arya 1
1. Department of Computer Science Engineering, PES University, Bengaluru, India
2. Control Design Automation, MathWorks, Bengaluru, India
Email: zainmuhammed66@gmail.com (Z.M.); nkarunak@mathworks.com (N.K.);
pranamyabhat27@gmail.com (P.P.B.); artiarya@pes.edu (A.A.)
*Corresponding author

Manuscript received May 6, 2023; revised June 5, 2023; accepted July 2, 2023; published January 9, 2024.

Abstract—A bowing gesture in violin playing is the motion of the violinist's right arm, which holds the bow. Violinists use different types of bow strokes to express musical phrases. Although each bow stroke produces a distinct sound, novice violinists can find it difficult to distinguish and recognize these bowing techniques. This paper presents an ensemble of multimodal deep learning models, consisting of one Convolutional Neural Network (CNN) and two Long Short-Term Memory (LSTM) models, that classifies a performance into one of five bowing classes: détaché, legato, martelé, spiccato, and staccato. The dataset consists of audio samples performed by 8 violinists together with the motion of their forearms, measured with a Myo sensor device that provides 8 channels of Electromyogram (EMG) data and 13 channels of Inertial Measurement Unit (IMU) data. Audio features are extracted from the audio excerpts, and time-domain features are extracted from the EMG and IMU motion signals. These features are passed to the ensemble of deep learning models, whose outputs are combined by weighted voting to make the final prediction. The proposed ensemble classifier achieved an overall accuracy of 99.5%, which is higher than that of previous studies that considered only audio or only motion data.
Keywords—violin bowing technique, audio features, motion features, Electromyogram (EMG), Inertial Measurement Unit (IMU), Essentia, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), deep learning model
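
The weighted voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the voting rule, the weights, or which model handles which modality, so everything below is an assumption: soft voting (a weighted average of class probabilities), one CNN on audio features and one LSTM each on EMG and IMU features, and made-up weights and probabilities. The function name `weighted_vote` is hypothetical and is not taken from the paper.

```python
# Illustrative sketch only. Soft voting, the model-to-modality mapping,
# the weights, and all probabilities are assumptions, not the authors' values.
CLASSES = ["détaché", "legato", "martelé", "spiccato", "staccato"]


def weighted_vote(model_probs, weights):
    """Combine per-model class probabilities using a weighted average."""
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    # Weighted average of each class probability across all models.
    combined = [
        sum(w * probs[i] for probs, w in zip(model_probs, weights)) / total
        for i in range(len(CLASSES))
    ]
    # The predicted class is the one with the highest combined score.
    best = max(range(len(CLASSES)), key=combined.__getitem__)
    return CLASSES[best], combined


# Hypothetical per-model outputs for a single bow stroke.
cnn_audio = [0.10, 0.60, 0.10, 0.10, 0.10]  # assumed CNN on audio features
lstm_emg = [0.05, 0.30, 0.05, 0.50, 0.10]   # assumed LSTM on EMG features
lstm_imu = [0.05, 0.55, 0.10, 0.20, 0.10]   # assumed LSTM on IMU features

label, scores = weighted_vote([cnn_audio, lstm_emg, lstm_imu], [0.4, 0.3, 0.3])
print(label)
```

In this made-up example the EMG model alone would choose spiccato, but the weighted combination selects legato because the other two models agree and together carry more weight. This is the kind of single-modality error that an ensemble may correct.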

Cite: Zain Muhammed, Nagamanoj Karunakaran, Pranamya P. Bhat, and Arti Arya, "Ensemble of Multimodal Deep Learning Models for Violin Bowing Techniques Classification," Journal of Advances in Information Technology, Vol. 15, No. 1, pp. 40-48, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.