SpectroFusionNet: a CNN approach utilizing spectrogram fusion for electric guitar play recognition.

Journal: Scientific Reports
Abstract

Music, a universal language and cultural cornerstone, continues to shape and enrich human expression and connection across diverse societies. This study introduces SpectroFusionNet, a comprehensive deep learning framework for the automated recognition of electric guitar playing techniques. The proposed approach first extracts several time-frequency representations, including Mel-Frequency Cepstral Coefficients (MFCC), Continuous Wavelet Transform (CWT), and Gammatone spectrograms, to capture intricate audio features. Each representation is then processed individually by a lightweight CNN (MobileNetV2, InceptionV3, or ResNet50) to extract discriminative features of the different guitar sounds, with ResNet50 yielding the best performance. To further improve classification across nine distinct guitar sound classes, two fusion strategies are adopted to provide richer feature representations: early fusion, where the spectrograms are combined before feature extraction, and late fusion, where the independently extracted features are merged via three schemes: weighted averaging, max-voting, and simple concatenation. The fused features are subsequently fed into nine machine learning classifiers, including Support Vector Machine (SVM), Multilayer Perceptron (MLP), Logistic Regression, and Random Forest, for final classification. Experimental results demonstrate that MFCC-Gammatone late fusion yields the best classification performance, achieving 99.12% accuracy, 100% precision, and 100% recall across the nine guitar sound classes. To further assess SpectroFusionNet's generalization ability, a real-time audio dataset is also evaluated, on which the framework achieves 70.9% accuracy, indicating its applicability in real-world scenarios.
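The late-fusion step described above can be illustrated with a minimal sketch. The feature dimensions, fusion weights, and random stand-in vectors below are assumptions for illustration only; in the paper the per-branch features come from ResNet50 applied to MFCC and Gammatone spectrograms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-clip feature vectors from the two spectrogram branches.
# In the actual pipeline these would be CNN embeddings; here they are
# random 2048-dim stand-ins (dimension is an assumption).
mfcc_feat = rng.standard_normal(2048)
gamma_feat = rng.standard_normal(2048)

# Late fusion, scheme 1: simple concatenation -> one longer feature vector
# that a downstream classifier (e.g. SVM, MLP) would consume.
fused_concat = np.concatenate([mfcc_feat, gamma_feat])

# Late fusion, scheme 2: weighted averaging of the two branches
# (the weights here are illustrative, not from the paper).
w_mfcc, w_gamma = 0.6, 0.4
fused_weighted = w_mfcc * mfcc_feat + w_gamma * gamma_feat

print(fused_concat.shape)    # (4096,)
print(fused_weighted.shape)  # (2048,)
```

Either fused vector would then be passed to the final machine learning classifier; max-voting, by contrast, operates on the per-branch class predictions rather than on the feature vectors themselves.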

Authors
Ganesh Chellamani, Aishwarya N, Chandhana C, Kanwaljeet Kaur, Rakesh Thoppaen Babu