Bengaluru, India and Online  /  December 04, 2022  -  December 08, 2022

International Society for Music Information Retrieval Conference (ISMIR 2022)

The ISMIR conference is the world’s leading research forum on processing, searching, organising and accessing music-related data. The 23rd International Society for Music Information Retrieval Conference (ISMIR 2022) will take place in a hybrid format from December 4 to 8, 2022, hosted in Bengaluru, India.

Fraunhofer IDMT's Semantic Music Technologies group will be represented at this year's ISMIR with the following contributions from its current research activities.

Multi-pitch Estimation meets Microphone Mismatch: Applicability of Domain Adaptation

Bittner, Franca; Gonzalez, Marcel; Richter, Maike L; Lukashevich, Hanna; Abeßer, Jakob

The performance of machine learning (ML) models is known to be affected by discrepancies between training (source) and real-world (target) data distributions. This problem is referred to as domain shift and is commonly approached using domain adaptation (DA) methods. As one relevant scenario, automatic piano transcription algorithms in music learning applications potentially suffer from domain shift since pianos are recorded in different acoustic conditions using various devices. Yet, most currently available datasets for piano transcription only cover ideal recording situations with high-quality microphones. Consequently, a transcription model trained on these datasets will face a mismatch between source and target data in real-world scenarios. To address this issue, we employ a recently proposed dataset which includes annotated piano recordings covering typical real-life recording settings for a piano learning application on mobile devices. We first quantify the influence of the domain shift on the performance of a deep learning-based piano multi-pitch estimation (MPE) algorithm. Then, we employ and evaluate four unsupervised DA methods to reduce domain shift. Our results show that the studied MPE model is surprisingly robust to domain shift in microphone mismatch scenarios and the DA methods do not notably improve the transcription performance.
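For illustration, the sketch below shows one common unsupervised domain adaptation technique, CORAL-style correlation alignment of frame-level spectral features between a clean studio (source) domain and a mobile-device (target) domain. This is only a minimal, assumed example: the abstract does not name the four DA methods that were evaluated, and the feature shapes and variable names here are placeholders.

```python
import numpy as np

def coral_align(source_feats, target_feats, eps=1e-6):
    """Align the second-order statistics of source features to the target domain.

    source_feats, target_feats: arrays of shape (num_frames, num_features),
    e.g. log-spectrogram frames from clean recordings (source) and
    mobile-device recordings (target). Names and shapes are illustrative.
    """
    # Feature covariances, with a small ridge term for numerical stability
    cs = np.cov(source_feats, rowvar=False) + eps * np.eye(source_feats.shape[1])
    ct = np.cov(target_feats, rowvar=False) + eps * np.eye(target_feats.shape[1])

    # Whiten the source features, then re-color them with the target covariance
    cs_inv_sqrt = np.linalg.inv(np.linalg.cholesky(cs))
    ct_sqrt = np.linalg.cholesky(ct)
    centered = source_feats - source_feats.mean(axis=0)
    return centered @ cs_inv_sqrt.T @ ct_sqrt.T + target_feats.mean(axis=0)
```

In a transcription pipeline, features aligned in this way would typically be used to train or fine-tune the MPE model before it is applied to target-domain recordings.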

Paper presentation: Paper Session 4, December 6, 2022, 9:00 CET

Audio Augmentations for Semi-Supervised Learning with FixMatch

Grollmisch, Sascha; Cano, Estefania; Abeßer, Jakob

FixMatch, a semi-supervised learning method proposed for image classification, includes unlabeled data instances in the training procedure by predicting labels for differently augmented versions of the unlabeled data. In our previous work, we adapted FixMatch to audio classification by applying image augmentations to spectral representations of the audio signal. While this approach matched the performance of the supervised baseline with only a fraction of the training data, the performance of audio-specific augmentation techniques and their effect on the FixMatch approach were not evaluated. In this work, we replace all image-based augmentation techniques with audio-specific ones and keep the feature extraction unchanged. The audio-specific approach improved upon the supervised baseline, which confirms the effectiveness of the FixMatch approach for semi-supervised learning even with a completely different set of augmentations. However, the image-based approach outperforms the audio-based approach on the three audio classification tasks evaluated.
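As a rough illustration of the FixMatch idea described above, the sketch below computes the consistency loss on unlabeled data: confident predictions on weakly augmented audio serve as pseudo-labels that the model must reproduce on strongly augmented versions. The model, the augmentation callables and the confidence threshold are placeholder assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, unlabeled_batch, weak_augment, strong_augment,
                            confidence_threshold=0.95):
    """FixMatch-style consistency loss for a batch of unlabeled audio clips.

    weak_augment / strong_augment are placeholder callables, e.g. mild gain
    changes vs. heavier audio-specific transformations such as pitch shifting
    or added noise (the exact augmentation chains are assumptions here).
    """
    # Pseudo-labels come from predictions on weakly augmented inputs
    with torch.no_grad():
        weak_logits = model(weak_augment(unlabeled_batch))
        probs = F.softmax(weak_logits, dim=-1)
        confidence, pseudo_labels = probs.max(dim=-1)
        mask = (confidence >= confidence_threshold).float()

    # The model must reproduce the pseudo-labels on strongly augmented inputs
    strong_logits = model(strong_augment(unlabeled_batch))
    per_example_loss = F.cross_entropy(strong_logits, pseudo_labels, reduction="none")
    return (per_example_loss * mask).mean()
```

In the original FixMatch formulation, this term is combined with a standard supervised cross-entropy loss on the labeled examples, weighted by a hyperparameter, to form the total training loss.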

Presentation: Late Breaking Demo (virtual), December 8, 2022, 13:00 CET