Nuremberg  /  December 04, 2023  -  December 07, 2023

15th IEEE International Workshop on Information Forensics and Security

From December 4 to 7, 2023, the 15th edition of the IEEE International Workshop on Information Forensics and Security (WIFS) will take place in Nuremberg, Germany. With contributions on synthetic speech detection and audio phylogeny, Fraunhofer IDMT will present current research activities in the field of media forensics.

»An Open Dataset of Synthetic Speech«

Artem Yaroshchuk, Christoforos Papastergiopoulos, Luca Cuccovillo, Patrick Aichroth, Konstantinos Votis, Dimitrios Tzovaras 

This paper introduces a multilingual, multispeaker dataset composed of synthetic and natural speech, designed to foster research and benchmarking in synthetic speech detection. The dataset encompasses 18,993 audio utterances synthesized from text, alongside their corresponding natural equivalents, representing approximately 17 hours of synthetic audio data. The dataset features synthetic speech generated by 156 voices spanning three languages, namely English, German, and Spanish, with a balanced gender representation. It targets state-of-the-art synthesis methods, and has been released with a license allowing seamless extension and redistribution by the research community.

The paper will be presented on December 5 at 16.00.

»Advancing Audio Phylogeny: A Neural Network Approach for Transformation Detection«

Milica Gerhardt, Luca Cuccovillo, Patrick Aichroth

In this study we propose a novel approach to audio phylogeny, i.e., the detection of relationships and transformations within a set of near-duplicate audio items, by leveraging a deep neural network for efficiency and extensibility. Unlike existing methods, our approach detects transformations between nodes in one step, and the transformation set can be expanded by retraining the neural network without excessive computational costs. We evaluated our method against the state of the art using a self-created and publicly released dataset, observing superior performance in reconstructing phylogenetic trees and heightened transformation detection accuracy. Moreover, the ability to detect a wide range of transformations and to extend the transformation set makes the approach suitable for various applications.
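The core idea of audio phylogeny, reconstructing a tree of derivations from pairwise transformation evidence, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a hypothetical classifier has already produced `scores[i][j]`, the confidence that item j was derived from item i, and reconstructs a tree by greedy parent assignment.

```python
# Illustrative sketch (not the authors' method): build a phylogeny tree from
# pairwise transformation-detection scores. scores[i][j] is the (assumed,
# hypothetical) confidence that item j was derived from item i.

def reconstruct_tree(scores, threshold=0.5):
    """Greedy parent assignment: each node adopts its highest-scoring
    plausible parent; nodes with no parent above the threshold are roots."""
    n = len(scores)
    parents = {}
    for child in range(n):
        best_parent, best_score = None, threshold
        for parent in range(n):
            if parent != child and scores[parent][child] > best_score:
                best_parent, best_score = parent, scores[parent][child]
        parents[child] = best_parent  # None marks a root node
    return parents

# Toy example with made-up scores: item 0 is the original, items 1 and 2
# were derived from 0, and item 3 was derived from 1.
scores = [
    [0.0, 0.9, 0.8, 0.2],
    [0.1, 0.0, 0.3, 0.7],
    [0.1, 0.2, 0.0, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
tree = reconstruct_tree(scores)
# tree == {0: None, 1: 0, 2: 0, 3: 1}
```

A real system would also need to resolve cycles and competing roots; the paper's contribution lies in producing the transformation evidence with a single neural network pass and in keeping the transformation set extensible via retraining.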

The paper will be presented on December 5 at 16.00.

»Audio Spectrogram Transformer for Synthetic Speech Detection via Speech Formant Analysis«

Luca Cuccovillo, Milica Gerhardt, Patrick Aichroth

In this paper, we address the challenge of synthetic speech detection, which has become increasingly important due to the latest advancements in text-to-speech and voice conversion technologies. We propose a novel multi-task neural network architecture, designed to be interpretable and specifically tailored for audio signals. The architecture includes a feature bottleneck, used to autoencode the input spectrogram, predict the fundamental frequency (f0) trajectory, and classify the speech as synthetic or natural. Hence, the synthesis detection can be considered a byproduct of attending to the energy distribution among vocal formants, providing a clear understanding of which characteristics of the input signal influence the final outcome. Our evaluation on the ASVspoof 2019 LA partition indicates better performance than the current state of the art, with an AUC score of 0.900.
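The multi-task design described above can be sketched in a few lines. The following is a toy forward pass under stated assumptions, not the paper's architecture: random stand-in weights, a single linear encoder as the "feature bottleneck", and three heads sharing it, one reconstructing the spectrogram, one regressing the f0 trajectory, and one emitting the synthetic-vs-natural decision.

```python
# Minimal sketch (assumptions, not the paper's code): a shared bottleneck z
# feeding three task heads, with a joint loss combining reconstruction,
# f0 regression, and binary classification. All weights are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
frames, bins, dim = 100, 64, 16

spectrogram = rng.random((frames, bins))

# Shared encoder: project each spectrogram frame into the bottleneck space.
W_enc = rng.standard_normal((bins, dim)) * 0.1
z = np.tanh(spectrogram @ W_enc)            # (frames, dim) bottleneck features

# Task heads operating on the same bottleneck features.
W_dec = rng.standard_normal((dim, bins)) * 0.1
recon = z @ W_dec                           # (a) autoencode the spectrogram
w_f0 = rng.standard_normal(dim) * 0.1
f0_pred = z @ w_f0                          # (b) per-frame f0 trajectory
w_cls = rng.standard_normal(dim) * 0.1
logit = z.mean(axis=0) @ w_cls              # (c) utterance-level logit
p_synthetic = 1.0 / (1.0 + np.exp(-logit))  # probability of synthetic speech

# Joint loss as a weighted sum (weights and targets are illustrative).
f0_true = rng.random(frames) * 300.0        # fake f0 target in Hz
label = 1.0                                 # 1 = synthetic, 0 = natural
loss = (np.mean((recon - spectrogram) ** 2)
        + 0.1 * np.mean((f0_pred - f0_true) ** 2)
        - (label * np.log(p_synthetic + 1e-9)
           + (1 - label) * np.log(1 - p_synthetic + 1e-9)))
```

Because all three heads read the same bottleneck, the classification decision is forced to rely on features that also explain the spectrogram's energy distribution and the f0 trajectory, which is what makes the detection outcome interpretable in terms of vocal formants.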

The paper will be presented on December 7 at 9.00.