AI-based speech synthesis will soon be able to generate speech of such high quality that it is indistinguishable from natural speech, a form of so-called “deepfake”. In the near future, it will be possible to put arbitrary messages into the mouth of any person. Effective procedures for detecting synthetic speech are therefore essential if fraud, phishing, and disinformation are to remain preventable and prosecutable in the future.
SpeechTrust+ aims to explore a new generation of detectors that reliably and sustainably detect AI-based speech synthesis and voice distortion. To achieve this, we will apply not only the same AI methods that have led to the tremendous improvements in speech synthesis, but also methods from the field of audio forensics. This combination is intended to yield a toolbox for the robust detection of both speech synthesis and voice distortion.