AI-based speech synthesis will soon be able to generate speech of such high quality that it is indistinguishable from natural speech, a form of so-called “deepfake”. In the near future, it will be possible to put arbitrary words into the mouth of any person. Without effective methods for detecting synthetic speech, the prevention and prosecution of fraud, phishing and disinformation will face serious problems.
SpeechTrust+ aims to explore a new generation of detectors that reliably and sustainably detect AI-based speech synthesis and voice distortion. To achieve this, we will not only apply the same AI methods that have driven the tremendous improvements in speech synthesis, but also draw on methods from the field of audio forensics. This combination is intended to yield a toolbox for the robust detection of both speech synthesis and voice distortion.