Seeing speech


New algorithms from Fraunhofer IDMT form the basis for the »Dialogue Detection« in Steinberg Media Technologies’ latest version of its audio post-production software Nuendo. The software reliably recognises speech components in the audio track and in so doing enables audio professionals to easily separate passages with and without speech into different tracks. Fraunhofer IDMT supplied algorithms for measuring, evaluating and displaying speech intelligibility for the previous version of Nuendo too.

© Steinberg Media Technologies GmbH
The »Dialogue Detection« in Steinberg’s Nuendo 12: Algorithms from Fraunhofer IDMT in Oldenburg reliably identify speech activity in the presence of background noise.

Identifying passages with and without speech components solely on the basis of the audio level can be a tedious task for professional sound engineers. To determine whether an audio passage contains spoken word or merely background noise, they have to listen to each passage during editing. In cooperation with the Fraunhofer Institute for Digital Media Technology IDMT in Oldenburg, Steinberg Media Technologies GmbH aims to make professionals’ work in the areas of sound design, dialogue editing and speech synchronisation easier. To this end, Steinberg has integrated the »Dialogue Detection« feature in the latest update of its Nuendo digital audio workstation.

Spotlight on dialogue processing

The new features in Nuendo 12 focus on the recording and editing of dialogue. »This especially brings to the fore the requirements of Nuendo users who, for example, need to concentrate more on speech when dubbing products and producing voiceovers. This is particularly important when creating content for streaming services,« says Timo Wildenhain, Head of ProAudio at Steinberg. For this, »Dialogue Detection« relies on technologies from Fraunhofer IDMT in Oldenburg. Algorithms based on machine learning (neural networks) detect speech activity in the audio signal independently of background noise. Sound engineers can listen to these passages and, if required, have parts without speech split automatically into different tracks. They can then start the actual editing process comfortably and conveniently with a separate dialogue track.
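The splitting step described above can be illustrated with a minimal sketch. It assumes that some detector has already labelled each fixed-length frame of the signal as speech or non-speech; the function names, the frame length and the sample values are hypothetical and chosen for illustration only. This is not Steinberg's or Fraunhofer IDMT's implementation.

```python
from typing import List, Tuple


def find_segments(flags: List[bool]) -> List[Tuple[int, int, bool]]:
    """Group consecutive frames with the same label into runs.

    Returns (start_frame, end_frame, is_speech) tuples; end_frame is exclusive.
    """
    segments = []
    start = 0
    for i in range(1, len(flags) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(flags) or flags[i] != flags[start]:
            segments.append((start, i, flags[start]))
            start = i
    return segments


def split_tracks(
    samples: List[float], flags: List[bool], frame_len: int
) -> Tuple[List[float], List[float]]:
    """Route speech frames to a dialogue track and the rest to a background track.

    Both output tracks keep the original length and timing; gaps are silence (0.0).
    """
    dialogue = [0.0] * len(samples)
    background = [0.0] * len(samples)
    for start, end, is_speech in find_segments(flags):
        lo = start * frame_len
        hi = min(end * frame_len, len(samples))
        target = dialogue if is_speech else background
        target[lo:hi] = samples[lo:hi]
    return dialogue, background


# Hypothetical input: 8 samples, 2 samples per frame, frames 1 and 2 labelled as speech.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
flags = [False, True, True, False]
dialogue, background = split_tracks(samples, flags, frame_len=2)
```

Keeping both tracks at full length means the separated passages stay aligned with the original timeline, which is one plausible way to hand a dialogue track on to further editing.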

Multiple applications for speech activity detection

To reliably identify speech activity in the presence of background noise, Fraunhofer IDMT used a wide variety of data to train the »Speech Activity Detection« (SAD) algorithm used in the feature. »Our SAD algorithms are found in a variety of applications. As an independent feature, they can noticeably improve audio professionals’ workflow. In addition, they serve in other Fraunhofer IDMT solutions as a pre-processing tool for our in-house speech and speaker recognition, as noise cancellation algorithms or privacy filters,« explains Christian Rollwage, Head of Audio Signal Enhancement at the Oldenburg Branch for Hearing, Speech and Audio Technology HSA. Whether in the smart speaker in the living room at home, in speech-based machine control on the factory floor or in voice documentation in quality assurance: SAD can be used to ensure that non-speech components are filtered out before passing the audio to the next processing steps, or that speech is not recorded in the first place, thus protecting users’ privacy, for example in public places.
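The two uses mentioned here, pre-processing and privacy filtering, can both be pictured as a gate placed in front of the next processing step. The following sketch assumes a stream of audio frames and a pluggable detector function; `filter_frames` and `toy_detector` are hypothetical names, and the toy detector is a simple amplitude threshold that merely stands in for a trained SAD model such as the neural networks described in this article.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]


def filter_frames(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    keep_speech: bool,
) -> Iterator[Frame]:
    """Pass on only the frames that match the chosen mode.

    keep_speech=True  -> pre-processing: drop non-speech before recognition.
    keep_speech=False -> privacy filter: speech frames are never passed on.
    """
    for frame in frames:
        if is_speech(frame) == keep_speech:
            yield frame


def toy_detector(frame: Frame) -> bool:
    """Placeholder detector (assumption): mean absolute amplitude above 0.1.

    A real system would call a trained speech activity detector here.
    """
    return sum(abs(x) for x in frame) / len(frame) > 0.1


# Hypothetical frames: only the second one is loud enough for the toy detector.
frames = [[0.0, 0.01], [0.5, -0.4], [0.02, 0.0]]
for_recognition = list(filter_frames(frames, toy_detector, keep_speech=True))
for_storage = list(filter_frames(frames, toy_detector, keep_speech=False))
```

Because the gate is a generator, frames that fail the check are discarded as they arrive and never reach storage or later processing stages.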

Successful cooperation between Steinberg and Fraunhofer IDMT

Steinberg already used Fraunhofer IDMT’s technologies in the previous version, Nuendo 11, to measure, evaluate and display speech intelligibility. The »Intelligibility Meter« gave audio professionals a tool to keep speech as intelligible as possible in the final mix and also to take demographic change, with its associated hearing losses, into account.


Hearing, Speech and Audio Technology HSA at Fraunhofer IDMT in Oldenburg

Founded in 2008 by Prof. Dr. Dr. Birger Kollmeier and Dr. Jens-E. Appell, the Fraunhofer Institute for Digital Media Technology IDMT’s Branch for Hearing, Speech and Audio Technology HSA stands for market-oriented research and development with a focus on the following areas:

  • Speech and event recognition
  • Sound quality and speech intelligibility
  • Mobile neurotechnology and systems for networked healthcare

With in-house expertise in the development of hardware and software systems for audio system technology and signal enhancement, over 100 employees at the Oldenburg site are responsible for transferring scientific findings into practical, customer-oriented solutions.

Through scientific cooperation, the institute is closely linked to the Carl von Ossietzky University, Jade University of Applied Sciences, and the University of Applied Sciences Emden/Leer. Fraunhofer IDMT is a partner in the »Hearing4all« cluster of excellence.