New audio analysis methods from the Fraunhofer Institute for Digital Media Technology IDMT help audio professionals evaluate speech intelligibility objectively and thereby make an important contribution to an optimal audio mix. At the 31st »Tonmeistertagung« trade exhibition, held from 3 to 6 November 2021 in Düsseldorf, IDMT’s Oldenburg Branch for Hearing, Speech and Audio Technology HSA will present its portfolio of solutions.
In media productions and broadcasting, deciding whether certain parts of speech are sufficiently intelligible for listeners is often a subjective matter, yet listening preferences differ from person to person. In addition, depending on the target group, it can make sense to take demographic change and existing hearing impairments into account. Fraunhofer IDMT’s methods, which are based on machine learning, give audio professionals a reliable basis for their decisions. The software solutions range from pure evaluation to automated real-time adjustment of intelligibility. Beyond applications in film, radio and television productions, the experts at IDMT’s Oldenburg branch see great potential in personalizing intelligibility in streaming services, or even live on set or during sound engineering.
First the analysis…
Fraunhofer IDMT’s speech intelligibility solutions are always based on an analysis of the audio signal. Machine learning methods automatically detect signals that contain speech, and the quality and/or intelligibility of these signals is then evaluated automatically. In practice, the level difference between speech and background noise, the signal-to-noise ratio (SNR), is often used as a benchmark. The team led by Head of Group Dr. Jan Rennies-Hochmuth at the Oldenburg branch goes further. He explains: »Using AI-based algorithms, we calculate the actual listening effort that the listener has to exert in order to hear the already mixed signal. This allows a far more reliable evaluation of the mix and increases the probability that, following an optimal evaluation, listeners will be able to understand the spoken word much better. This is because two signals with the same SNR can differ significantly in listening effort and intelligibility, depending on the type of background noise or the clarity of pronunciation.«
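To make the SNR benchmark concrete, here is a minimal sketch of the conventional definition, SNR in dB = 10 · log10(P_speech / P_noise), where P is the mean power of each signal. It is not IDMT's listening-effort model. It assumes separate speech and noise tracks are available as plain lists of samples; the function names `signal_power` and `snr_db` are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence


def signal_power(samples: Sequence[float]) -> float:
    """Mean power of a signal: the average of the squared sample values."""
    if len(samples) == 0:
        raise ValueError("signal must contain at least one sample")
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Conventional SNR in decibels: 10 * log10(P_speech / P_noise)."""
    p_noise = signal_power(noise)
    if p_noise == 0.0:
        return math.inf  # silent background: SNR is unbounded
    return 10.0 * math.log10(signal_power(speech) / p_noise)


if __name__ == "__main__":
    speech = [0.5, -0.5, 0.5, -0.5]
    # Two differently shaped noise signals with the same mean power.
    steady_noise = [0.1, -0.1, 0.1, -0.1]
    impulsive_noise = [0.2, 0.0, 0.0, 0.0]
    print(snr_db(speech, steady_noise), snr_db(speech, impulsive_noise))
```

Both noise signals yield the same SNR (about 14 dB) even though they are structured very differently. This is the limitation described in the quote: a single SNR value cannot tell such backgrounds apart, although they may affect listening effort quite differently.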
…then the improvement
If speech intelligibility is poor, audio professionals must either rework the sound mix or adjust the recording conditions. This is where another Fraunhofer IDMT solution comes in: it calculates the adjustments needed for better speech intelligibility, provides corresponding instructions and, if necessary, even carries out the improvements automatically. It relies on specially developed source separation algorithms that isolate and accentuate dialogue, even against complex background acoustics such as music or sound effects. Because this processing runs with minimal delay, IDMT’s solutions can be used not only in preprocessing but also during broadcasting or on the listener’s end devices. Here, another technology from IDMT’s Oldenburg branch can benefit the listener: adaptive signal processing takes loud ambient noise at the listening location into account and adjusts the audio signal accordingly for optimal intelligibility, without having to increase the volume.
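The idea of accentuating dialogue without raising the overall level can be illustrated with a simple remix step. The sketch below assumes that source separation has already produced separate dialogue and background stems; the separation itself is not shown. The function `accentuate_dialogue` and its parameter `background_cut_db` are hypothetical. The background is attenuated by a chosen number of decibels, and the result is rescaled so that its RMS level matches the original mix. RMS is used here as a crude stand-in for loudness, not as a description of IDMT's processing.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    if len(samples) == 0:
        raise ValueError("signal must contain at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def accentuate_dialogue(
    dialogue: Sequence[float],
    background: Sequence[float],
    background_cut_db: float,
) -> List[float]:
    """Attenuate the background stem, then restore the original mix's RMS level.

    Assumes both stems come from a prior (not shown) source separation step
    and have the same length and sample rate.
    """
    if len(dialogue) != len(background):
        raise ValueError("stems must have the same length")
    original = [d + b for d, b in zip(dialogue, background)]
    bg_gain = 10.0 ** (-background_cut_db / 20.0)  # dB -> linear amplitude gain
    remixed = [d + bg_gain * b for d, b in zip(dialogue, background)]
    current = rms(remixed)
    # Rescale so the overall level stays the same while the dialogue-to-background
    # ratio improves; a silent remix is returned unchanged.
    scale = rms(original) / current if current > 0.0 else 1.0
    return [scale * x for x in remixed]
```

A production system would more likely match perceived loudness, for example with a measure such as ITU-R BS.1770, and would adapt the attenuation to the listening situation rather than using a fixed value.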
Which speaker is important?
In audio mixes, it is not uncommon for several people to speak at the same time, which can compromise speech intelligibility. Fraunhofer IDMT has a solution for this as well, which will also be demonstrated at its stand at the »Tonmeistertagung« exhibition. Different speakers are reliably detected and separated using what is known as »voice filtering«: within just a few seconds, it generates an acoustic fingerprint of the person speaking, which is then used to extract that voice from a mix of several speakers.
What can you expect at Fraunhofer IDMT’s stand at TMT31?
The »Tonmeistertagung« is a trade exhibition for audio professionals that takes place every two years and presents current trends and developments in the sector. Visit Fraunhofer IDMT’s stand from 3 to 6 November 2021 for a live demonstration of its speech intelligibility software. The team will be happy to explain how you can integrate these solutions into your application or product. Hannah Baumgartner, project manager at IDMT’s Oldenburg branch and board member of the Verband Deutscher Tonmeister (VDT), the German association for audio professionals, will also curate and chair a session on »Speech Intelligibility in Broadcasting and Film« from 10:00 am to 12:40 pm on 5 November 2021.
Hearing, Speech and Audio Technology HSA at Fraunhofer Institute for Digital Media Technology IDMT in Oldenburg
Founded in 2008 as a project group, the Fraunhofer Institute for Digital Media Technology IDMT’s Branch for Hearing, Speech and Audio Technology HSA stands for market-oriented research and development with a focus on the following areas:
- Speech and event recognition
- Sound quality and speech intelligibility
- Mobile neurotechnology and systems for networked healthcare
Drawing on in-house expertise in developing hardware and software systems for audio system technology and signal enhancement, more than 90 employees at the Oldenburg site transfer scientific findings into practical, customer-oriented solutions.
Through scientific cooperation, the institute is closely linked to the Carl von Ossietzky University, Jade University of Applied Sciences, the University of Applied Sciences Emden / Leer and other institutions in the field of hearing research. Fraunhofer IDMT is a partner in the »Hearing4all« cluster of excellence.