Interview with Dr.-Ing. Stefan Goetze

December 09, 2019

Corporate documentation is an important task, but it uses up valuable time. Against this background, contact-free speech recognizers from Fraunhofer IDMT can offer real added value for industrial customers – especially in noisy or sterile environments. Interview with Dr.-Ing. Stefan Götze, Head of »Automatic Speech Recognition«, on the advantages of speech recognition systems.

Dr Götze, you’re working on human-technology interaction via speech. What advantages do you see for industrial contexts?

In industrial processes where documenting information means interrupting the work, contact-free speech recognition systems offer important added value in terms of cost and improved workplace safety. Voice control of robots and plants, or even just a lamp, also generates added value for our customers.

You say that speech recognition saves time and costs. Why?

Speech recognition systems create a basis for concentrating on what are really the core tasks and ensuring that necessary documentation takes as little time as possible.

»It’s especially important to us that the systems we develop are perceived as a clear reduction in workload.«

 

This includes perfect usability that we can customize to our clients’ individual requirements and the reliable recognition of speech even in noisy surroundings or at a long distance from the microphone.

What’s special about speech recognition in Oldenburg?

Background noise and ambient conditions can influence recognition performance. By contrast, the human brain can cope very well with interference. That’s why we draw on the latest scientific findings from basic psychoacoustic and psychophysical research in order to develop algorithms with a minimal false recognition rate.

What does this mean for technology development?

In order to be able to react as flexibly as possible to acoustic requirements, various signal recording and enhancement technologies have been designed for modular use in hardware and software. For example, in industrial scenarios with a lot of background noise or ambient reverberation, it’s possible to achieve optimal recognition performance through intelligent positioning of the microphones.

Customers who come to you don’t need an »off-the-peg« speech recognizer. What do they get instead?

The required vocabulary and the integration into existing applications and end devices can be tailored just as individually as the technical setup. We’re able to produce systems with just a few commands to control simple technical systems, but we also build dialogue-oriented robots or chatbot systems with very large vocabularies. An important added value of this development is that a lot of the data processing takes place on the sensor. This makes it possible to realize applications in remote areas or when a plant or infrastructure is a long distance away.

How should customers envisage the implementation of this technology in their company?

In principle, our speech recognition and control system can be adapted to any application, no matter how individual: the documentation of process steps, simple robot commands of just a few words, or a complex, dialogue-based chatbot. Our technology can be used in the widest variety of applications – from speech recognition in smartphones and smart home systems to security applications on commercial premises and in production environments. Since our developments are platform-independent, we and the customer have complete freedom in choosing the interfaces and in integrating the system into existing applications. Thanks to our vast experience with safety-critical requirements, the system can be used locally and without an internet connection, thus fulfilling the most stringent demands on data security.