Music and Speech Detection

With a new software tool for the automatic detection of music and speech sequences, Fraunhofer IDMT offers a solution for determining the exact amount of music and speech in radio and TV programs. The tool can be used to optimize broadcast programs or to provide accurate accounting data for copyright agencies.

Less work

With Fraunhofer IDMT’s new software tool, the amount of music and speech in radio and TV programs no longer has to be determined through tedious manual work (typically staff going through audio content lists). The tool detects and measures general audio categories (music, speech, music and speech combined, silence) in both live streams and stored digital audio files.

High accuracy

The time resolution can be set to match the requirements, from several seconds down to 100 milliseconds, so that results are as detailed as the application demands.
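
To illustrate how a chosen resolution translates into measured amounts, the following minimal sketch merges per-frame category labels into segments and computes the share of each category. It is not the tool's actual interface: the function names, the label strings, and the list-of-labels input are assumptions made purely for illustration.

```python
from itertools import groupby


def frames_to_segments(labels, resolution_ms):
    """Merge runs of equal frame labels into (start_ms, end_ms, label) tuples.

    `labels` is a hypothetical list with one category per analysis frame;
    `resolution_ms` is the frame length, e.g. 100 for 100 milliseconds.
    """
    segments = []
    index = 0
    for label, run in groupby(labels):
        count = len(list(run))
        segments.append((index * resolution_ms,
                         (index + count) * resolution_ms,
                         label))
        index += count
    return segments


def category_shares(segments):
    """Return {label: (total_ms, percent_of_total)} for a list of segments."""
    totals = {}
    for start_ms, end_ms, label in segments:
        totals[label] = totals.get(label, 0) + (end_ms - start_ms)
    duration = sum(totals.values())
    if duration == 0:
        return {}
    return {label: (ms, 100.0 * ms / duration) for label, ms in totals.items()}


# Example: 5 seconds of audio analysed at 100 ms resolution (made-up labels).
labels = ["music"] * 30 + ["speech"] * 10 + ["music"] * 10
segments = frames_to_segments(labels, resolution_ms=100)
shares = category_shares(segments)
```

Working in integer milliseconds avoids rounding drift when many short frames are summed. A coarser resolution yields fewer, longer frames and therefore a rougher overview.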

Easy integration

The tool is scalable and can easily be integrated with standard workflows and components. It can be used in production and live streaming environments, both online and offline.

Easy data export

The tool integrates with content management systems. For data output, users may choose between XML files, cue sheets, or other standard data export formats.
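
As a rough idea of what an XML export of detection results could look like, the sketch below serializes a list of segments with Python's standard library. The element and attribute names (`detection`, `segment`, `start_ms`, `end_ms`, `category`) are hypothetical and do not describe the tool's actual export schema.

```python
import xml.etree.ElementTree as ET


def segments_to_xml(segments, source_name):
    """Serialize (start_ms, end_ms, label) tuples to an XML string.

    The schema used here is an illustrative assumption, not the tool's format.
    """
    root = ET.Element("detection", attrib={"source": source_name})
    for start_ms, end_ms, label in segments:
        ET.SubElement(root, "segment", attrib={
            "start_ms": str(start_ms),
            "end_ms": str(end_ms),
            "category": label,
        })
    return ET.tostring(root, encoding="unicode")


# Example with made-up segments for a hypothetical recording name.
example_segments = [(0, 3000, "music"), (3000, 4000, "speech")]
xml_text = segments_to_xml(example_segments, "example_broadcast")
```

A content management system could parse such a file and attach the segment boundaries to the corresponding program entry.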

Potential Applications

  • copyright collection and accounting: copyright agencies may use the tool to determine the amount of music in radio and TV programs; it also makes querying programs and files more efficient, because music segments can be isolated before specific pieces of music are searched for by title
  • program monitoring: radio and TV program managers may use the tool to adjust the percentage of music in their programs


Features

  • Supported audio categories: music, speech, music and speech combined, silence (more categories available)
  • Data export by means of XML files, cue sheets, DDEX etc.
  • Graphical representation in charts (total, percentage, timestamps)
  • High resolution (e.g. 100 ms) for detailed views or low resolution (e.g. 10 s) for overviews
  • Integration with CMS


Licensing

  • Annual license for using the core components of the tool
  • Volume license for using the tool with a number of servers or stations
