Audio and Visual Content Analysis

Extracting meaningful data from audiovisual content

Research activities focus on the development of technologies for comprehensive analysis and annotation of audio and video contents using signal analysis and machine learning. The extraction of metadata from media data provides the foundation for numerous applications such as automatic tagging, content-based search, and recommendation systems.

News and upcoming events


Press Release / 12.4.2024

Advertising monitoring for SWR radio programs

Our audio matching replaces manual checking of broadcast commercials


Event / 12.3.2024

DataTech 2024

Join our presentation »Digital Traces: Verification of Audio-Visual Content« at the Data Technology Seminar 2024 – EBU's annual flagship event for practitioners in data and AI for media.



Workshop Digital Broadcasting

The next Workshop Digital Broadcasting will take place in fall 2024.

Analyzing media content and making it accessible

The use and exploitation of audiovisual content depend on the availability of meaningful metadata (data describing data). Metadata provide the basis for locating, organizing, and classifying specific content, as well as for implementing recommendation systems. Technologies for the automatic extraction of metadata are therefore crucial to make media content truly accessible and usable.

Multimodal analysis and annotation of media data

The development of technologies for the automatic analysis and annotation of audiovisual data requires a solid understanding of signal processing and machine learning, along with a good comprehension of the underlying requirements.

Another challenge lies in multimodal analysis and orchestration: extracting metadata from audio, video, and image files involves a variety of processes ranging from preprocessing to feature extraction and classification. Because different methods and technologies are employed, they must be integrated and orchestrated flexibly. The integration of heterogeneous data from different sources and formats also requires the selection or development of suitable data models and metadata standards. Media archives often deal with large volumes of data, which imposes specific requirements on system architecture, efficiency, and the optimization of the algorithms used.
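The idea of orchestrating heterogeneous analysis components around a shared data model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Annotation` and `MediaItem` classes, the `Orchestrator`, and the two placeholder components are hypothetical and do not describe the actual architecture or any existing metadata standard.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """One piece of extracted metadata in a shared (hypothetical) data model."""
    component: str     # name of the analysis component that produced it
    label: str         # e.g. "music" or "face"
    start: float       # segment start in seconds
    end: float         # segment end in seconds
    confidence: float  # score between 0 and 1


@dataclass
class MediaItem:
    """A media file plus all annotations collected for it so far."""
    uri: str
    media_type: str    # "audio", "video", or "image"
    annotations: List[Annotation] = field(default_factory=list)


# An analysis component takes a media item and returns new annotations.
Component = Callable[[MediaItem], List[Annotation]]


class Orchestrator:
    """Runs the components registered for an item's media type, in order."""

    def __init__(self) -> None:
        self._components: Dict[str, List[Component]] = {}

    def register(self, media_type: str, component: Component) -> None:
        self._components.setdefault(media_type, []).append(component)

    def run(self, item: MediaItem) -> MediaItem:
        for component in self._components.get(item.media_type, []):
            item.annotations.extend(component(item))
        return item


def placeholder_music_detector(item: MediaItem) -> List[Annotation]:
    # Placeholder: a real component would decode the file, extract
    # features, and classify them. Here a fixed result is returned.
    return [Annotation("music_detector", "music", 0.0, 12.5, 0.9)]


def placeholder_face_detector(item: MediaItem) -> List[Annotation]:
    # Placeholder result standing in for a real face detection step.
    return [Annotation("face_detector", "face", 3.0, 4.0, 0.8)]


orchestrator = Orchestrator()
orchestrator.register("audio", placeholder_music_detector)
orchestrator.register("video", placeholder_face_detector)

result = orchestrator.run(MediaItem("file:///example.wav", "audio"))
for annotation in result.annotations:
    print(annotation)
```

Because every component writes into the same annotation structure, results from different modalities can be stored, searched, and combined uniformly, which is the point the paragraph above makes about common data models.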

Furthermore, we are involved in work on metadata standards and in the integration and orchestration of analysis components. We also address privacy concerns and other aspects of trustworthy AI, aiming to provide comprehensive solutions for specific application requirements.

Research areas: Audio and Visual Content Analysis


Automatic Music Analysis

The focus is on the recognition of musical features such as pitch, rhythm, timbre, and genre, extending to musical transcription. The technologies enable music classifications, similarity analysis between musical pieces, and the detection of specific sound events and acoustic environments.


Video Analysis

In visual analysis, the focus is on analyzing faces in videos. Through facial recognition and tracking, human faces can be analyzed and identified. Additionally, image processing techniques and machine learning are used to detect and classify animals in videos.

Provenance Analysis and Matching

The detection of recurring patterns, reuse of media content, and transformation steps between different content provides insights into the origin and processing of that content.
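As a rough illustration of how reuse of content can be detected, the following toy sketch locates a short excerpt inside a longer recording by comparing compact binary fingerprints. It is only a sketch under stated assumptions: the energy-based fingerprint, the function names `fingerprint` and `best_match`, and the synthetic signal are all hypothetical and are not the matching method used in the technologies described here. Practical systems use far more robust features.

```python
import random
from typing import List, Tuple


def fingerprint(samples: List[float], frame: int = 4) -> List[int]:
    """Toy fingerprint: one bit per frame pair, 1 if the frame energy rises."""
    energies = [
        sum(x * x for x in samples[i:i + frame])
        for i in range(0, len(samples) - frame + 1, frame)
    ]
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]


def best_match(query: List[int], reference: List[int]) -> Tuple[int, float]:
    """Slide the query over the reference.

    Returns the first offset with the highest share of agreeing bits,
    together with that share (1.0 means all bits agree).
    """
    if not query or len(query) > len(reference):
        raise ValueError("query must be non-empty and no longer than reference")
    best_offset, best_score = 0, -1.0
    for offset in range(len(reference) - len(query) + 1):
        window = reference[offset:offset + len(query)]
        score = sum(q == r for q, r in zip(query, window)) / len(query)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Synthetic stand-in for a recording (assumption: no real audio is loaded).
rng = random.Random(0)
recording = [rng.uniform(-1.0, 1.0) for _ in range(400)]

# The "reused" excerpt is cut on a frame boundary so that its fingerprint
# is an exact sub-sequence of the recording's fingerprint.
excerpt = recording[80:160]

reference_fp = fingerprint(recording)
query_fp = fingerprint(excerpt)
offset, score = best_match(query_fp, reference_fp)
print(f"best offset (in frames): {offset}, bit agreement: {score:.2f}")
```

In this idealized case the excerpt matches perfectly. With real material, transformations such as re-encoding or mixing lower the agreement, so a threshold on the score would decide whether a match counts as reuse.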

Multimodal and Crossmodal Analysis

To achieve optimal results, the methods described can be combined in many use cases or complemented with other analysis methods, such as metadata analysis. An important requirement for this is suitable interfaces, a common data model and the possibility of flexible orchestration and configuration of the analysis components used.

These technologies are applied particularly for tagging and indexing A/V archives, recommendation systems, program analysis, content tracking, and rights management. They are also used for audio-visual biodiversity measurement and to support disinformation detection.


Research project


Center of excellence for AI in media – Our contributions: Audio forensics, audio provenance analysis, music analysis, privacy and recommendation systems


Research project


Multi-modal AI-driven technologies for automatic construction site monitoring


Research project


Biodiversity identification software that automatically searches single images, video, and audio recordings for sequences involving great apes


Research project


Development of a digital platform to make the audio-visual cultural heritage of European cities accessible


Research project


Framework for multimedia search that combines "human and social computation" and content analysis


Research project


Platform for multimodal and context-based analysis, into which a wide variety of analysis components for different media types can be integrated


  • Media Analytics: Services for the analysis and annotation of media content
  • Evaluation (Visual AI Assessment): Technical evaluation of methods, components, and systems in the field of audio and video analysis

Publications (year and title, authors, publication type)
2022 Construction-sAIt: Multi-modal AI-driven technologies for construction site monitoring
Abeßer, Jakob; Loos, Alexander; Sharma, Prachi
Conference Paper
2016 A workflow for cross media recommendations based on linked data analysis
Aichroth, P.; Berndl, E.; Weißgerber, T.; Kosch, H.; Köllmer, T.
Conference Paper
2015 MICO - Media in Context
Aichroth, P.; Kurz, T.; Stadler, H.; Drewes, F.; Björklund, J.; Schlegel, K.; Berndl, E.; Perez, A.; Bowyer, A.; Volpini, A.; Weigel, C.
Conference Paper
2011 Automated detection of errors and quality issues in audio-visual content
Kühhirt, U.; Paduschek, R.; Nowak, S.
Conference Paper
2008 Personal television: A crossmodal analysis approach
Dunker, Peter; Gruhne, Matthias; Sturtz, S.
Conference Paper
This list has been generated from the publication platform Fraunhofer-Publica