Audio and Visual Content Analysis

Research

Analyzing media content and making it accessible

The use and exploitation of audiovisual content depend on the availability of meaningful metadata (data describing data). Metadata provide the basis for locating, organizing, and classifying specific content, as well as for implementing recommendation systems. Technologies for the automatic extraction of metadata are therefore crucial for making media content truly accessible and usable.

Multimodal analysis and annotation of media data

The development of technologies for the automatic analysis and annotation of audiovisual data requires a solid understanding of signal processing and machine learning, along with a good comprehension of the underlying requirements.

Another challenge lies in multimodal analysis and orchestration: extracting metadata from audio, video, and image files involves a variety of processes ranging from preprocessing to feature extraction and classification. Different methods and technologies are employed, requiring flexible integration and orchestration. The integration of heterogeneous data from different sources and formats also requires the selection or development of suitable data models and metadata standards. Media archives often deal with large volumes of data, imposing specific requirements on system architecture, efficiency, and the optimization of the algorithms used.
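The chain of preprocessing, feature extraction, and classification described above can be sketched as a small configurable pipeline. This is only an illustrative sketch, not the institute's implementation: the names `MediaItem`, `normalize`, `extract_energy`, `classify_loudness`, `PIPELINES`, and `run_pipeline` are hypothetical, and the energy feature and the 0.25 threshold are toy stand-ins for real analysis components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaItem:
    """Hypothetical container for one media file and its extracted metadata."""
    uri: str
    media_type: str                 # e.g. "audio", "video", "image"
    samples: List[float]            # toy stand-in for decoded signal data
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage takes an item, enriches or transforms it, and returns it.
Stage = Callable[[MediaItem], MediaItem]


def normalize(item: MediaItem) -> MediaItem:
    """Preprocessing: scale samples so the peak magnitude is 1."""
    peak = max((abs(s) for s in item.samples), default=0.0)
    if peak > 0:
        item.samples = [s / peak for s in item.samples]
    return item


def extract_energy(item: MediaItem) -> MediaItem:
    """Feature extraction: mean squared amplitude (toy feature)."""
    n = len(item.samples)
    item.metadata["energy"] = sum(s * s for s in item.samples) / n if n else 0.0
    return item


def classify_loudness(item: MediaItem) -> MediaItem:
    """Classification: arbitrary threshold chosen only for illustration."""
    item.metadata["label"] = "loud" if item.metadata["energy"] > 0.25 else "quiet"
    return item


# Orchestration is reduced to configuration: one ordered stage list per media type.
PIPELINES: Dict[str, List[Stage]] = {
    "audio": [normalize, extract_energy, classify_loudness],
}


def run_pipeline(item: MediaItem, pipelines: Dict[str, List[Stage]] = PIPELINES) -> MediaItem:
    """Run every stage configured for the item's media type, in order."""
    for stage in pipelines.get(item.media_type, []):
        item = stage(item)
    return item


if __name__ == "__main__":
    result = run_pipeline(MediaItem("example.wav", "audio", [0.5, -1.0, 0.5, -2.0]))
    print(result.metadata)
```

Keeping the stage list in a plain dictionary means components can be swapped or reordered per media type without touching the runner, which is one simple way to get the flexibility the text calls for.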

Furthermore, we are involved in metadata standards, as well as the integration and orchestration of analysis components. We also address privacy concerns and other aspects of trustworthy AI, aiming to provide comprehensive solutions for specific application requirements.

Research areas Audio and Visual Content Analysis

Automatic Music Analysis

The focus is on recognizing musical features such as pitch, rhythm, timbre, and genre, up to and including music transcription. These technologies enable music classification, similarity analysis between pieces of music, and the detection of specific sound events and acoustic environments.

Video Analysis

In video analysis, the focus is on faces: face recognition and tracking make it possible to analyze and identify human faces in videos. In addition, image processing techniques and machine learning are used to detect and classify animals in videos.

Provenance Analysis and Matching

Detecting recurring patterns, the reuse of media content, and the transformation steps between different content items provides insight into the origin and processing history of that content.

Multimodal and Crossmodal Analysis

To achieve the best results, the methods described can in many use cases be combined with each other or complemented with other analysis methods, such as metadata analysis. This requires suitable interfaces, a common data model, and flexible orchestration and configuration of the analysis components used.
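To illustrate why a common data model matters for combining methods, the following minimal sketch stores the results of different analysis components in one shared annotation type and then finds annotations from different modalities that overlap in time. The `Annotation` fields and the functions `overlaps` and `cross_modal_pairs` are assumptions made for this example and do not correspond to any specific metadata standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical shared record that every analysis component emits."""
    modality: str      # e.g. "audio" or "video"
    label: str         # e.g. "speech", "face"
    start: float       # segment start in seconds
    end: float         # segment end in seconds
    confidence: float  # score in [0, 1] reported by the component


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two time segments share more than a single point."""
    return a.start < b.end and b.start < a.end


def cross_modal_pairs(annotations: List[Annotation]) -> List[Tuple[Annotation, Annotation]]:
    """Return pairs of annotations from different modalities that overlap in time."""
    pairs = []
    for i, a in enumerate(annotations):
        for b in annotations[i + 1:]:
            if a.modality != b.modality and overlaps(a, b):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    results = [
        Annotation("audio", "speech", 0.0, 5.0, 0.9),
        Annotation("video", "face", 3.0, 8.0, 0.8),
        Annotation("video", "face", 10.0, 12.0, 0.7),
    ]
    for a, b in cross_modal_pairs(results):
        print(f"{a.label} ({a.modality}) co-occurs with {b.label} ({b.modality})")
```

Because every component writes the same record type, a crossmodal step like this needs no knowledge of how the individual audio or video analyses were produced.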

These technologies are applied particularly for tagging and indexing A/V archives, recommendation systems, program analysis, content tracking, and rights management. They are also used for audio-visual biodiversity measurement and to support disinformation detection.

Projects and activities

Research project

AI4Media

Center of excellence for AI in media – Our contributions: Audio forensics, audio provenance analysis, music analysis, privacy and recommendation systems

Research project

Construction-sAIt

Multi-modal AI-driven technologies for automatic construction site monitoring

Research project

SAISBECO

Biodiversity identification software that automatically searches single images, video recordings, and audio recordings for sequences involving great apes

Research project

iMediaCities

Development of a digital platform to make the audio-visual cultural heritage of European cities accessible

Research project

CUbRIK

Framework for multimedia search that combines "human and social computation" and content analysis

Research project

MICO

Platform for multimodal and context-based analysis, into which a wide variety of analysis components for different media types can be integrated

Range of services

Services

Media Analytics: Services for the analysis and annotation of media content
Evaluation (Visual AI Assessment): Technical evaluation of methods, components, and systems in the field of audio and video analysis

Publications

Datasets

In recent years, Fraunhofer IDMT has compiled audio datasets for research areas such as the detection of instruments and fingerings, and performance analysis. These datasets have been presented in several scientific publications at international conferences and are intended to serve the scientific community as potential benchmarks for comparative experiments.

News and upcoming events

Advertising monitoring for SWR radio programs

DataTech 2024

Workshop Digital Broadcasting

Contact Press / Media

Dr.-Ing. Uwe Kühhirt

Hanna Lukashevich