Audio and Visual Content Analysis

Extracting meaningful data from audiovisual content

Research activities focus on the development of technologies for comprehensive analysis and annotation of audio and video contents using signal analysis and machine learning. The extraction of metadata from media data provides the foundation for numerous applications such as automatic tagging, content-based search, and recommendation systems.

News and upcoming events

 

Event / 12.3.2024

DataTech 2024

Join our presentation »Digital Traces: Verification of Audio-Visual Content« at the Data Technology Seminar 2024 – EBU's annual flagship event for practitioners in data and AI for media.

 

Event

Workshop Digital Broadcasting

The next Workshop Digital Broadcasting will take place in fall 2024.

 

New project

A musical question-and-answer game with AI

Development of an AI-based composition app: Fraunhofer IDMT is a partner in a Thuringian research project

Analyzing media content and making it accessible

The use and exploitation of audiovisual content depend on the availability of meaningful metadata (data describing data). They provide the basis for locating, organizing, and classifying specific content, as well as implementing recommendation systems. Technologies for the automatic extraction of metadata are therefore crucial to make media content truly accessible and usable.

Multimodal analysis and annotation of media data

The development of technologies for the automatic analysis and annotation of audiovisual data requires a solid understanding of signal processing and machine learning, along with a good comprehension of the underlying requirements.

Another challenge lies in multimodal analysis and orchestration: extracting metadata from audio, video, and image files involves a variety of processes ranging from preprocessing to feature extraction and classification. Different methods and technologies are employed, requiring flexible integration and orchestration. The integration of heterogeneous data from different sources and formats also requires the selection or development of suitable data models and metadata standards. Media archives often deal with large volumes of data, imposing specific requirements on system architecture, efficiency, and the optimization of the algorithms used.
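As a rough illustration of the chain from preprocessing to feature extraction and classification, the following minimal Python sketch runs a list of interchangeable stages over a media item. All names here (`MediaItem`, `normalize`, `extract_energy`, `classify_activity`, `run_pipeline`) and the energy threshold are hypothetical and chosen for illustration only; they do not describe the components actually used at Fraunhofer IDMT.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaItem:
    """Hypothetical container handed from one analysis stage to the next."""
    samples: List[float]                       # toy stand-in for decoded audio
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage is any callable that takes an item and returns an item.
Stage = Callable[[MediaItem], MediaItem]


def normalize(item: MediaItem) -> MediaItem:
    """Preprocessing: scale samples so the peak amplitude is 1.0."""
    peak = max((abs(s) for s in item.samples), default=0.0)
    if peak > 0:
        item.samples = [s / peak for s in item.samples]
    return item


def extract_energy(item: MediaItem) -> MediaItem:
    """Feature extraction: store the mean signal energy as metadata."""
    n = len(item.samples)
    energy = sum(s * s for s in item.samples) / n if n else 0.0
    item.metadata["mean_energy"] = energy
    return item


def classify_activity(item: MediaItem, threshold: float = 0.1) -> MediaItem:
    """Classification: a toy threshold rule (assumed, not a real model)."""
    energy = item.metadata["mean_energy"]
    item.metadata["label"] = "active" if energy >= threshold else "silent"
    return item


def run_pipeline(item: MediaItem, stages: List[Stage]) -> MediaItem:
    """Orchestration: apply the configured stages in order."""
    for stage in stages:
        item = stage(item)
    return item


if __name__ == "__main__":
    stages = [normalize, extract_energy, classify_activity]
    result = run_pipeline(MediaItem([0.0, 0.5, -1.0, 0.5]), stages)
    print(result.metadata)
```

Because each stage shares the same signature, stages can be reordered, replaced, or configured per use case without changing the orchestration function, which is one simple way to obtain the flexibility described above.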

Furthermore, we work on metadata standards as well as on the integration and orchestration of analysis components. We also address privacy concerns and other aspects of trustworthy AI, aiming to provide comprehensive solutions for specific application requirements.

Research areas in Audio and Visual Content Analysis

 

Automatic Music Analysis

The focus is on the recognition of musical features such as pitch, rhythm, timbre, and genre, extending to musical transcription. The technologies enable music classifications, similarity analysis between musical pieces, and the detection of specific sound events and acoustic environments.

 

Video Analysis

In visual analysis, the focus is on analyzing faces in videos. Through facial recognition and tracking, human faces can be analyzed and identified. Additionally, image processing techniques and machine learning are used to detect and classify animals in videos.

Provenance Analysis and Matching

The detection of recurring patterns, reuse of media content, and transformation steps between different content items provides insights into their origin and processing.

Multimodal and Crossmodal Analysis

To achieve optimal results, the methods described can be combined in many use cases or complemented with other analysis methods, such as metadata analysis. An important requirement for this is suitable interfaces, a common data model and the possibility of flexible orchestration and configuration of the analysis components used.
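To make the idea of a common data model more concrete, the sketch below assumes that every analysis component reports its results as time-coded annotations in one shared structure, so that results from different modalities can be related to each other. The `Annotation` fields and the `cross_modal_pairs` helper are assumptions made for this example, not an existing schema or metadata standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical shared record that every analysis component emits."""
    source: str        # name of the analysis component
    modality: str      # e.g. "audio" or "video"
    label: str         # what was detected
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # score in [0, 1]


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two time intervals share any span of time."""
    return a.start < b.end and b.start < a.end


def cross_modal_pairs(
    annotations: List[Annotation],
) -> List[Tuple[Annotation, Annotation]]:
    """Return pairs of time-overlapping annotations from different modalities."""
    ordered = sorted(annotations, key=lambda a: a.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # later annotations start even later, so none overlap a
            if a.modality != b.modality and overlaps(a, b):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    results = [
        Annotation("music-analysis", "audio", "music", 0.0, 10.0, 0.9),
        Annotation("face-analysis", "video", "face", 5.0, 8.0, 0.8),
        Annotation("speech-analysis", "audio", "speech", 11.0, 13.0, 0.7),
        Annotation("face-analysis", "video", "face", 12.0, 15.0, 0.6),
    ]
    for a, b in cross_modal_pairs(results):
        print(f"{a.label} ({a.modality}) overlaps {b.label} ({b.modality})")
```

With a shared record like this, a new analysis component only needs to emit annotations in the agreed form to take part in combined, crossmodal queries.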

These technologies are applied particularly for tagging and indexing A/V archives, recommendation systems, program analysis, content tracking, and rights management. They are also used for audio-visual biodiversity measurement and to support disinformation detection.

 

Research project

AI4Media

Center of excellence for AI in media – Our contributions: Audio forensics, audio provenance analysis, music analysis, privacy and recommendation systems

 

Research project

Construction-sAIt

Multi-modal AI-driven technologies for automatic construction site monitoring

 

Research project

SAISBECO

Biodiversity identification software to automatically search through single images, video and audio recordings for sequences involving great apes. 

 

Research project

iMediaCities

Development of a digital platform to make the audio-visual cultural heritage of European cities accessible

 

Research project

CUBRIK

Framework for multimedia search that combines "human and social computation" and content analysis

 

Research project

MiCO

Platform for multimodal and context-based analysis, into which a wide variety of analysis components for different media types can be integrated

Services

  • Media Analytics: Services for the analysis and annotation of media content
  • Evaluation (Visual AI Assessment): Technical evaluation of methods, components, and systems in the field of audio and video analysis

Datasets

In recent years, Fraunhofer IDMT has compiled audio datasets for different research areas, such as the detection of instruments and fingerings, or performance analysis. These datasets have been presented in several scientific publications at international conferences and are intended to serve the scientific community as potential benchmarks for comparison experiments.