Audio and Visual Content Analysis

Extracting meaningful data from audiovisual content

Our research focuses on developing technologies for the comprehensive analysis and annotation of audio and video content using signal analysis and machine learning. Extracting metadata from media data provides the foundation for numerous applications such as automatic tagging, content-based search, and recommendation systems.

News and upcoming events

 

Event / 13.9.2024

Meet us at IBC 2024

At IBC 2024, we will present our face and speaker analysis for media presence measurement.

 

Workshop / 5.11.2024

WSDB 2024

On November 5 and 6, 2024, we are organizing the 18th Workshop for Digital Broadcasting and Media 2024 in Erfurt.

 

Press Release / 12.4.2024

Advertising monitoring for SWR radio programs

Our audio matching replaces manual checking of broadcast commercials

Analyzing media content and making it accessible

The use and exploitation of audiovisual content depend on the availability of meaningful metadata (data describing data). This metadata provides the basis for locating, organizing, and classifying specific content, as well as for implementing recommendation systems. Technologies for the automatic extraction of metadata are therefore crucial to making media content truly accessible and usable.

Multimodal analysis and annotation of media data

The development of technologies for the automatic analysis and annotation of audiovisual data requires a solid understanding of signal processing and machine learning, along with a good comprehension of the underlying requirements.

Another challenge lies in multimodal analysis and orchestration: extracting metadata from audio, video, and image files involves a variety of processes ranging from preprocessing to feature extraction and classification. Different methods and technologies are employed, requiring flexible integration and orchestration. The integration of heterogeneous data from different sources and formats also requires the selection or development of suitable data models and metadata standards. Media archives often deal with large volumes of data, imposing specific requirements on system architecture, efficiency, and the optimization of the algorithms used.
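The chain of preprocessing, feature extraction, and classification mentioned above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `MediaItem` container, the stage functions, the mean-energy feature, and the 0.25 threshold are hypothetical and do not describe any actual analysis component. The toy threshold rule merely stands in for a trained classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaItem:
    """Hypothetical container handed from one analysis stage to the next."""
    uri: str
    samples: List[float]
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage takes an item, enriches or transforms it, and returns it.
Stage = Callable[[MediaItem], MediaItem]


def preprocess(item: MediaItem) -> MediaItem:
    # Peak-normalize the signal so later stages see a comparable level.
    peak = max((abs(s) for s in item.samples), default=0.0)
    if peak > 0:
        item.samples = [s / peak for s in item.samples]
    return item


def extract_features(item: MediaItem) -> MediaItem:
    # Mean energy as a stand-in for a real feature extractor.
    n = len(item.samples)
    item.metadata["mean_energy"] = (
        sum(s * s for s in item.samples) / n if n else 0.0
    )
    return item


def classify(item: MediaItem) -> MediaItem:
    # Toy threshold rule (assumed value 0.25) instead of a trained model.
    energy = item.metadata["mean_energy"]
    item.metadata["label"] = "loud" if energy > 0.25 else "quiet"
    return item


def run_pipeline(item: MediaItem, stages: List[Stage]) -> MediaItem:
    # Orchestration: apply the configured stages in order.
    for stage in stages:
        item = stage(item)
    return item


PIPELINE: List[Stage] = [preprocess, extract_features, classify]
```

Because the stages share one signature, components can be swapped or reordered by editing the `PIPELINE` list, which is one simple way to keep the orchestration flexible.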

We also work on metadata standards and on the integration and orchestration of analysis components. In addition, we address privacy concerns and other aspects of trustworthy AI, aiming to provide comprehensive solutions for specific application requirements.

Research areas in Audio and Visual Content Analysis

 

Automatic Music Analysis

The focus is on the recognition of musical features such as pitch, rhythm, timbre, and genre, extending to musical transcription. The technologies enable music classifications, similarity analysis between musical pieces, and the detection of specific sound events and acoustic environments.

 

Video Analysis

In visual analysis, the focus is on analyzing faces in videos. Through facial recognition and tracking, human faces can be analyzed and identified. Additionally, image processing techniques and machine learning are used to detect and classify animals in videos.

Provenance Analysis and Matching

Detecting recurring patterns, the reuse of media content, and the transformation steps between different versions of content provides insights into the origin and processing of that content.

Multimodal and Crossmodal Analysis

To achieve optimal results, the methods described can be combined in many use cases or complemented with other analysis methods, such as metadata analysis. An important requirement for this is suitable interfaces, a common data model and the possibility of flexible orchestration and configuration of the analysis components used.

These technologies are applied particularly for tagging and indexing A/V archives, recommendation systems, program analysis, content tracking, and rights management. They are also used for audio-visual biodiversity measurement and to support disinformation detection.
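The role of a common data model in combining modalities can be sketched as follows. This is a hypothetical illustration rather than a description of an existing system: the `Annotation` record, its field names, and the `crossmodal_presence` function are assumptions. The sketch measures how long a person is both seen (video annotation) and heard (audio annotation), which is one conceivable way to combine face and speaker analysis results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical shared record emitted by every analysis component."""
    start: float   # segment start in seconds
    end: float     # segment end in seconds
    modality: str  # "audio" or "video"
    label: str     # e.g. an identified person


def overlap(a: Annotation, b: Annotation) -> float:
    # Length of the time interval covered by both segments (0 if disjoint).
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def crossmodal_presence(annotations: List[Annotation]) -> Dict[str, float]:
    """Seconds per label confirmed by both an audio and a video segment.

    Assumes that segments with the same label do not overlap within a
    single modality; otherwise the overlap would be counted more than once.
    """
    audio = [a for a in annotations if a.modality == "audio"]
    video = [a for a in annotations if a.modality == "video"]
    totals: Dict[str, float] = {}
    for a in audio:
        for v in video:
            if a.label == v.label:
                seconds = overlap(a, v)
                if seconds > 0:
                    totals[a.label] = totals.get(a.label, 0.0) + seconds
    return totals
```

Since both components write the same record type, the combination step needs no knowledge of how the individual annotations were produced.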

 

Research project

AI4Media

Center of excellence for AI in media – Our contributions: Audio forensics, audio provenance analysis, music analysis, privacy and recommendation systems

 

Research project

Construction-sAIt

Multi-modal AI-driven technologies for automatic construction site monitoring

 

Research project

SAISBECO

Biodiversity identification software that automatically searches individual images, video, and audio recordings for sequences involving great apes.

 

Research project

iMediaCities

Development of a digital platform to make the audio-visual cultural heritage of European cities accessible

 

Research project

CUBRIK

Framework for multimedia search that combines "human and social computation" and content analysis

 

Research project

MiCO

Platform for multimodal and context-based analysis, into which a wide variety of analysis components for different media types can be integrated

Services

  • Media Analytics: Services for the analysis and annotation of media content
  • Evaluation (Visual AI Assessment): Technical evaluation of methods, components, and systems in the field of audio and video analysis

Year | Title | Authors | Publication Type
2024 | Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol | Apostolidis, Konstantinos; Abeßer, Jakob; Cuccovillo, Luca; Mezaris, Vasileios | Conference Paper
2022 | Construction-sAIt: Multi-modal AI-driven technologies for construction site monitoring | Abeßer, Jakob; Loos, Alexander; Sharma, Prachi | Conference Paper
2016 | A workflow for cross media recommendations based on linked data analysis | Aichroth, P.; Berndl, E.; Weißgerber, T.; Kosch, H.; Köllmer, T. | Conference Paper
2015 | MICO - Media in Context | Aichroth, P.; Kurz, T.; Stadler, H.; Drewes, F.; Björklund, J.; Schlegel, K.; Berndl, E.; Perez, A.; Bowyer, A.; Volpini, A.; Weigel, C. | Conference Paper
2011 | Automated detection of errors and quality issues in audio-visual content | Kühhirt, U.; Paduschek, R.; Nowak, S. | Conference Paper
2008 | Personal television: A crossmodal analysis approach | Dunker, Peter; Gruhne, Matthias; Sturtz, S. | Conference Paper

This list has been generated from the publication platform Fraunhofer-Publica