Amsterdam, The Netherlands  /  June 16, 2026

5th ACM International Workshop on Multimedia AI against Disinformation (MAD’26)

On June 16, 2026, the 5th ACM International Workshop on Multimedia AI against Disinformation (MAD’26) takes place in Amsterdam, The Netherlands, as part of the ACM International Conference on Multimedia Retrieval (ICMR’26). The workshop focuses on contributions covering various aspects of AI-supported detection, analysis, and countering of disinformation.

Fraunhofer IDMT supports the organization of the workshop and presents its own contributions on the topic.

Detecting Audio-Text Decontextualization through Entailment and Semantic Analysis

Milica Gerhardt, Luca Cuccovillo, Patrick Aichroth

Audio-text decontextualization is a form of real-world misinformation in which genuine audio recordings – speech excerpts, news clips, interviews – are detached from their authentic context and paired with misleading textual narratives. Addressing it in practice requires both audio provenance analysis and context analysis: provenance retrieves candidate source recordings, while context analysis determines whether the recovered source supports the narrative attached to the post. This paper presents three context-analysis pipelines able to address this issue and their cascade combinations, and evaluates them on the M3A dataset alongside four audio-language baselines. We show that a substantial fraction of M3A manipulations are fundamentally undetectable from audio-text content alone, and that on the subset where detection is possible our best pipelines reach 0.73 accuracy on Named Entity Manipulation (NEM) and 0.92 on Multimodal Misalignment (MM) audio swap. Building on these findings, we formulate an operational workflow for real-world investigations and demonstrate it on three case studies, which also motivate a lightweight linguistic middle layer for conditional and modal/hedging framing drops. This leads to two practical deployment recommendations: (1) a fast bulk-screening pipeline that flags context-stripping attacks via entailment failure; and (2) a large language model (LLM)-based deep-verification pipeline for the most suspicious cases, capable of explicit reasoning about framing shifts.
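The two deployment recommendations in the abstract can be read as a two-stage cascade: a cheap entailment check screens every post, and only the posts it fails to support are passed to a slower verifier. The following is a minimal sketch of that control flow under stated assumptions, not the authors' implementation. The names `Post`, `toy_entailment_score`, `cascade`, the threshold of 0.5, and the labels are all hypothetical, and the scorer is a plain token-overlap stand-in for a trained entailment model so that the example runs without external dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Post:
    post_id: str
    source_transcript: str  # transcript of the recovered source recording
    claim: str              # textual narrative attached to the post


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer (assumption): share of hypothesis tokens found in the premise.

    A real system would call a trained entailment model here.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return hits / len(hypothesis_tokens)


def cascade(
    posts: List[Post],
    scorer: Callable[[str, str], float] = toy_entailment_score,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
    deep_verify: Optional[Callable[[Post], str]] = None,
) -> Dict[str, str]:
    """Stage 1 screens every post; stage 2 runs only on posts that stage 1 flags."""
    results: Dict[str, str] = {}
    for post in posts:
        score = scorer(post.source_transcript, post.claim)
        if score >= threshold:
            results[post.post_id] = "supported"
        elif deep_verify is None:
            results[post.post_id] = "flagged"
        else:
            # Placeholder for the slower, reasoning-capable verifier.
            results[post.post_id] = deep_verify(post)
    return results


posts = [
    Post("a", "the bridge opened on monday", "the bridge opened"),
    Post("b", "we may raise taxes next year", "government confirms tax raise now"),
]
print(cascade(posts))
```

Token overlap cannot detect a dropped conditional or hedge, because a claim that omits "if" or "may" still overlaps fully with its source. That limitation of the stand-in is one reason a sketch like this needs a real entailment model in the first stage and a stronger verifier in the second.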