Multicodec Invdec Tampering Dataset

Dataset Content

Lossy compression is often involved when audio data is used today. Online music stores distribute their music as MP3 or AAC to reduce the amount of data that has to be transmitted, and mobile devices store recorded audio in lossily compressed form to save storage space.

Even after compressed data has been decoded, traces of the lossy compression can still be observed. The presented dataset consists of spoken utterances that are partly assembled from recordings containing codec traces of MP3, AAC, HE-AAC, and MP3PRO. It can therefore be used to evaluate algorithms for codec detection and segmentation, framing grid offset detection, bitrate detection, and audio tampering detection. More details can be found in [1]. A study evaluating codec detection and tampering detection on this dataset is presented in [2].
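
To illustrate why framing grid offsets are useful for tampering detection, the following is a minimal sketch and not the method of [1] or [2]. It assumes that each source recording was encoded with a frame grid starting at sample 0, that decoder delay can be ignored, and that the frame lengths are the standard 1152 samples for MP3 and 1024 samples for AAC. The function names and the example cut positions are hypothetical.

```python
# Frame lengths in samples (standard values for MPEG-1 Layer III and AAC-LC).
# The other codecs in the dataset are left out of this sketch.
FRAME_SIZE = {"MP3": 1152, "AAC": 1024}


def expected_offset(segment_start, source_start, frame_size):
    """Framing grid offset of one segment inside a combined file.

    segment_start: sample index in the combined file where the segment begins.
    source_start:  sample index in the decoded source recording where the
                   excerpt was cut (assumption: source grid starts at sample 0).
    """
    # A grid line at source sample k * frame_size lands at
    # segment_start - source_start + k * frame_size in the combined file.
    return (segment_start - source_start) % frame_size


def has_grid_break(offsets):
    """True if the segments do not all share one framing grid offset."""
    return len(set(offsets)) > 1


# Hypothetical file built from two MP3 excerpts:
# (start in combined file, cut position in the source recording)
segments = [(0, 300), (48000, 5000)]
offsets = [expected_offset(seg, src, FRAME_SIZE["MP3"]) for seg, src in segments]
print(offsets, has_grid_break(offsets))
```

A file decoded from a single encoding pass keeps one offset throughout, so a change of offset between segments hints at a splice. Two excerpts can share an offset by chance, so a consistent grid alone does not prove that a file is untampered.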


[1] Daniel Gärtner, Luca Cuccovillo, Sebastian Mann, Patrick Aichroth, “A multi-codec audio dataset for codec analysis and tampering detection”. In Proceedings of the 54th AES Conference on Audio Forensics, London, 2014.

[2] Daniel Gärtner, Christian Dittmar, Patrick Aichroth, Luca Cuccovillo, Sebastian Mann, Gerald Schuller, “Efficient Cross-Codec Framing Grid Analysis For Audio Tampering Detection”. In Proceedings of the 136th AES Convention, Berlin, 2014.