Audio data today is often lossily compressed. Online music stores distribute their music as MP3 or AAC to reduce the amount of data that has to be transmitted, and mobile devices store recorded audio in lossily compressed form to save storage space.
Traces of the lossy compression remain observable even after the compressed data has been decoded. The presented dataset consists of spoken utterances that are partly combined from recordings containing codec traces from MP3, AAC, HE-AAC, and MP3PRO. It can therefore be used to evaluate algorithms for codec detection and segmentation, framing grid offset detection, bitrate detection, and audio tampering detection. More details can be found in . A study evaluating codec detection and tampering detection on this dataset is presented in .