Covers 1000 Dataset

by Chris Tralie

A "cover song" is a different version of the same song, usually performed by a different artist, and often with different instruments, recording settings, mixing/balance, tempo, and key. Although humans can readily identify cover songs, automatically identifying them with a machine remains a challenging problem. One might wonder why it isn't possible to use an app like Shazam to automatically identify, say, a live recording of a song. As it turns out, the algorithm that powers Shazam looks for exact clips of recordings using a technique known as audio fingerprinting. It is extremely good at its job, especially given a large database, but it is unable to detect re-renditions, even by the same artist. To help move research in automatic cover song identification forward, we present a medium-sized cover songs dataset consisting of a collection of features from 395 groups of cover songs, which have been checked by hand. We also have a live demo of our recent technique for identifying and aligning cover songs beat-by-beat, which currently achieves state-of-the-art results on automatic cover song identification. Finally, we have implemented an algorithm to synthesize new cover songs in a fully automated fashion from raw audio, and we present two tools (LoopDitty and GraphDitty) which we created to help design our algorithms.

Other Links

The Covers 80 Dataset A dataset with low quality audio consisting of 160 songs which are split into two disjoint subsets A and B, each with exactly one version of a pair of songs, for a total of 80 pairs. Mostly '80s and early '90s pop music.
Kara1k Karaoke Songs Dataset A dataset with features for 2000 songs: 1000 originals and 1000 corresponding karaoke versions. Also a great dataset for singing voice analysis.
A community project of annotations of cover songs which formed the basis of this dataset.
The Second Hand Songs Dataset Another dataset built from community cover song annotations; it is a subset of the Million Song Dataset consisting of about 20,000 tracks with EchoNest features.
The Youtube Covers Dataset A collection of chroma, CRP, and CENS features for 350 songs of various genres.