Cover Song Synthesis by Analogy:
Supplementary Material

By Chris Tralie

Click here to view the technical paper

Abstract

In this work, we pose and address the following "cover song analogies" problem: given a song A by artist 1 and a cover song A' of this song by artist 2, and given a different song B by artist 1, synthesize a song B' which is a cover of B in the style of artist 2. Normally, such a polyphonic style transfer problem would be quite challenging, but we show how the cover songs example constrains the problem, making it easier to solve. First, we extract the longest common beat-synchronous subsequence between A and A', and we time stretch the corresponding beat intervals in A' so that they align with A. We then derive a version of joint 2D convolutional NMF, which we apply to the constant-Q spectrograms of the synchronized segments to learn a translation dictionary of sound templates from A to A'. Finally, we apply the learned templates as filters to the song B, and we mash up the translated filtered components into the synthesized song B' using audio mosaicing. We showcase our algorithm on several examples, including a synthesized cover version of Michael Jackson's "Bad" by Alien Ant Farm, learned from the latter's "Smooth Criminal" cover.

Examples




Alien Ant Farm & Michael Jackson Example 1 ("Bad")

Synthesizing Alien Ant Farm Cover of Michael Jackson's "Bad"

Analogies Results

A (Michael Jackson "Smooth Criminal") A' (Alien Ant Farm "Smooth Criminal")
B (Michael Jackson "Bad") B' (Synthesized Alien Ant Farm "Bad"; this is our main result)


1) Synchronization of A and A'

Changing the speed of A' locally so that it aligns with A

Synced (put on your headphones...A in left ear, A' in right ear)
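
The local speed change can be pictured as one stretch factor per matched beat interval. The following is a minimal sketch, assuming the beat times of A and A' are already known and already matched one-to-one; the beat tracker, the subsequence matching, and the actual time stretching of the audio are not shown. The function name and the example beat times are hypothetical.

```python
def stretch_factors(beats_a, beats_a_prime):
    """For each beat interval, return the factor by which the A' interval
    must be lengthened (> 1) or shortened (< 1) to match the A interval.

    Assumes both lists hold matched beat times in seconds, in strictly
    increasing order, with the same number of beats.
    """
    if len(beats_a) != len(beats_a_prime):
        raise ValueError("beat lists must be matched one-to-one")
    factors = []
    for i in range(len(beats_a) - 1):
        len_a = beats_a[i + 1] - beats_a[i]
        len_ap = beats_a_prime[i + 1] - beats_a_prime[i]
        if len_a <= 0 or len_ap <= 0:
            raise ValueError("beat times must be strictly increasing")
        factors.append(len_a / len_ap)
    return factors


# Hypothetical beat times: A is steady, A' drifts slightly.
beats_a = [0.0, 0.5, 1.0, 1.5]
beats_a_prime = [0.0, 0.4, 0.9, 1.3]
factors = stretch_factors(beats_a, beats_a_prime)
```

Each factor would then be handed to a time-stretching routine for the corresponding interval of A'.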


2) W1 and W2 Joint Factorization

Ws and Hs (matfile)
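
To give a feel for what a joint factorization produces, here is a heavily simplified sketch. It assumes that "joint" means the two songs get separate templates W1 and W2 but share one set of activations H, and it uses plain (non-convolutional) NMF with a Euclidean loss and multiplicative updates in place of the 2D convolutional version derived in the paper. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def joint_nmf(X1, X2, n_components, n_iter=200, seed=0, eps=1e-9):
    """Factor X1 ~ W1 @ H and X2 ~ W2 @ H with a single shared H.

    Stacking X1 on top of X2 turns the joint problem into ordinary NMF
    on the stacked matrix; W1 and W2 are the two halves of its templates.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([X1, X2])
    W = rng.random((X.shape[0], n_components)) + eps
    H = rng.random((n_components, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean loss.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W[:X1.shape[0]], W[X1.shape[0]:], H


def rel_error(X1, X2, W1, W2, H):
    """Relative reconstruction error of the stacked factorization."""
    X = np.vstack([X1, X2])
    W = np.vstack([W1, W2])
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


# Synthetic stand-ins for two time-aligned magnitude spectrograms
# (frequency bins x frames) that share the same activations.
rng = np.random.default_rng(1)
H_true = rng.random((3, 40))
X1 = rng.random((16, 3)) @ H_true
X2 = rng.random((16, 3)) @ H_true

W1, W2, H = joint_nmf(X1, X2, n_components=3)
err = rel_error(X1, X2, W1, W2, H)
```

Because H is shared, column k of W1 and column k of W2 are active at the same times, which is what lets them be read as a pair of corresponding sounds.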

3) Phase Retrieval on W1 and W2 Components

We can listen to the components of W1 and W2 by applying the Griffin-Lim algorithm in the CQT domain and inverting at the end. This helps us understand what each filter is actually picking up on.

W11 (MJ Guitar) W21 (AAF Guitar)
W12 (MJ percussion + some guitar bleed) W22 (AAF percussion)
W13 (MJ Bass track) W23 (AAF Bass track)
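
The idea behind Griffin-Lim phase retrieval can be sketched in a few lines. Note that this sketch swaps in an ordinary STFT for the CQT used on this page, purely to stay self-contained; the window size, hop size, iteration count, and test tone are all assumptions chosen for illustration.

```python
import numpy as np

N_FFT, HOP = 256, 64
WIN = np.hamming(N_FFT)  # no zeros, so the overlap-add normalization is safe


def stft(x):
    """Windowed STFT (frequency bins x frames), no padding."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack(
        [x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)


def istft(S):
    """Least-squares inverse STFT via windowed overlap-add."""
    n_frames = S.shape[1]
    x = np.zeros(N_FFT + HOP * (n_frames - 1))
    norm = np.zeros_like(x)
    frames = np.fft.irfft(S, n=N_FFT, axis=0)
    for i in range(n_frames):
        x[i * HOP:i * HOP + N_FFT] += frames[:, i] * WIN
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    return x / norm


def griffin_lim(mag, n_iter=50, seed=0):
    """Estimate a signal whose STFT magnitude is close to `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random start
    for _ in range(n_iter):
        # Go to the time domain and back, keep only the new phase.
        S = stft(istft(mag * phase))
        phase = S / np.maximum(np.abs(S), 1e-8)
    return istft(mag * phase)


def magnitude_error(mag, x):
    """Relative mismatch between |STFT(x)| and the target magnitude."""
    return np.linalg.norm(np.abs(stft(x)) - mag) / np.linalg.norm(mag)


# Test tone: 440 Hz sine at an assumed 8 kHz sample rate.
t = np.arange(N_FFT + HOP * 30) / 8000.0
x = np.sin(2 * np.pi * 440 * t)
mag = np.abs(stft(x))

x_hat = griffin_lim(mag, n_iter=50)
err_start = magnitude_error(mag, griffin_lim(mag, n_iter=0))
err_end = magnitude_error(mag, x_hat)
```

Here `err_start` measures the mismatch with purely random phase and `err_end` the mismatch after the iterations; the gap between them shows how much the alternating projections help.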


4) Filtering And Inverting CQTs

As described in the paper, we can use the learned decomposition to come up with masks on the original audio and separate it into corresponding components between A and A'. This creates our "per-instrument translation dictionary," which we use to synthesize cover songs.

A1 A'1
A2 A'2
A3 A'3
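
One common way to turn a learned decomposition into masks is a Wiener-style soft mask, where each component receives the fraction of every time-frequency bin that its magnitude estimate accounts for. The sketch below assumes that form of mask; the paper's exact masking rule may differ. The random arrays are hypothetical stand-ins for the per-component magnitude estimates and the complex CQT of the mixture.

```python
import numpy as np


def soft_masks(component_mags, eps=1e-12):
    """Return one mask per component; masks sum to ~1 in every bin."""
    V = np.stack(component_mags)  # (components, bins, frames)
    return V / (V.sum(axis=0, keepdims=True) + eps)


rng = np.random.default_rng(0)
# Hypothetical magnitude estimates for three components (bins x frames).
mags = [rng.random((4, 5)) for _ in range(3)]
# Hypothetical complex spectrogram of the full mixture.
C = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

masks = soft_masks(mags)
parts = masks * C  # broadcasts to (3, 4, 5): one filtered spectrogram each
```

Since the masks sum to one, the filtered parts add back up to the original mixture, and each part keeps the mixture's phase, so it can be inverted directly.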


5) Musaicing

Now that we have the translation dictionaries, we split B into components B1, B2, and B3 using W1. We then reconstruct each component from templates drawn from A1, A2, and A3, respectively, using Driedger's audio musaicing technique. The activation coefficients found for A1, A2, and A3 are then applied to the corresponding grains from A'1, A'2, and A'3, which yields the final translations B'1, B'2, and B'3.

B1 B2 B3
B1 Driedger B2 Driedger B3 Driedger
B'1 B'2 (this track picks up on a lot of nice scream embellishments from Alien Ant Farm) B'3
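
The coefficient translation step can be sketched as follows: fit non-negative activations that rebuild a component of B from grains of A, then reuse those same activations with the time-aligned grains of A'. This is a bare-bones illustration that uses plain NMF with fixed templates and omits the extra constraints on the activations that Driedger's musaicing method applies. All array names and the random data are hypothetical.

```python
import numpy as np


def activations(target, templates, n_iter=200, seed=0, eps=1e-9):
    """Find H >= 0 such that templates @ H approximates target.

    Templates stay fixed; only H is updated (Euclidean loss,
    multiplicative updates).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((templates.shape[1], target.shape[1])) + eps
    WtV = templates.T @ target
    WtW = templates.T @ templates
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


def fit_error(target, templates, H):
    """Relative error of the reconstruction templates @ H."""
    return np.linalg.norm(target - templates @ H) / np.linalg.norm(target)


rng = np.random.default_rng(1)
A_k = rng.random((12, 6))        # grains from one component of A (bins x grains)
A_prime_k = rng.random((12, 6))  # the time-aligned grains from A'
B_k = A_k @ rng.random((6, 8))   # one component of B (bins x frames)

H = activations(B_k, A_k)        # how to build B_k out of A_k grains
B_prime_k = A_prime_k @ H        # same recipe, but with A' grains
```

The translation relies on grain j of A_k and grain j of A'_k covering the same moment of the synchronized songs, so a coefficient chosen for one carries over to the other.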





Alien Ant Farm & Michael Jackson Example 2 ("Wanna Be Startin Something")

Synthesizing Alien Ant Farm Cover of Michael Jackson's "Wanna Be Startin Something"

Analogies Results

A (Michael Jackson "Smooth Criminal") A' (Alien Ant Farm "Smooth Criminal")
B (Michael Jackson "Wanna Be Startin Something") B' (Synthesized Alien Ant Farm "Wanna Be Startin Something")


Steps 1-4

We omit these steps since we use the same dictionary that we used in the MJ/AAF "Bad" Example.

5) Musaicing

B1 B2 B3
B1 Driedger B2 Driedger B3 Driedger
B'1 B'2 B'3






Eurythmics & Marilyn Manson

Synthesizing Marilyn Manson Cover of Eurythmics' "Who's That Girl." There is a more extreme tempo change between cover versions in this example than in the others.

A (Eurythmics "Sweet Dreams") A' (Marilyn Manson "Sweet Dreams")
B (Eurythmics "Who's That Girl") B' (Synthesized Marilyn Manson "Who's That Girl")


1) Synchronization of A and A'

Changing the speed of A' locally so that it aligns with A

Synced (put on your headphones...A in left ear, A' in right ear)


2) W1 and W2 Joint Factorization

Ws and Hs (matfile)

3) Phase Retrieval on W1 and W2 Components

W11 W21
W12 W22
W13 (Main Eurythmics synth filter) W23 (Marilyn Manson guitar filter)


4) Filtering And Inverting CQTs

A1 A'1
A2 A'2
A3 A'3


5) Musaicing

B1 B2 B3
B1 Driedger B2 Driedger (this was unable to represent the new triangle instrument, since this was never in the original song A) B3 Driedger
B'1 B'2 B'3