Loop Ditty Geometric Music Visualizer
by Chris Tralie
Supported by an NSF Graduate Fellowship NSF DGF 1106401 and an NSF Research Training Grant NSF-DMS 1045133.
Special thanks to Geometric Data Analytics, Inc. for inspiration and the Duke Data Expeditions Program for motivation.
Paste SoundCloud URL Below To Begin

OR...


Load Your Own Audio File

OR...


Try A Precomputed Example





Display Parameters
These are aesthetic animation parameters that can be updated quickly without any long re-computation.
Display Time Edges: Choose whether or not line segments should be drawn between points which are adjacent in time in the song.
Feature Choices
Each point in this animation is a 3-dimensional projection of a bunch of numbers which describe perceptual aspects of a chunk of music. As the sound changes, these numbers also change, and so the points move through space. You can choose to include or exclude certain numbers from contributing to the position of the points by checking and unchecking the boxes below.
MFCC: MFCC stands for "Mel-Frequency Cepstral Coefficient." These 12 numbers model a highly smoothed version of the frequency spectrum of a chunk of audio. They have been shown to work well for voice, musical artist, and musical instrument modeling. They can be thought of as picking up on the "timbre" of the sound, or the stuff that isn't the notes. By themselves, they should be able to separate many songs into distinct regions.
Chroma: Chroma is a 12-dimensional feature set used to model the strength of the 12 notes in the Western chromatic scale, factoring out the octave that they're in (so a 440 Hz A and an 880 Hz A are equivalent, for example). This is complementary to MFCC, since it models note pitches only, attempting to factor out information about timbre.
Spectral Centroid: The spectral centroid is a single number which is the average frequency in a chunk of audio. This number is higher with higher musical notes, high frequency vibrations from percussive instruments, and consonant vocal sounds. Conversely, it is lower for bass instruments, lower notes, and vowel sounds.
Spectral Rolloff: The spectral rolloff is the frequency below which 85% of the spectral power is contained. Songs with more bass will have a lower spectral rolloff, while electronic music with lots of high frequency energy will have a higher spectral rolloff.
Spectral Flux: The spectral flux is the average difference in power at each frequency index between two adjacent chunks of audio. It is like a "frequency derivative." In highly percussive music, high values of this feature correlate strongly with drum hits, so if you use this feature by itself, you're likely to see a curve that "waves with the beat."
Zero Crossings: This is a simple but surprisingly descriptive feature that counts the number of times the audio waveform goes up and down through the zero line. It will be higher for higher frequencies.
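To make the descriptions above concrete, here is a minimal sketch of how the simpler features could be computed for one chunk of samples. It is not the app's actual code: the function names are hypothetical, the naive DFT is used only to keep the example dependency-free, weighting the centroid by power and using an absolute difference for the flux are assumptions, and the pitch-class helper only illustrates the octave folding behind chroma rather than a full chroma computation.

```python
import cmath
import math


def power_spectrum(chunk):
    """Power in DFT bins 0..n/2, via a naive O(n^2) DFT (slow, sketch only)."""
    n = len(chunk)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(chunk))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(power, sample_rate, n):
    """Power-weighted average frequency in Hz (the weighting is an assumption)."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * p for k, p in enumerate(power)) / total


def spectral_rolloff(power, sample_rate, n, fraction=0.85):
    """Lowest bin frequency at or below which `fraction` of the power lies."""
    target = fraction * sum(power)
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= target:
            return k * sample_rate / n
    return (len(power) - 1) * sample_rate / n


def spectral_flux(prev_power, power):
    """Mean absolute change in power per frequency bin between two chunks."""
    return sum(abs(a - b) for a, b in zip(prev_power, power)) / len(power)


def zero_crossings(chunk):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))


def pitch_class(freq_hz, ref_a4=440.0):
    """Semitones above A, folded into one octave (0 means the note A)."""
    return round(12 * math.log2(freq_hz / ref_a4)) % 12
```

For a pure 1000 Hz tone, the centroid and the rolloff both land at 1000 Hz, and `pitch_class(440.0)` and `pitch_class(880.0)` both return 0, which is the octave equivalence that chroma relies on.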
Window Options
Features from chunks of the audio are summarized within a "sliding window" of audio. By default, the program takes the average value of features within each window, but it is also possible to report the variation of the features within the window.
Window Length: This is the length in seconds of the window that is used to summarize the features taken in chunks of audio. Features are computed in 23 millisecond chunks, so the default value of 3.5 seconds corresponds to averaging roughly 150 chunks. The longer the window is, the more sound is summarized by the window, which means the points in the animation are likely to be more distinct. Longer windows also tend to produce smoother animations, while shorter windows pick up on faster variations.
Sphere Normalize: For certain sections of audio that are quieter or very distinct from others, the animation may spread out with long tails. Sphere normalization rescales each feature point in the high-dimensional space to have the same magnitude before the projection. In plain English, it tries to spread the points out more evenly.
Use Variation: In addition to reporting the mean of all of the features in a window, it is also possible to tack on more numbers that report how volatile each feature is in the window. This is similar to the variance or standard deviation of features. Computation will take slightly longer, but you may get more distinct points in your animation.
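The three window options can be sketched in a few lines. This is an illustration under stated assumptions, not the app's implementation: the function names are hypothetical, the window is assumed to slide forward one chunk at a time, the "variation" is taken to be the standard deviation, and sphere normalization is shown as a plain rescale to unit length with no centering step.

```python
import math

CHUNK_SECONDS = 0.023  # chunk length stated in the text


def chunks_per_window(window_seconds, chunk_seconds=CHUNK_SECONDS):
    """How many feature chunks fall inside one summary window."""
    return round(window_seconds / chunk_seconds)


def summarize(features, win, use_variation=False):
    """Summarize per-chunk feature vectors over a sliding window of `win` chunks.

    Each output point holds the per-feature means, followed by the
    per-feature standard deviations when `use_variation` is True.
    """
    dims = len(features[0])
    points = []
    for start in range(len(features) - win + 1):
        block = features[start:start + win]
        means = [sum(row[d] for row in block) / win for d in range(dims)]
        point = list(means)
        if use_variation:
            point += [
                math.sqrt(sum((row[d] - means[d]) ** 2 for row in block) / win)
                for d in range(dims)
            ]
        points.append(point)
    return points


def sphere_normalize(points):
    """Rescale every point to unit magnitude (all-zero points are left alone)."""
    result = []
    for p in points:
        norm = math.sqrt(sum(x * x for x in p))
        result.append([x / norm for x in p] if norm > 0 else list(p))
    return result
```

With the default 3.5 second window, `chunks_per_window(3.5)` gives 152, which matches the "roughly 150" figure above. Turning on variation doubles the length of each point, which is consistent with the slightly longer computation time.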




Controls / Interaction

  • Hit the play/pause buttons to play and pause the audio once a song is loaded, and click on the slider below the animation to jump through the song as it's playing.

  • With a mouse, left click + drag to rotate the curve, center click + drag to translate the curve, and right click + drag to zoom in and out. If your mouse doesn't have center or right click, then hold CTRL + drag left mouse to translate and hold SHIFT + drag left mouse to zoom.

  • With a mobile device or a touch screen, drag to translate/rotate the curve and pinch to zoom. Double tap to toggle between translating and rotating.

  • If you click the "Make GIF" button, the program will automatically generate an animated GIF looping 360 degrees around the current song you've loaded (thanks to gif.js).

  • If you load a sound from SoundCloud (including the precomputed sounds), a Twitter button will pop up below the time slider. This lets you share, under the hashtag #loopditty, a clickable link that stores the song and the parameter choices you found, so that others can view the same result. So if you find interesting songs and parameter combinations, please share them!

  • Why the colors?? Colors indicate time. Cool colors correspond to chunks of audio towards the beginning of the song, while hot colors correspond to chunks towards the end. With this scheme, if you see lots of different colors mixing together in a cluster, then that indicates visually that a song has repetitive sections that occur at different times.
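As an illustration of the time-to-color idea in the last bullet, the sketch below maps a point's position in the song to a color that runs from blue at the start to red at the end. The exact colormap the app uses is not stated here, so the hue sweep and the function name `time_to_color` are assumptions.

```python
import colorsys


def time_to_color(index, count):
    """Map point `index` out of `count` points to an (r, g, b) tuple in [0, 1].

    Early points get cool colors (blue) and late points get hot colors (red),
    by sweeping the HSV hue from 2/3 down to 0.
    """
    frac = index / (count - 1) if count > 1 else 0.0
    hue = (1.0 - frac) * (2.0 / 3.0)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
```

Under this scheme the first point of a song is pure blue and the last is pure red, so a cluster that mixes blues and reds contains material from both the beginning and the end of the song.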

About

This app is a combination of audio and 3D geometry that is used to visually inspect statistics of music synchronized with the music they represent. In plain English, it's a music visualizer with a curve waving to the music. Depending on the statistics you choose, the curve will respond to different aspects of the music (mouse over "Feature Choices" for more information). For instance, if you use the default parameters, you should see a curve which moves to distinct parts of space for different musical sections, such as verse, chorus, and bridge. If, on the other hand, you choose only spectral flux, with a window length of 0.2 seconds, using sphere normalization and variation, you will see a curve which moves around a circle to the beat. Play around with different combinations of feature parameters and see what you get for different types of music!

This app started off as an assignment that Chris Tralie created for an undergraduate class in applied topology as part of Duke Data Expeditions during the fall of 2014. It served as the motivation for research on cover song identification with timbral shape features and geometric models for musical audio, which represent a portion of the applications in Chris's Ph.D. research. This work was also highlighted in a Forbes blog article.