Looking Outwards – Kinect Edition

So I’ve been thinking a lot about what I want to do with the Kinect. I got one for Christmas, and I still haven’t had time to do much with it. I’m a huge music person, and I’d like to create an interactive audio visualizer that takes input from body movement instead of perceivable audio qualities (volume, frequency waveforms, etc.). I think that using gestural input from a person dancing, conducting, or otherwise rocking out to music would provide a much more natural input, since it would accurately reflect the individual’s response to the audio. I can imagine pointing a Kinect at a club full of dancing people and using their movement to drive a wall-sized visualization. It’d be a beautifully human representation of the music.

I’ve been Googling to see if anyone is doing something like this already, and I haven’t been able to find anything really compelling. People have wired the Kinect through TUIO to drive fluid systems and particle emitters, but not for the specific purpose of representing a piece of music. I don’t find these very impressive, because they’re really dumbing down the rich input from the Kinect. They just treat the users’ hands as blobs, find their centers, and use those as multitouch points. It must be possible to do something more than that. But I haven’t tried yet, and I want everything to be real-time – so maybe not ;-)

Here are a few visual styles I’ve been thinking of trying to reproduce. The first is a bleeding long-exposure effect that was popularized by the iPod commercials a few years ago. Though it seems most people are doing this in After Effects, I think I can do it in OpenGL or maybe Processing:

This is possibly the coolest visualization I’ve seen in a while. However, it was done in 3D Studio Max with the Krakatoa plugin, and everything was painstakingly hand-scripted into a particle system. I love the way the light shoots through the particles (check out 0:16), though. I’d like to create something where the user’s hands are light sources… It’d be incredibly slick.

I’m not sure how to approach implementing something like this, and I’m still looking for existing platforms that can give me a leg up. I have significant OpenGL experience, and I’ve done fluid dynamics with Jos Stam’s Navier-Stokes equation solver, so I could fuse that with a custom renderer to get this done. But I’d rather focus on the art and the input and let something else handle the graphics, so suggestions are welcome!

Cool Computational Art

The Graffiti Analysis project by Evan Roth makes an effort to capture the motion of graffiti in an artistic fashion. I’m interested in using the Kinect to capture hand gestures representative of audio, and I think this is a really cool visualization of gestural input. The way that velocity information is presented as thin rays is visually appealing. I think it would be more interesting if the project incorporated color, though, since real graffiti communicates with the viewer using color as well as shape.

Cosmogramma Fieldlines is an interactive music visualization created in OpenFrameworks. It was created by Aaron Meyers for the release of an album by the band Flying Lotus. I really like the steampunk, ink-and-paper graphic design of the project, and I like the way the lines radiating from the object in the center converge around the “planets.” I think it’d be cool to change the interaction approach so that the user could “strum” or otherwise manipulate the radial lines instead of the planets, though that might be harder to do.

This project, called “Solar Rework”, is a really fantastic visualization of audio that uses colored blobs, bright colors, and glassy “waves” to represent audio data. I think it’s cool because it visually conveys the idea that the sound is “washing over” the blobs in the scene. I really don’t have any complaints with this one, except that I wish there were source code I could download and try out myself.

The Shape of Song is a way of visualizing music that reveals repetition within a track. It’s an interesting way of profiling a song and revealing the underlying data, and the implementation uses arcs for some pretty cool looking shapes. Unfortunately, the visualization is static. When I ran it for the first time, I really expected the visualization to be generated as I listened to the song, and I was a little disappointed when it was already there.

Kinect Hand-tracking Visualization

What if you could use hand gestures to control an audio visualization? Instead of relying on audio metrics like frequency and volume, you could base the visualization on the user’s interpretation of perceivable audio qualities. The end result would be a better reflection of the way that people feel about music.

To investigate this, I wrote an OpenFrameworks application that uses depth data from the Kinect to identify hands in a scene. The information about the users’ hands – position, velocity, heading, and size – is used to create an interactive visualization with long-exposure motion trails and particle effects.

There were a number of challenges in this project. I started with Processing, but it was too slow to extract hands and render the point sprite effects I wanted. I switched to OpenFrameworks and started using OpenNI to extract a skeleton from the Kinect depth image. OpenNI worked well and extracted a full skeleton with wrists that could be tracked, but it was difficult to iterate with, because skeletal detection took nearly a minute at the start of every run of the visualization. It got frustrating pretty quickly, and I decided to do hand detection manually.

Detecting Hands in the Depth Image

I chose a relatively straightforward approach to finding hands in the depth image. I made three significant assumptions that made realtime detection possible:

  1. The user’s body intersects the bottom of the frame.
  2. The user is the closest thing in the scene.
  3. The user’s hands are extended (at least slightly) in front of their body.

Assumption 1 is important because it allows for automatic depth thresholding. By assuming that the user intersects the bottom of the frame, we can scan the bottom row of depth pixels to determine the depth of the user’s body. The hand detection then ignores anything farther away than the user.
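The thresholding step might look something like this minimal sketch. The function name, frame layout (row-major `uint16_t` millimeters, 0 meaning “no reading”, as the Kinect reports), and the idea of taking the nearest valid pixel in the bottom row are my assumptions, not the post’s actual code:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: estimate the user's body depth by scanning the
// bottom row of a depth frame. Because the user is assumed to be the
// closest thing in the scene, the nearest valid pixel in that row
// approximates the body depth; anything farther than that depth plus a
// small margin can then be ignored by the hand detector.
uint16_t estimateBodyDepth(const std::vector<uint16_t>& depth,
                           int width, int height) {
    uint16_t nearest = 0;  // 0 means "no valid pixel found"
    const uint16_t* bottomRow = &depth[(height - 1) * width];
    for (int x = 0; x < width; ++x) {
        uint16_t d = bottomRow[x];
        if (d == 0) continue;  // invalid Kinect reading
        if (nearest == 0 || d < nearest) nearest = d;
    }
    return nearest;
}
```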

Assumptions 2 and 3 are important for the next step in the process. The application looks for local minima in the depth image and identifies the points nearest the camera. It then uses a breadth-first search to repeatedly expand each blob to neighboring points and find the boundaries of the hands. Each pixel is scored based on its depth and its distance from the source. Pixels that are scored as part of one hand cannot be scored as part of another, which prevents nearby points in the same hand from generating multiple resulting blobs.
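The blob-growing step can be sketched as a flood fill from each seed minimum. This is a simplification of the scoring described above (it uses a flat depth tolerance around the seed rather than a combined depth/distance score, and 4-connectivity), and all names here are hypothetical:

```cpp
#include <cstdint>
#include <cstdlib>
#include <queue>
#include <vector>

// Hypothetical sketch: grow a hand blob by breadth-first search from a
// seed pixel (a local depth minimum). Neighbors join the blob if their
// depth is within `tolerance` mm of the seed. `claimed` marks pixels
// already assigned to a hand, so two nearby seeds on the same hand
// cannot produce two separate blobs.
std::vector<int> growHandBlob(const std::vector<uint16_t>& depth,
                              std::vector<bool>& claimed,
                              int width, int height,
                              int seed, uint16_t tolerance) {
    std::vector<int> blob;
    if (claimed[seed] || depth[seed] == 0) return blob;
    std::queue<int> frontier;
    frontier.push(seed);
    claimed[seed] = true;
    uint16_t seedDepth = depth[seed];
    while (!frontier.empty()) {
        int p = frontier.front();
        frontier.pop();
        blob.push_back(p);
        int x = p % width, y = p / width;
        const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
        for (int i = 0; i < 4; ++i) {
            int nx = x + dx[i], ny = y + dy[i];
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            int n = ny * width + nx;
            if (claimed[n] || depth[n] == 0) continue;
            if (std::abs((int)depth[n] - (int)seedDepth) > tolerance) continue;
            claimed[n] = true;
            frontier.push(n);
        }
    }
    return blob;
}
```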

Interpreting Hands

Once pixels in the depth image have been identified as hands, a bounding box is created around each one. The bounding boxes are compared to those found in the previous frame and matched together, so that the user’s two hands are tracked separately.
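One simple way to do the frame-to-frame matching is greedy nearest-center pairing, sketched below. The post doesn’t say exactly how boxes are matched, so treat this as an assumed implementation; real code would also handle hands entering and leaving the frame:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: pair each bounding-box center in the current
// frame with the nearest unused center from the previous frame, so the
// same physical hand keeps the same identity across frames.
struct Center { float x, y; };

std::vector<int> matchToPrevious(const std::vector<Center>& prev,
                                 const std::vector<Center>& curr) {
    std::vector<int> match(curr.size(), -1);   // index into prev, or -1
    std::vector<bool> used(prev.size(), false);
    for (size_t i = 0; i < curr.size(); ++i) {
        float best = std::numeric_limits<float>::max();
        for (size_t j = 0; j < prev.size(); ++j) {
            if (used[j]) continue;
            float dx = curr[i].x - prev[j].x, dy = curr[i].y - prev[j].y;
            float dist = dx * dx + dy * dy;    // squared distance is enough
            if (dist < best) { best = dist; match[i] = (int)j; }
        }
        if (match[i] >= 0) used[match[i]] = true;
    }
    return match;
}
```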

Once each blob has been associated with the left or right hand, the algorithm determines the heading, velocity and acceleration of the hand. This information is averaged over multiple frames to eliminate noise.

Long-Exposure Motion Trails

The size and location of each hand are used to extend a motion trail from the user’s hand. The motion trail is stored in an array; each point in the trail has an X and Y position and a size. To render the motion trail, overlapping, alpha-blended point sprites are drawn along the entire length of the trail. A Catmull-Rom spline is used to interpolate between the points in the trail and create a smooth path. Though it might seem best to append a point to the motion trail every frame, this tends to introduce noise. In the version below, a point is added to the trail every three frames. This increases the distance between the points in the trail and allows for more smoothing using Catmull-Rom interpolation.

Hand Centers

One of the early problems with the hand tracking code was that the centers of the blob bounding boxes were used as the input to the motion trails. When the user held up their forearm perpendicular to the camera, the entire length of the arm was recognized as a hand. To better determine where the center of the hand was, I wrote a midpoint finder based on iterative erosion of the blobs. This provided much more accurate hand centers for the motion trails.
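Iterative erosion can be sketched roughly like this (my assumed reconstruction, not the post’s code): peel off the blob’s boundary pixels pass after pass until nothing survives, and report a pixel from the last non-empty pass, which sits deepest inside the blob:

```cpp
#include <vector>

// Hypothetical sketch: erode a binary blob mask (row-major,
// width*height) one layer per pass. Each pass keeps only pixels whose
// four neighbors are all in the mask; the last pixel standing lies
// deepest inside the blob. For an arm-shaped blob this lands near the
// widest part (the hand/palm), not the bounding-box center.
int erodeToCenter(std::vector<bool> mask, int width, int height) {
    int last = -1;  // -1 if the mask was empty
    bool any = true;
    while (any) {
        any = false;
        std::vector<bool> next(mask.size(), false);
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x) {
                int p = y * width + x;
                if (!mask[p]) continue;
                last = p;  // remember a surviving pixel from this pass
                bool interior =
                    x > 0 && x < width - 1 && y > 0 && y < height - 1 &&
                    mask[p - 1] && mask[p + 1] &&
                    mask[p - width] && mask[p + width];
                if (interior) { next[p] = true; any = true; }
            }
        mask.swap(next);
    }
    return last;
}
```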

Particle Effects

After the long-exposure motion trails were working properly, I decided that more engaging visuals were needed to create a compelling visualization. It seemed like particles would be a good solution because they could augment the feeling of motion created by the user’s gestures. Particles are created when the hand blobs are in motion, and more particles are created based on the hand velocity. The particles stream off the motion trail in the direction of motion, and curve slightly as they move away from the hand. They fade and disappear after a set number of frames.
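The emission rule described above ("more particles are created based on the hand velocity") can be sketched as a speed-proportional spawn count. The scale factor and per-frame cap here are assumptions for illustration:

```cpp
#include <cmath>

// Hypothetical sketch: decide how many particles to emit this frame.
// A still hand emits nothing; a fast-moving hand emits proportionally
// more, up to a cap that keeps the particle system bounded.
int particlesToSpawn(float vx, float vy,
                     float particlesPerPixel = 0.2f, int maxPerFrame = 30) {
    float speed = std::sqrt(vx * vx + vy * vy);  // pixels per frame
    int n = (int)(speed * particlesPerPixel);
    return n < maxPerFrame ? n : maxPerFrame;
}
```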

Challenges and Obstacles

This was my first time using OpenFrameworks and the open-source ofxKinect addon. It was also my first attempt at blob detection and blob midpoint finding, so I’m happy those worked out nicely. I investigated Processing and OpenNI but chose not to use them because of performance and debug-time implications, respectively.

Live Demo

The video below shows the final visualization. It was generated in real-time from improvised hand gestures I performed while listening to “Dare You to Move” by the Vitamin String Quartet.