Archive for 2011

Looking Outwards – Kinect Edition

So I’ve been thinking a lot about what I want to do with the Kinect. I got one for Christmas, and I still haven’t had time to do much with it. I’m a huge music person, and I’d like to create an interactive audio visualizer that takes input from body movement instead of from perceivable audio qualities (volume, frequency content, etc.). I think that gestural input from a person dancing, conducting, or otherwise rocking out to music would provide a much more natural signal, since it would accurately reflect the individual’s response to the audio. I can imagine pointing a Kinect at a club full of dancing people and using their movement to drive a wall-sized visualization. It’d be a beautifully human representation of the music.

I’ve been Googling to see if anyone is doing something like this already, and I haven’t been able to find anything really compelling. People have wired the Kinect through TUIO to drive fluid systems and particle emitters, but not for the specific purpose of representing a piece of music. I don’t find these very impressive, because they’re really dumbing down the rich input from the Kinect. They just treat the users’ hands as blobs, find their centers, and use those as multitouch points. It must be possible to do something more than that. But I haven’t tried yet, and I want everything to be real-time – so maybe not ;-)

Here are a few visual styles I’ve been thinking of trying to reproduce. The first is a bleeding long-exposure effect that was popularized by the iPod commercials a few years ago. Though it seems most people are doing this in After Effects, I think I can do it in OpenGL or maybe Processing:

This is possibly the coolest visualization I’ve seen in a while. However, it was done in 3D Studio Max with the Krakatoa plugin, and everything was painstakingly hand-scripted into a particle system. I love the way the light shoots through the particles (check out 0:16), though. I’d like to create something where the user’s hands are light sources… It’d be incredibly slick.

I’m not sure how to approach implementing something like this, and I’m still looking for existing platforms that can give me a leg up. I have significant OpenGL experience and I’ve done fluid dynamics using Jos Stam’s Navier-Stokes solver, so I could fuse that with a custom renderer to get this done. But I’d rather focus on the art and the input and let something else handle the graphics, so suggestions are welcome!

Looking Outwards – Generative Art

I’m a huge fan of Dave Bollinger’s work “Density” (http://www.davebollinger.com/works/density/). He blends computer programming with traditional media, and some of his generative pieces are rendered in a woodblock-print style that I think looks pretty cool. Unfortunately, he doesn’t document his process very much.

There’s an online service called DNA11 (www.dna11.com) that produces generative art from DNA. You submit a small DNA sample, and they run PCR on it, colorize the result, and print it on a large canvas. I think it’s a really cool form of generative art because it’s completely personalized.

I think it’d be fun to use this assignment to create an art piece I can hang in my apartment (my walls are looking pretty bare right now…), so I’ve been focusing on generative art that creates static images. I found the work of Marius Watz interesting because he uses code to produce large wall-sized artworks that are visually intriguing and show a lot of originality from piece to piece, while retaining a sense of unity across the set. You can browse the collection of final images here: http://systemc.unlekker.net/showall.php?id=SystemC_050114_150004_04.

Cool Computational Art

The Graffiti Analysis project by Evan Roth makes an effort to capture the motion of graffiti in an artistic fashion. I’m interested in using the Kinect to capture hand gestures representative of audio, and this is a really cool visualization of gestural input. The way that velocity information is presented as thin rays is visually appealing. I think it would be more interesting if the project incorporated color, though, since real graffiti communicates with the viewer through color as well as shape.

Cosmogramma Fieldlines is an interactive music visualization built in OpenFrameworks by Aaron Meyers for the release of the Flying Lotus album Cosmogramma. I really like the steampunk, ink-and-paper graphic design of the project, and I like the way the lines radiating from the object in the center converge around the “planets.” It’d be cool to change the interaction so that the user could “strum” or otherwise manipulate the radial lines instead of the planets, though that might be harder to do.

This project, called “Solar Rework”, is a fantastic audio visualization that uses bright colored blobs and glassy “waves” to represent the audio data. I think it’s cool because it visually conveys the idea of the sound “washing over” the blobs in the scene. I don’t have any real complaints about this one, except that I wish there were source code I could download and try out myself.

http://www.turbulence.org/Works/song/mono.html

The Shape of Song is a way of visualizing music that reveals repetition within a track. It’s an interesting way of profiling a song and revealing its underlying structure, and the implementation uses arcs to produce some pretty cool looking shapes. Unfortunately, the visualization is static: when I ran it for the first time, I expected it to be generated as I listened to the song, and I was a little disappointed that it was already there.

Text Rain

I implemented the Text Rain exercise in Processing, using code from the Background Subtraction example at Processing.org to handle the underlying detection of objects in the scene.
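
For reference, here is a minimal sketch of the background-subtraction step on its own, written against the current Processing video library. The threshold and camera size are assumed values, and the falling-letter logic isn’t shown.

import processing.video.*;

Capture video;
int[] backgroundPixels;   // snapshot of the empty scene
int threshold = 40;       // brightness difference needed to count as foreground (assumed value)

void setup() {
  size(640, 480);
  video = new Capture(this, width, height);
  video.start();
  backgroundPixels = new int[width * height];
}

void draw() {
  if (video.available()) {
    video.read();
    video.loadPixels();
    loadPixels();
    for (int i = 0; i < width * height; i++) {
      float diff = abs(brightness(video.pixels[i]) - brightness(backgroundPixels[i]));
      // Pixels that differ enough from the stored background are treated as part of a person.
      pixels[i] = (diff > threshold) ? color(255) : color(0);
    }
    updatePixels();
  }
}

void keyPressed() {
  // Press any key to store the current frame as the background.
  arrayCopy(video.pixels, backgroundPixels);
}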

Learning Processing – Schotter

In Processing.js:

//Info: http://processingjs.org/reference
// Schotter: a grid of squares whose rotation and offset become more random toward the bottom.

void setup() {
  size(404, 730);

  int rows = 22;
  int cols = 12;
  int count = cols * rows;
  int rect_width = 384 / cols;
  int rect_height = rect_width;

  smooth();
  translate(10, 10);
  background(#ffffff);
  noFill();

  for (int ii = 0; ii < count; ii++) {
    int origin_x = (ii % cols) * rect_width;
    int origin_y = floor(ii / cols) * rect_height;

    // Randomness grows with the square's index, so lower rows are more displaced.
    float randomness = ((float)ii / (float)count);
    float rand_rad = (random(2) - 1) * randomness * randomness;
    float rand_x = (random(8) - 4) * randomness * randomness;
    float rand_y = (random(8) - 4) * randomness * randomness;

    translate(origin_x, origin_y);
    rotate(rand_rad);
    rect(rand_x, rand_y, rect_width, rect_height);
    rotate(-rand_rad);
    translate(-origin_x, -origin_y);
  }
}

void draw() {
}

As a Java applet:

As code:

size(404, 730);

int rows = 22;
int cols = 12;
int count = cols * rows;
int rect_width = 384 / cols;
int rect_height = rect_width;

smooth();
translate(10, 10);
background(#ffffff);
noFill();

for (int ii = 0; ii < count; ii++) {
  int origin_x = (ii % cols) * rect_width;
  int origin_y = floor(ii / cols) * rect_height;

  float randomness = ((float)ii / (float)count);
  float rand_rad = (random(2f) - 1f) * randomness * randomness;
  float rand_x = (random(8f) - 4f) * randomness * randomness;
  float rand_y = (random(8f) - 4f) * randomness * randomness;

  translate(origin_x, origin_y);
  rotate(rand_rad);
  rect(rand_x, rand_y, rect_width, rect_height);
  rotate(-rand_rad);
  translate(-origin_x, -origin_y);
}

And a video demonstrating the same sketch implemented in OpenFrameworks:

Kinect Hand-tracking Visualization

What if you could use hand gestures to control an audio visualization? Instead of relying on audio metrics like frequency and volume, you could base the visualization on the user’s interpretation of perceivable audio qualities. The end result would be a better reflection of the way that people feel about music.

To investigate this, I wrote an OpenFrameworks application that uses depth data from the Kinect to identify hands in a scene. The information about the users’ hands – position, velocity, heading, and size – is used to create an interactive visualization with long-exposure motion trails and particle effects.

There were a number of challenges in this project. I started with Processing, but it was too slow to extract hands and render the point-sprite effects I wanted. I switched to OpenFrameworks and started using OpenNI to extract a skeleton from the Kinect depth image. OpenNI worked well and produced a full skeleton with trackable wrists, but it was difficult to iterate with because skeleton detection took nearly a minute every time I ran the visualization. That got frustrating pretty quickly, and I decided to do hand detection manually.

Detecting Hands in the Depth Image

I chose a relatively straightforward approach to finding hands in the depth image. I made three significant assumptions that made real-time detection possible:

  1. The user’s body intersects the bottom of the frame.
  2. The user is the closest thing in the scene.
  3. The user’s hands are extended (at least slightly) in front of their body.

Assumption 1 is important because it allows for automatic depth thresholding. By assuming that the user intersects the bottom of the frame, we can scan the bottom row of depth pixels to determine the depth of the user’s body. The hand detection then ignores anything farther away than the user.

Assumptions 2 and 3 are important for the next step in the process. The application looks for local minima in the depth image and identifies the points nearest the camera. It then uses a breadth-first search to repeatedly expand each blob into neighboring points and find the boundaries of the hands. Each pixel is scored based on its depth and its distance from the seed point. Pixels that have been claimed by one hand cannot be claimed by another, which prevents nearby minima within the same hand from generating multiple blobs.
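
Here’s a rough sketch of those two steps in Processing-style Java (the actual implementation was OpenFrameworks/C++); the margin, the depth band, and the 4-connected neighborhood are simplifying assumptions:

// Simplified sketch of the thresholding and blob-growing steps described above.
// Thresholds and scoring are illustrative, not the real values.
import java.util.ArrayDeque;

int depthWidth = 640, depthHeight = 480;
boolean[] claimed = new boolean[depthWidth * depthHeight];  // pixels already assigned to a hand

// Assumption 1: the nearest depth reading along the bottom row approximates the
// user's body depth; anything farther than that (plus a margin) is ignored.
int bodyDepthThreshold(int[] depth) {
  int bodyDepth = Integer.MAX_VALUE;
  int bottomRow = (depthHeight - 1) * depthWidth;
  for (int x = 0; x < depthWidth; x++) {
    int d = depth[bottomRow + x];
    if (d > 0 && d < bodyDepth) bodyDepth = d;   // 0 means no reading from the Kinect
  }
  return bodyDepth + 50;   // margin in depth units, illustrative only
}

// Assumptions 2 and 3: grow a blob outward from a local minimum (a point near the
// camera) using breadth-first search, claiming neighbors that are close in depth
// and in front of the body. Claimed pixels can't be reused by another hand.
// Row wrap-around at the image edges is ignored for brevity.
ArrayList<Integer> growHandBlob(int[] depth, int seed, int bodyThreshold) {
  ArrayList<Integer> blob = new ArrayList<Integer>();
  ArrayDeque<Integer> queue = new ArrayDeque<Integer>();
  queue.add(seed);
  claimed[seed] = true;
  int seedDepth = depth[seed];

  while (!queue.isEmpty()) {
    int i = queue.poll();
    blob.add(i);
    int[] neighbors = { i - 1, i + 1, i - depthWidth, i + depthWidth };
    for (int n : neighbors) {
      if (n < 0 || n >= depth.length || claimed[n]) continue;
      boolean nearSeed = abs(depth[n] - seedDepth) < 30;          // depth band, illustrative
      boolean inFrontOfBody = depth[n] > 0 && depth[n] < bodyThreshold;
      if (nearSeed && inFrontOfBody) {
        claimed[n] = true;
        queue.add(n);
      }
    }
  }
  return blob;
}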

Interpreting Hands

Once pixels in the depth image have been identified as hands, a bounding box is created around each one. The bounding boxes are compared to those found in the previous frame and matched together, so that the user’s two hands are tracked separately.

Once each blob has been associated with the left or right hand, the algorithm determines the heading, velocity and acceleration of the hand. This information is averaged over multiple frames to eliminate noise.
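
A minimal sketch of that smoothing step (Processing-style Java again; the five-frame window is an assumption):

// Keep a short history of hand centers and average the frame-to-frame deltas
// to get a smoothed velocity, from which heading and speed are derived.
class TrackedHand {
  ArrayList<PVector> history = new ArrayList<PVector>();
  int window = 5;                 // frames to average over (illustrative)
  PVector velocity = new PVector();

  void update(PVector center) {
    history.add(center.copy());
    if (history.size() > window) history.remove(0);
    if (history.size() < 2) return;
    PVector sum = new PVector();
    for (int i = 1; i < history.size(); i++) {
      sum.add(PVector.sub(history.get(i), history.get(i - 1)));
    }
    velocity = PVector.div(sum, history.size() - 1);
  }

  float heading() { return velocity.heading(); }   // direction of motion, in radians
  float speed()   { return velocity.mag(); }       // pixels per frame
}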

Long-Exposure Motion Trails

The size and location of each hand are used to extend a motion trail from the user’s hand. The motion trail is stored in an array, and each point in the trail has an X and Y position and a size. To render the trail, overlapping, alpha-blended point sprites are drawn along its entire length, and a Catmull-Rom spline is used to interpolate between the points and create a smooth path. Though it might seem best to append a point to the motion trail every frame, that tends to introduce noise. In the version below, a point is added to the trail every three frames; this increases the distance between the points in the trail and allows for more smoothing from the Catmull-Rom interpolation.
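
For reference, here is what the interpolation step looks like as a standalone function. This is the standard Catmull-Rom formulation rather than a copy of the project’s code; in Processing itself, curvePoint() provides equivalent Catmull-Rom evaluation.

// Catmull-Rom interpolation between trail points p1 and p2, using p0 and p3 as
// the neighboring control points. Sampling several t values in [0,1] per segment
// gives the smooth path that the point sprites are drawn along.
PVector catmullRom(PVector p0, PVector p1, PVector p2, PVector p3, float t) {
  float t2 = t * t;
  float t3 = t2 * t;
  float x = 0.5f * ((2 * p1.x) + (-p0.x + p2.x) * t
          + (2 * p0.x - 5 * p1.x + 4 * p2.x - p3.x) * t2
          + (-p0.x + 3 * p1.x - 3 * p2.x + p3.x) * t3);
  float y = 0.5f * ((2 * p1.y) + (-p0.y + p2.y) * t
          + (2 * p0.y - 5 * p1.y + 4 * p2.y - p3.y) * t2
          + (-p0.y + 3 * p1.y - 3 * p2.y + p3.y) * t3);
  return new PVector(x, y);
}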

Hand Centers

One of the early problems with the hand tracking code was that the centers of the blob bounding boxes were used as the input to the motion trails. When the user held up their forearm perpendicular to the camera, the entire length of their arm was recognized as a hand. To better determine where the center of the hand was, I wrote a midpoint finder based on iterative erosion of the blobs. This provided much more accurate hand centers for the motion trails.
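
Here’s a rough sketch of that erosion idea (not the actual code); it assumes the blob is given as a boolean mask and uses simple 4-connectivity:

// Repeatedly peel away blob pixels that touch a non-blob pixel. The last pixel
// to be eroded lies near the "thickest" part of the blob, which for an
// arm-plus-hand blob tends to sit in the palm rather than along the forearm.
PVector erodeToCenter(boolean[] blob, int w, int h) {
  boolean[] current = blob.clone();
  int lastIndex = -1;
  boolean changed = true;
  while (changed) {
    changed = false;
    boolean[] next = current.clone();
    for (int y = 1; y < h - 1; y++) {
      for (int x = 1; x < w - 1; x++) {
        int i = y * w + x;
        if (!current[i]) continue;
        // Erode boundary pixels: any blob pixel with a 4-neighbor outside the blob.
        if (!current[i - 1] || !current[i + 1] || !current[i - w] || !current[i + w]) {
          next[i] = false;
          changed = true;
          lastIndex = i;   // remember the most recently eroded pixel
        }
      }
    }
    current = next;
  }
  return lastIndex >= 0 ? new PVector(lastIndex % w, lastIndex / w) : null;
}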

Particle Effects

After the long-exposure motion trails were working properly, I decided that more engaging visuals were needed to create a compelling visualization. Particles seemed like a good solution because they could augment the feeling of motion created by the user’s gestures. Particles are created when the hand blobs are in motion, and the number of particles spawned scales with the hand’s velocity. The particles stream off the motion trail in the direction of motion and curve slightly as they move away from the hand. They fade and disappear after a set number of frames.
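
A simplified sketch of that behavior might look like this; all of the constants are illustrative assumptions rather than the tuned values from the project:

// Particles inherit the hand's direction of motion, curve slightly as they
// travel, and fade out over a fixed number of frames.
class Particle {
  PVector pos, vel;
  int age = 0;
  int lifetime = 60;            // frames before the particle disappears

  Particle(PVector origin, PVector handVelocity) {
    pos = origin.copy();
    vel = handVelocity.copy().mult(0.5f);   // stream off in the direction of motion
    vel.rotate(random(-0.3f, 0.3f));        // with a little random spread
  }

  boolean update() {
    vel.rotate(0.02f);          // slight curve away from a straight path
    pos.add(vel);
    age++;
    return age < lifetime;      // false once the particle should be removed
  }

  float alpha() {
    return map(age, 0, lifetime, 255, 0);   // fade as the particle ages
  }
}

// Faster hand motion spawns more particles per frame (capped for sanity).
int particlesToSpawn(float handSpeed) {
  return constrain(round(handSpeed * 0.5f), 0, 20);
}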

Challenges and Obstacles

This was my first use of OpenFrameworks and the open-source ofxKinect addon. It was also my first attempt at blob detection and blob midpoint finding, so I’m happy those worked out nicely. I investigated Processing and OpenNI but chose not to use them because of performance and iteration-time concerns, respectively.

Live Demo

The video below shows the final visualization. It was generated in real-time from improvised hand gestures I performed while listening to “Dare You to Move” by the Vitamin String Quartet.

HexDefense

Intense, arcade-style tower defense for Android

The Story

HexDefense started as a class project for a mobile prototyping lab I took while at Carnegie Mellon. The lab required that apps be written in Java on the Android platform, and I figured it’d be a good opportunity to try writing a game. I’m a big fan of the tower defense genre and I’ve been heavily influenced by games on the iPhone like Field Runners and GeoDefense Swarm. From the outset, I wanted the game to have arcade style graphics reminiscent of Geometry Wars. That way, I figured, I wouldn’t have to find an artist to create the sprites, and I could focus on explosive OpenGL particle effects and blend-based bloom.

During the fall semester, I collaborated with Paul Caravelli and Tony Zhang on the first iteration of the game. I had the strongest graphics and animation background, so I focused on the gameplay and wrote all of the OpenGL code behind the game. I also created most of the game model, implementing the towers and creeps and creating actions with game logic for tower targeting, attacks, projectile motion, explosions, implosions and other effects. Paul contributed path finding code for the creeps based on breadth-first-search and created interfaces for implementing in-game actions based on the command pattern. He also contributed the original implementation of the grid model and worked on abstract base classes in the game model. Tony created the app’s settings screen and linked together activities for the different screens of the application.

At the end of the fall semester, the game was functional but unrefined. There were no sounds, no levels, and I’d only created one type of enemy. After the class ended, I talked with Paul and decided to finish it over my Christmas break. Paul was too busy to continue working on the app, so I continued development independently. I worked full-time for four weeks to deliver the level of polish I was accustomed to on the iPhone. I refined the graphics, tested the app across a variety of phones, and added fifteen levels. I also added 3D directional sound and boss creeps, and wrapped everything in a completely new look and feel. People say that the last 10% is 90% of the work, and I think that’s particularly true on Android: there are minor differences across devices that make writing a solid game a lot more work than I expected.

The game was released at the end of January and has been well received so far. I created a lot of promotional art and set up a website with gameplay footage and press resources, and the game has garnered quite a bit of attention. It’s been featured on the front page of the Android Market and holds a 4.5-star rating. It’s rising in the “Paid Apps” rankings and is currently the 16th most popular game on the Android platform!

Lessons Learned:

I learned a lot about the Android platform while developing HexDefense. A few tips and takeaways:

  1. Let the OpenGL view run in CONTINUOUS mode. Nothing else (timers, threads that trigger redraws) comes close in performance.
  2. Write all of the game logic so that it can advance the model by an arbitrary number of milliseconds. Because multitasking can cause hiccups in the game framerate, this is _really_ important for a smooth game (see the sketch after this list).
  3. OpenGL textures are not numbered sequentially on all devices. The original DROID will return seemingly random integer IDs each time you call glGenTextures.
  4. There are numerous drawbacks to using the Java OpenGL API. If your game needs to modify vertex or texcoord buffers every frame, you’ll have to accept a performance hit. The deformation of the grid in HexDefense is achieved by modifying the texcoords on a sub-segmented plane, and pushing that data through a ByteBuffer to OpenGL every frame is costly.
  5. The iPhone’s OpenGL implementation is at least 2.5x faster, even on devices with half the processor speed. An iOS port of HexDefense is in progress, and the game runs twice as fast on an original iPod Touch as it does on a Nexus One. There are a lot of reasons for this, but drawing large textured quads in particular seems far more expensive on Android devices.
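
Here’s a minimal sketch of what tip 2 looks like in practice. It isn’t the HexDefense code; the GameModel interface and the clamp value are assumptions for illustration.

// Advance the game model by the measured elapsed time instead of assuming a
// fixed frame rate, so a dropped frame produces one bigger step rather than
// slowing the whole game down.
public class GameLoop {
  private long lastTimeMs = -1;
  private final GameModel model;          // hypothetical game-model interface

  public GameLoop(GameModel model) { this.model = model; }

  // Called once per rendered frame (e.g. from GLSurfaceView.Renderer.onDrawFrame).
  public void tick() {
    long now = System.currentTimeMillis();
    if (lastTimeMs < 0) lastTimeMs = now;
    long elapsed = now - lastTimeMs;
    lastTimeMs = now;
    // Clamp very long gaps (app paused, GC hiccup) so the physics can't explode.
    elapsed = Math.min(elapsed, 100);
    model.advance(elapsed);               // the model handles any number of milliseconds
  }

  public interface GameModel {
    void advance(long milliseconds);
  }
}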

Drill Down WebView Navigation

The next version of NetSketch will include a community browser, allowing you to view uploaded drawings, watch replays, and leave comments without leaving the app. When I started working on the community interface, I looked to other apps for inspiration. Almost every app I’ve used on the iPhone uses a sliding navigation scheme, giving you the feeling that you’re drilling down into content as you use the application. This interface is intuitive in a lot of contexts, and it dates back to the original iPod. The Facebook app lets you browse other people’s Facebook pages using a drill-down navigation bar. This works well for the social-network space because you can drill down to look at information and then return to the first page quickly.

I decided to use a UINavigationBar and implement a similar drill-down interface for NetSketch. However, I didn’t want to create custom controllers for each page in the community. I wanted to be able to improve the community without updating the app, and didn’t want to write a communication layer to download and parse images and custom XML from the server.

Using a UIWebView seemed like the obvious choice. It would make retrieving content simpler, and pages could be changed on the fly. With WebKit’s support for custom CSS, I could make the interface look realistic, comparable to a set of custom-written views.

I quickly realized that it wasn’t all that easy to implement “drill down” behavior with a UIWebView. Early on, I ruled out the possibility of creating a mock navigation bar in HTML. Since Mobile Safari doesn’t support the “position:fixed” CSS property, there was no good way to make the bar sit at the top of the screen. I decided that a native UINavigationBar would be more practical and provide a better user experience. However, UINavigationController was built to use a separate controller for each layer, and it doesn’t worry about freeing memory when the stack of controllers gets large. I thought it was important to keep at most eight UIWebViews in memory at once, since Mobile Safari obeys that limitation and since pages could potentially be very large.

I tried several solutions and finally created a custom DrillDownWebController class with a manually managed UINavigationBar to handle the interface. The class maintains a “stack” of DrillDownPages, with each page representing a single layer in the drill-down hierarchy. It can be a root-level controller, or it can be loaded into an existing UINavigationController. When it appears, it silently swaps its parent’s navigation bar with its own.

The DrillDownPage is a wrapper for a UIWebView that acts as its delegate and provides higher-level access to important properties of the page, such as its title. When the user clicks a link in a web view, a new DrillDownPage object is created and begins loading the requested page in an invisible UIWebView. The controller displays an activity indicator in the top right corner of the navigation bar and slides in the new page when it finishes loading. All the other pages in the page “stack” are notified that their position in the stack has changed.

The notification step is important, because it allows the Page objects to