Posts by bengotow

Kinect Hand-tracking Visualization

What if you could use hand gestures to control an audio visualization? Instead of relying on audio metrics like frequency and volume, you could base the visualization on the user’s interpretation of perceivable audio qualities. The end result would be a better reflection of the way that people feel about music.

To investigate this, I wrote an OpenFrameworks application that uses depth data from the Kinect to identify hands in a scene. The information about the users’ hands – position, velocity, heading, and size – is used to create an interactive visualization with long-exposure motion trails and particle effects.

There were a number of challenges in this project. I started with Processing, but it was too slow to extract hands and render the point sprite effects I wanted. I switched to OpenFrameworks and started using OpenNI to extract a skeleton from the Kinect depth image. OpenNI worked well and extracted a full skeleton with wrists that could be tracked, but it was difficult to test because the skeletal detection took nearly a minute every time the visualization was tested. It got frustrating pretty quickly, and I decided to do hand detection manually.

Detecting Hands in the Depth Image

I chose a relatively straightforward approach to finding hands in the depth image. I made three significant assumptions that made realtime detection possible:

  1. The users body intersects the bottom of the frame
  2. The user is the closest thing in the scene.
  3. The users hands are extended (at least slightly) in front of their body

Assumption 1 is important because it allows for automatic depth thresholding. By assuming that the user intersects the bottom of the frame, we can scan the bottom row of depth pixels to determine the depth of the users body. The hand detection ignores anything further away than the user.

Assumptions 2 and 3 are important for the next step in the process. The application looks for local minima in the depth image and identifies the points nearest the camera. It then uses a breadth-first search algorithm to repeatedly expand the blob to neighboring points and find the boundaries of hands. Each pixel is scored based on it’s depth and distance from the source. Pixels that are scored as part of one hand cannot be scored as part of another hand and this prevents near points in the same hand from generating multiple resulting blobs.

Interpreting Hands

Once pixels in the depth image have been identified as hands, a bounding box is created around each one. The bounding boxes are compared to those found in the previous frame and matched together, so that the user’s two hands are tracked separately.

Once each blob has been associated with the left or right hand, the algorithm determines the heading, velocity and acceleration of the hand. This information is averaged over multiple frames to eliminate noise.

Long-Exposure Motion Trails

The size and location of each hand are used to extend a motion trail from the user’s hand. The motion trail is stored in an array. Each point in the trail has an X and Y position, and a size. To render the motion trail, overlapping, alpha-blended point sprites are drawn along the entire length of the trail. A catmul-rom spline algorithm is used to interpolate between the points in the trail and create a smooth path. Though it might seem best to append a point to the motion trail every frame, this tends to cause noise. In the version below, a point is added to the trail every three frames. This increases the distance between the points in the trail and allows for more smoothing using catmul-rom interpolation.

Hand Centers

One of the early problems with the hand tracking code was the center of the blob bounding boxes were used as the input to the motion trails. When the user held up their forearm perpendicular to the camera, the entire length of their arm was recognized as a hand. To better determine where the center of the hand was, I wrote a midpoint finder based on iterative erosion of the blobs. This provided much more accurate hand centers for the motion trails.

Particle Effects

After the long-exposure motion trails were working properly, I decided that more engaging visuals were needed to create a compelling visualization. It seemed like particles would be a good solution because they could augment the feeling of motion created by the user’s gestures. Particles are created when the hand blobs are in motion, and more particles are created based on the hand velocity. The particles stream off the motion trail in the direction of motion, and curve slightly as they move away from the hand. They fade and disappear after a set number of frames.

Challenges and Obstacles

This is my first use of the open-source ofxKinect framework and OpenFrameworks. It was also my first attempt to do blob detection and blob midpoint finding, so I’m happy those worked out nicely. I investigated Processing and OpenNI but chose not to use them because of performance and debug time implications, respectively.

Live Demo

The video below shows the final visualization. It was generated in real-time from improv hand gestures I performed while listening to “Dare you to Move” by the Vitamin String Quartet.


Intense, arcade-style tower defense for Android

The Story

HexDefense started as a class project for a mobile prototyping lab I took while at Carnegie Mellon. The lab required that apps be written in Java on the Android platform, and I figured it’d be a good opportunity to try writing a game. I’m a big fan of the tower defense genre and I’ve been heavily influenced by games on the iPhone like Field Runners and GeoDefense Swarm. From the outset, I wanted the game to have arcade style graphics reminiscent of Geometry Wars. That way, I figured, I wouldn’t have to find an artist to create the sprites, and I could focus on explosive OpenGL particle effects and blend-based bloom.

During the fall semester, I collaborated with Paul Caravelli and Tony Zhang on the first iteration of the game. I had the strongest graphics and animation background, so I focused on the gameplay and wrote all of the OpenGL code behind the game. I also created most of the game model, implementing the towers and creeps and creating actions with game logic for tower targeting, attacks, projectile motion, explosions, implosions and other effects. Paul contributed path finding code for the creeps based on breadth-first-search and created interfaces for implementing in-game actions based on the command pattern. He also contributed the original implementation of the grid model and worked on abstract base classes in the game model. Tony created the app’s settings screen and linked together activities for the different screens of the application.

At the end of the fall semester, the game was functional but unrefined. There were no sounds, no levels, and I’d only created one type of enemy. After the class ended, I talked with Paul and decided to finish it over my Christmas break. Paul was too busy to continue working on the app, so I continued development independently. I worked full-time for four weeks to deliver the level of polish I was accustomed to on the iPhone. I refined the graphics, tested the app across a variety of phones and added fifteen levels. I also added 3D directional sound, boss creeps and wrapped everything in a completely new look and feel. People say that the last 10% is the 90% of the work, and I think that’s particularly true on Android – there are minor differences across devices that make writing a solid game a lot more work than I expected.

The game was released at the end of January and has been well received so far. I created a lot of promotional art and setup a website with gameplay footage and press resources, and the game has garnered quite a bit of attention. It’s been featured on the front page of the Android marketplace and has a 4 1/2 stars. It’s rising in the “Paid Apps” rankings and is currently the #16th most popular game on the Android platform!

Lessons Learned:

I’ve learned a lot about the Android platform developing HexDefense. A couple of tips and takeaways:

  1. Let the OpenGL view run in CONTINUOUS mode. Nothing else (timers, threads that trigger redraws) will give performance close to this.
  2. Write all of the game logic so that it can advance the model by an arbitrary number of milliseconds. Because multitasking can cause hiccups in the game framerate, this is _really_ important for a smooth game.
  3. OpenGL textures are not numbered sequentially on all devices. The original DROID will choose random integer values each time you call glGenTexture.
  4. There are numerous drawbacks to using the Java OpenGL API. If your game needs to modify vertex or texcoord buffers every frame, you’ll have to accept a performance hit. The deformation of the grid in HexDefense is achieved by modifying the texcoords on a sub-segmented plane, and passing the data through a ByteBuffer to OpenGL is not cool.
  5. The iPhone’s OpenGL implementation is at least 2.5x faster, even on devices with half the processor speed. An iOS port of HexDefense is in progress, and the game runs twice as fast on an original iPod Touch as it does on a Nexus One. There are a lot of reasons for this, but it seems that drawing large textured quads has greater speed implications on Android devices.

Drill Down WebView Navigation

The next version of NetSketch will include a community browser, allowing you to view uploaded drawings, watch replays, and leave comments without leaving the app. When I started working on the community interface, I looked to other apps for inspiration. Almost every app I’ve used on the iPhone use a sliding navigation scheme, giving you the feeling that you’re drilling down into content as you use the application. This interface is intuitive in a lot of contexts, and dates back to the original iPod. The Facebook app allows you to browse other people’s facebook pages and uses a drill down navigation bar. This works well for the social-network space because you can drill down to look at information and then return to the first page quickly.

I decided to use a UINavigationBar and implement a similar drill-down interface for NetSketch. However, I didn’t want to create custom controllers for each page in the community. I wanted to be able to improve the community without updating the app, and didn’t want to write a communication layer to download and parse images and custom XML from the server.

Using a UIWebView seemed like the obvious choice. It could make retrieving content more efficient, and pages could be changed on the fly. With WebKit’s support for custom CSS, I could make the interface look realistic and comprable to a pile of custom-written views.

I quickly realized that it wasn’t all that easy to implement “drill down” behavior with a UIWebView. Early on, I ruled out the possibility of creating a mock navigation bar in HTML. Since Safari on the iPhone doesn’t support “position:static” or “position:fixed” CSS tags, there was no good way to make the bar sit at the top of the screen. I decided that a native UINavigationBar would be more practical and provide a better user experience. However, UINavigationController was built to use separate controllers for each layer, and doesn’t worry about freeing up memory when the stack of controllers gets big. I thought it was important that a maximum of eight UIWebViews were in memory at once, since Mobile Safari obeys that limitation and because pages could potentially be very large.

I tried several solutions, and finally created a custom DrillDownWebController class with a manually managed UINavigationBar to handle the interface. The class maintains a “stack” of DrillDownPages, with each page representing a single layer in the drill-down hierarchy. It can be a root level controller, or it can be loaded into an existing UINavigationController. When it appears, it silently swaps its parent’s navigation bar with it’s own.

The DrillDownPage is a wrapper for a UIWebView that acts as its delegate and provides higher-level access to important properties of the page, such as it’s title. When the user clicks a link in a web view, a new DrillDownPage object is created and it begins loading the requested page in an invisible UIWebView. The controller displays an activity indicator in the top right corner of the navigation bar, and slides in the new page when it finishes loading. All the other pages in the page “stack” are notified that their position in the stack has changed.

The notification step is important, because it allows the Page objects to

Layers for iPad

I started working on Layers for iPad the morning apple released the iPad SDK. I’d been looking forward to the release of the iPad for months, and it made sense to make a dedicated version of the app. I’d always had a difficult time drawing on the iPhone’s small 3″ screen. The iPad seemed like a perfect tool for professional artists–a digital sketch book you could carry anywhere.

The iPad’s larger form factor also raised new interaction questions. It was too big to shake, so the “wrist flick”-triggered menus in the original Layers app were out. It had more room for contextual information-toolbars wouldn’t significantly limit the size of the canvas viewport. The iPad also created new opportunities for community. It’s large screen was so good for showing off art that it seemed natural to allow people to post paintings and browse and comment on others work within the app. From the earliest phase of design, the focus was on building a community around the art people created on the iPad.

I spent about three weeks designing an interface for the iPad app, prototyping and fleshing out different screens of the app in Fireworks. Since I knew I would be doing the entire app myself, I didn’t bother creating storyboards using the individual interface mockups.

I like to write code during the design phase when I work on personal projects: a few hours of design, a few hours of code, repeat. It helps to step away from the designs every few hours. I started moving code from the iPhone version of Layers over to the iPad on day one, and a couple big issues became obvious in the first few weeks:

1. The display / memory ratio on the iPad makes it hard to keep display-quality images in memory. The iPhone version of Layers kept six 512×512 OpenGL textures in memory, but the iPad version isn’t able to keep six 1024×1024 images in memory. Fail!

2. Saving the image for each layer took more time (roughly 4x more, since there were 4x as many pixels) and caused the app to be terminated early when sent to the background.

3. Some procedures in Layers, like undoing a “Shift Layer” operation, took too much memory when the images were scaled up.

Unfortunately, I didn’t have early access to iPad hardware, so I didn’t know that #1 and #3 would be an issue until the app was actually live on the store. I got a ton of emails the week the iPad went on sale, and I skipped an entire week of class to fix the problems. The solution was a much smarter method of memory management that allows layers and parts of the undo stack to be paged to disk when the app receives memory warnings. It brought some major speed improvements to the iPad version, and I’ve since been rolled it back into the iPhone version.

Layers for iPad consists of 22,000 lines of Objective-C code, about 10,000 of which are shared with the iPhone version. The community component of the app is built in PHP and mySQL, and uses HTML5 and advanced Safari CSS 3 markup to create a convincing and “native” experience within the app. My experience building the community website was very positive. CSS 3 support on the iPad is great, I think web views are the best way to deliver native-looking and richly customized interfaces. You can even specify your own fonts. It’s beautifully fast. Had I chosen to deliver content via XML and render it all in custom UIViews, there’s no way I could have completed the app in three months.

In the first six months after its release, Layers for iPad was downloaded about 10,500 times. The Layers Gallery, the community built around content generated in the app, hosts thousands of paintings. The app was featured on the front page of the App Store during the week of April 28th, and was reviewed by TUAW and MacWorld twice! It was featured in Incase’s 2010 advertising campaign and billed as the favorite painting app of Ron Radziner. The Layers for iPad website has also received recognition for its clean, refined design on OneExtraPixel”, Designer Daily and Web Design Tuts+

After Layers was released for iPad, I worked with MEDLMobile in San Francisco to develop the app “Drawing Step by Step with Walter Foster” using the Layers painting engine to emulate realistic pens and pencils. During it’s first week on the store, that app was ranked #1 in Productivity.

The Best WordPress Site Ever?

So I accidentally clicked an ad this afternoon and stumbled across, an online community for eco-friendly folks. I hadn’t even scrolled half way down their home page when I found myself thinking: “What was this built in?” Ecoki is quite possibly the best designed wordpress site I’ve ever seen. I had to look at the page source to figure it out.

It looks like it’s a completely custom template. Must have cost a fortune… Great look though!

PNG compression in Android… (You have got to be kidding)

Over the last few weeks, I’ve been learning the Android SDK in an effort to bring Layers to Android devices. It’s going pretty well, but every once and a while I run into a truly WTF moment.

Tonight I was importing some images from the iPhone version of Layers when I noticed that Android seems to visibly reduce the quality of PNG files at compile time. In images with fine gradients, smooth color transitions, or very light shadows you tend to get banding. It almost looks like the image were being converted to a GIF file.

I figured it’d be easy to fix. Go into Eclipse, right click on everything, look in menus… repeat… profit! Unfortunately, it seems there’s no way to turn off compression for specific file or choose a non-lossy compression method. However, I found this gem of a solution in the Android Widget Design Guidelines:

“To reduce banding when exporting a widget, apply the following Photoshop Add Noise setting to your graphic.”

Um… what now?

It turns out, you can get around the compression algorithm by adding a small amount of pixel noise to your images. It’s mostly invisible, and it prevents the compression from producing obvious bands.

It’s an instant fix, but I almost laughed out loud. Seriously? This is the documented solution? *sigh*.

iPhone Development Tutorial

I gave a presentation within the Engineering school on Friday that gave a brief look at the iPhone platform and Objective-C. The end of the presentation was a quick tutorial in Interface Builder and XCode. You can download the presentation and the tutorial project here:

iPhone Development Tutorial

If you have any questions, feel free to email me at [email protected] or post a comment. Also, be sure to check out