Dire straits are ahead for 3D sensing. I commented about it back in November – that I hoped Apple wouldn’t kill open source 3D sensing library OpenNI from Primesense after the buyout.
If you visit http://www.openni.org you’ll notice that the entire website will shut down on April 23rd. Also unfortunate is that you’ll no longer be able to download the OpenNI and NiTE package with drivers.
Worse, they seem to be liquidating their stock of Carmine sensors. Emptying their warehouse means they probably will never make another one. Also, Asus seems to be no longer making 3D sensors. I can only guess that Microsoft will stop making the old-style Kinect soon when the new one for Windows dev comes out.
What does this mean?
Well…no more cross platform 3D sensor hardware and no more software to run it. Creative makes the Senz 3D which is powered by the Intel SDK and MS will be making the new Kinect for Windows. Both are Windows only.
The cross platform market seems dead. Apple hasn’t disclosed yet what they will do with Primesense, while Google has announced Project Tango: a device with a 3D sensor built in. Everyone seems to be going their separate ways with this tech.
Probably the best way forward is to get a used XBOX Kinect, download OpenNI/NiTE to a safe place before April 23rd, and try to get things installed with Freenect if you use Mac or Linux.
Luckily, I received a nice contribution to my Node.js OpenNI wrapper. Gordon Turner added some great documentation around making my Node plugin NuiMotion work on Ubuntu and OSX with the Kinect and Freenect. This gave me the kick in the pants to update the project to have a better build process and test/fix integration with the new OpenNI/NiTE binaries on OSX, Windows, and Ubuntu. I’ve also added an API to get raw depth or RGB data.
With the right pre-install steps in place, all you’d have to do from any platform is a simple “npm install nuimotion”.
So, even though the cross-platform 3D sensing dream seems dead for now, we can stock up for the apocalypse and have fun until the next generation springs up. You can find NuiMotion at https://github.com/bengfarrell/nuimotion
Well that’s a little bit of an awkward title – it’s not ONLY the first Node.js plugin I’ve released on NPM but it ALSO does 3D motion sensing! So brand new on both counts – I’m no Node.js plugin/addon veteran by any means!
You can grab my NuiMotion project version 0.1 on NPM and Github. You can read why I did it, and about how I’m on a crusade for letting your interfaces move your body on the project page.
That said, I learned a lot of stuff. I think the project serves as a shining example of how one guy accomplished some rather difficult and not so ordinary things you’d need to do with Node.js. I won’t claim it’s necessarily the right way – just one way. I was a little scared of C++ before all of this, but I jumped in because I had a need that I wanted to fill, and C++ was the only way to get it done.
The biggest hurdle was breaking the architecture down into something that wouldn’t block the entire Node.js main process. All of the OpenNI examples run a big old while loop that grabs frames of video/depth and pulls out the features we need, like gestures and skeleton data.
This is SO not cool for Node.js, so I needed to delve into how to achieve threading in a Node.js addon with “libuv”, the library that sits underneath Node’s event loop. I still don’t understand everything about the final libuv code I used (why some things are declared as they are), but I successfully broke the loop out into a new thread that runs as fast as your machine will let it. We reach in and grab our joints at a custom-defined polling interval, and we event out when gestures and other events are encountered.
Of course, all of this NEEDS to be threadsafe. If you access the wrong thing inside a thread, you crash your entire process.
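To make the shape of that architecture a bit more concrete, here is a rough sketch of the JavaScript side only. It is not the actual addon code: `JointPoller`, `readLatestFrame`, and the frame layout are hypothetical names invented for illustration. The idea is that something else (a native thread in the real addon) keeps producing frames, while the main thread just samples the newest one on a timer and events out.

```javascript
// Hypothetical sketch: none of these names are the NuiMotion API.
// A producer (a native worker thread in the real addon) keeps the newest
// frame available; the main thread only samples it on a timer, so the
// Node.js event loop is never stuck inside a while loop.
class JointPoller {
  constructor(readLatestFrame, intervalMs) {
    this.readLatestFrame = readLatestFrame; // returns the newest frame, or null/undefined
    this.intervalMs = intervalMs;           // custom polling interval
    this.listeners = { joints: [], gesture: [] };
    this.timer = null;
  }

  on(eventName, callback) {
    this.listeners[eventName].push(callback);
    return this;
  }

  emit(eventName, payload) {
    for (const callback of this.listeners[eventName]) callback(payload);
  }

  // One polling step: read a snapshot and event out whatever it contains.
  tick() {
    const frame = this.readLatestFrame();
    if (!frame) return; // nothing new yet
    this.emit('joints', frame.joints);
    for (const gesture of frame.gestures) this.emit('gesture', gesture);
  }

  start() {
    this.timer = setInterval(() => this.tick(), this.intervalMs);
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }
}
```

Between ticks the main thread is free to do everything else Node.js needs to do, which is the whole point of moving the frame loop somewhere else.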
You can check out the main logic of all of this, complete with C++/JS communication and threading, here:
I didn’t do this alone, either. I asked a couple of questions on the extremely awesome Node.js Google Group. One was around the threads question, and the other was around C++ compiling. To demonstrate how much of a noob I was, my compiling question came down to not realizing that you have to include ALL your *.cpp files in your sources target. I thought since the main.cpp references other things, they would be automatically included. NOPE! Live and learn.
Anyway – I’m of the opinion that this project probably represents some of the most difficult things you could ever need to know how to do in a Node.js addon (without getting into domain specific C++ code which could be infinitely complex for sure). So feel free to have a gander and learn!
As anybody who’s stopped by my blog over the last few months knows, I’ve been experimenting a lot with depth cameras like the Kinect and making them work through a plugin in Node.js. Happy to say that I’ve moved the disorganized mess of experiments that I had in Github into what I hope will be a cohesive plugin.
I’d still like to test a bunch before I call it a release, but I think I have something going here to work a Kinect or Kinect-like device through Node with the help of an open source library called OpenNI and not-so-open-source middleware called NiTE.
Unfortunately, though, one of the main things we associate with the Kinect isn’t really included in the middleware I’m using, and you’re left to your own devices. I’m talking about the gesture: swiping your hand, waving, or any number of things we think of as “natural” interaction mechanisms for triggering something in an interactive experience.
I actually didn’t know where to start with this. Do people REALLY do brute force tracking of your appendage at every stage and consider every possible outcome of where your appendage could be at any given moment? From what I’d seen, the answer is yes. I posed my question to a private Kinect G+ Community I’m a part of. My question was basically…”How do you even start going about programming a gesture?”.
Because this small, tight-knit group of people rocks, I received C# source code for a swipe gesture almost immediately from a Turkish developer named Furkan Üzümcü (@furkanzmc). I set out to convert it to C++ for my own purposes, which I’ll share in a bit. It really opened my eyes to how much attention you have to pay to what your body is actually doing in order to register a gesture.
Here’s the anatomy of a swipe right gesture (as performed with your left hand, swiping left to right):
First, cancel any swipe right in progress if:
a) Left hand is above head – it doesn’t seem natural of a person to try to perform a swipe like this
b) Left hand is below left elbow – it seems more natural to perform a swipe when the upper arm is relaxed and the forearm is raised, with the hand in this position above the elbow
c) Left hand is below hip – this I think is more subjective, but ideally – you are purposely creating a gesture if you’ve raised your hand above your mid-section
Next, if the left hand’s x position is to the left of the left shoulder, then we are on the left side of our body, and ready to start a swipe – so flag this, that we’ve started a gesture. On the other hand – if the hand isn’t far enough to the left, measured by horizontal distance between the left hand and shoulder being less than the horizontal distance between the torso and the shoulder, then cancel the flag and indicate that we have NOT started the gesture.
Next, if the gesture is started and the timestamp hasn’t been recorded then mark the start time of our gesture.
Finally, if the horizontal position of the left hand is greater than the horizontal position of the torso AND the time it took is between 0.1 seconds and 1 second, then we have a gesture!
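The steps above can be sketched as a small state machine. This is my own JavaScript paraphrase for illustration, not the C# or C++ code mentioned here, and it leans on some assumptions: the joint names and frame shape are made up, x grows to the right and y grows upward, the third cancel rule is read as "hand below the hip" (which is what its explanation implies), the "far enough left" test is only applied while no swipe is in progress, and a stale attempt is dropped after the 1 second window.

```javascript
// Sketch only: joint names, frame shape, and axis convention
// (x grows to the right, y grows upward) are assumptions.
const MIN_SWIPE_SECONDS = 0.1;
const MAX_SWIPE_SECONDS = 1.0;

function createSwipeRightDetector() {
  let started = false;
  let startTime = null;

  function reset() {
    started = false;
    startTime = null;
  }

  // joints: { head, leftHand, leftElbow, leftShoulder, leftHip, torso }, each {x, y}
  // timeSeconds: timestamp of this frame. Returns true when a swipe completes.
  return function update(joints, timeSeconds) {
    const { head, leftHand, leftElbow, leftShoulder, leftHip, torso } = joints;

    // Cancel rules: hand above head, below elbow, or below hip.
    if (leftHand.y > head.y || leftHand.y < leftElbow.y || leftHand.y < leftHip.y) {
      reset();
      return false;
    }

    if (!started) {
      // "Far enough left" is relative to the body: the hand must be left of
      // the shoulder by at least the shoulder-to-torso distance.
      const handToShoulder = leftShoulder.x - leftHand.x;
      const shoulderToTorso = Math.abs(torso.x - leftShoulder.x);
      if (handToShoulder >= shoulderToTorso) {
        started = true;
        startTime = timeSeconds; // mark the start of the gesture
      }
      return false;
    }

    const elapsed = timeSeconds - startTime;
    if (elapsed > MAX_SWIPE_SECONDS) {
      reset(); // assumption: an attempt that takes too long is dropped
      return false;
    }
    if (leftHand.x > torso.x) {
      // Hand crossed the torso: it counts only if it was not too fast.
      const isSwipe = elapsed >= MIN_SWIPE_SECONDS;
      reset();
      return isSwipe;
    }
    return false;
  };
}
```

You would call the returned function once per skeleton frame with that frame's joints and timestamp. Note that nothing in here uses an absolute distance, only comparisons between joints.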
So yeah, I’d call that brute force. Not only are you defining an event flow based on MULTIPLE joints (your hand, elbow, shoulder, head, and torso), but you are approximating allowable “good distance” for things based on the distance between things like your shoulder and torso.
These are important things to realize. This arm motion that sounds so easy, just became a whole body affair. And global/world coordinates really aren’t that great to use here since people can be closer to or farther away from the camera and people also come in all shapes and sizes. So, we think of all of our distances as relative to the distance from one body part to the other.
When I was programming this, I started sitting there, and really watching my body motion. I’d take my arm and start swinging it around…just thinking about how it moved. I felt like Otto the Bus driver from the Simpsons.
So, for a swipe left, you can imagine how that works. Just reverse the motion of the right swipe.
Up and down was a little harder. With the horizontal swipes, we can imagine a user as needing to vertically center the gesture just above their torso. But what of vertical swipes? I could easily swipe up at the left, right, or dead center of my body. Any of these are valid in my opinion, and no starting horizontal position of the swipe invalidates the gesture.
So, I borrowed the same relative-distance-between-joints trick. I said: “OK – the distance between your left and right hip is the maximum amount of horizontal variance you’re allowed to have in your vertical swipe”. So if your hand starts below your hip and below your elbow, and continues up to where your hand is above your elbow in the right amount of time and with less than the maximum horizontal variance, then you have a swipe up!
Same with swipe down, but reverse of course.
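Here is a minimal sketch of that swipe up rule, again as an illustrative JavaScript paraphrase and not the project's C++. The joint names are made up, y is assumed to grow upward, "below your hip" is taken to mean below the lower of the two hips, and since the post doesn't say what "the right amount of time" is for vertical swipes, the 0.1 to 1 second window from the horizontal swipe is reused as a placeholder.

```javascript
// Sketch only: joint names, the y-up convention, and the default time
// window are assumptions, not values taken from the actual gesture code.
function createSwipeUpDetector(minSeconds = 0.1, maxSeconds = 1.0) {
  let startTime = null;
  let startX = 0;

  function reset() {
    startTime = null;
  }

  // joints: { hand, elbow, leftHip, rightHip }, each {x, y}
  return function update(joints, timeSeconds) {
    const { hand, elbow, leftHip, rightHip } = joints;
    const hipY = Math.min(leftHip.y, rightHip.y);
    // Hip width is the allowed horizontal drift, so it scales with the person.
    const maxDrift = Math.abs(rightHip.x - leftHip.x);

    // Starting pose: hand below hip and below elbow. Re-arm on every frame
    // spent here so the clock starts when the hand leaves this position.
    if (hand.y < hipY && hand.y < elbow.y) {
      startTime = timeSeconds;
      startX = hand.x;
      return false;
    }
    if (startTime === null) return false;

    const elapsed = timeSeconds - startTime;
    const drift = Math.abs(hand.x - startX);
    if (drift >= maxDrift || elapsed > maxSeconds) {
      reset(); // wandered too far sideways, or took too long
      return false;
    }
    if (hand.y > elbow.y) {
      const isSwipe = elapsed >= minSeconds;
      reset();
      return isSwipe;
    }
    return false;
  };
}
```

A swipe down would be the mirror image: start with the hand above the elbow and finish below the hip, with the same drift check.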
Interesting thing when you listen for both events, though – if you want to do either a swipe up or swipe down, you make a conscious effort to put your hand into position. To get it into this position, it seems like a lot of the time, you are causing an accidental swipe up! I haven’t resolved this issue yet, but it’s an interesting one.
Lots of things to consider here. I also did one on my own – a wave gesture. As in a greeting – “hello”, “goodbye” – you know, a hand wave. Here, I simply detected if the hand was above the hip, below the head, and above the elbow. That’s my starting position. Then if the hand’s horizontal position goes left then right in a cycle 6 times where each motion takes less than .2 seconds, we have a wave.
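A sketch of the wave logic, with the same caveats as before: this is illustrative JavaScript, not the real gesture script, and the joint names are invented. I'm reading "6 times" as six back-and-forth strokes, counted each time the hand reverses horizontal direction, with a slow stroke resetting the count. That reading is an assumption.

```javascript
// Sketch only: joint names and the "six strokes" reading are assumptions.
function createWaveDetector(requiredStrokes = 6, maxStrokeSeconds = 0.2) {
  let lastX = null;
  let lastTime = null;
  let direction = 0;      // -1 moving left, +1 moving right, 0 unknown
  let strokeStart = null; // time of the last turning point
  let strokes = 0;

  function reset() {
    lastX = null;
    lastTime = null;
    direction = 0;
    strokeStart = null;
    strokes = 0;
  }

  // joints: { hand, elbow, hip, head }, each {x, y}, with y growing upward
  return function update(joints, timeSeconds) {
    const { hand, elbow, hip, head } = joints;

    // Starting position: hand above hip, below head, above elbow.
    const inPosition = hand.y > hip.y && hand.y < head.y && hand.y > elbow.y;
    if (!inPosition) {
      reset();
      return false;
    }

    const prevX = lastX;
    const prevTime = lastTime;
    lastX = hand.x;
    lastTime = timeSeconds;
    if (prevX === null) return false; // first sample, nothing to compare

    const newDirection = Math.sign(hand.x - prevX);
    if (newDirection === 0) return false; // hand did not move sideways
    if (direction === 0) {
      direction = newDirection;
      strokeStart = prevTime;
      return false;
    }
    if (newDirection === direction) return false; // still on the same stroke

    // Direction flipped, so a stroke ended at the previous sample.
    const strokeSeconds = prevTime - strokeStart;
    strokes = strokeSeconds < maxStrokeSeconds ? strokes + 1 : 0;
    direction = newDirection;
    strokeStart = prevTime;

    if (strokes >= requiredStrokes) {
      reset();
      return true;
    }
    return false;
  };
}
```

Real joint data jitters, so in practice you would probably want to ignore very small sideways movements before treating them as a change of direction. That threshold is left out here to keep the logic readable.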
Overall, you’re considering a lot of different things when designing a gesture. How “natural” it is for your users becomes how intuitive you make it. Have you considered how different people might interpret a swipe? If a user thinks perhaps a swipe takes place above their head, will they become frustrated if you don’t consider this fact? If a user waves super slow, and each side to side motion takes .3 seconds instead of my .2 seconds as designed, is THAT OK? Visual feedback can help greatly, but I think that considering all of the edge cases can greatly increase the natural feel and intuitiveness of any gestures you design even before thinking about visual feedback.
As I said, I’m funneling all this effort into my Node.js plugin. It’s probably in the alpha stage right now, so I’ll just link you to my gesture scripts over on Github:
I think as my add-on gets more solid, I’ll definitely be cleaning these up as well…organizing code better, removing redundancy, etc. This should just give you a good idea of the logic behind the gesture. Thanks again to Furkan Üzümcü for the C# Swipe code that my swipe is largely based on.
All my talk of OpenNI, C++, NodeJS, etc in recent months was pretty much all boring until you put it into practice and make something cool.
I present to you….the “Upright Spass”:
Several months ago I played around with the Kinect SDK, playing a keyboard in thin air. What I was playing with then was Windows only, Kinect only, and needed Adobe AIR to route things to websockets for the browser.
So using my newfound powers from the past few months with:
My Asus Xtion Pro Live depth camera
C++ Addons in NodeJS
….I now have a nice little handtracking utility that runs in Node.js using OpenNI and NiTE to power my skeleton tracking.
I didn’t care for the horizontal layout of my old virtual piano – so I inverted the axis, and made the instrument control upright. Hence – “Upright Spass”….the anti-bass, the bass that is not a bass, just empty space.
So to solve this? MIDI. Hell yes, MIDI! I found a nice robust Node.js MIDI addon. So instead of making my own sound banks, I send it out over my E-MU MIDI USB controller to my Korg X3 keyboard.
And wow….the site I grabbed this image from is calling this keyboard (made in 1993) “vintage”. I feel old, damn.
Anyway – I’m running Ubuntu for this whole operation, so to route the MIDI from Node.js to my keyboard, I used Jack. Jack offers you a nice little audio server. You can patch your MIDI through out into the E-MU MIDI USB device in. Voila, make the link and start the Jack server.
So, I got this motion controlled midi thing all rigged up, and it’s REALLY hard to play. There were a few problems:
Playing straight notes with 2 hands in an unfamiliar environment can lead to disharmony. Seriously, on top of being hard to play, it’s way too easy to play the wrong notes. So, I restricted the instrument space to only be able to play notes in a certain key signature. I randomly chose A# Minor.
The coordinates of your 3D world will vary based on where you stand and where the camera is positioned. So, on top of sending the hand coordinates from my Node.js AddOn, I also sent the torso position. That way, all the hand positions can be calculated outward from the center of your body – and your vertical instrument is always in your center. Muscle memory is a major factor in learning to play an instrument, and you can’t learn to play if your instrument keeps shifting around on you. Ideally, I should get the user’s height and make calculations on where the instrument notes are from there as well, but I haven’t done so yet.
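Both fixes can be sketched in a few lines: measure the hand relative to the torso, then snap the result onto a scale so only in-key notes can come out. This is an illustration and not the Upright Spass source. The function names, the starting octave, and the `slotHeight` tuning value are made up; the natural minor interval pattern and the note-on status byte (0x90 for channel 1) are standard MIDI and music theory facts.

```javascript
// Sketch only: function names, starting octave, and slotHeight are
// hypothetical choices, not values from the actual instrument.
const NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]; // semitones above the root
const A_SHARP_2 = 46; // MIDI note number for A#2 (middle C is 60)

// Map a hand height to a MIDI note in A# minor. The height is measured
// from the torso, so the instrument stays centered on the player no matter
// where they stand. slotHeight is how much vertical space each note gets,
// in the same units as the joint coordinates.
function handToMidiNote(hand, torso, slotHeight = 100, rootNote = A_SHARP_2) {
  const relativeY = hand.y - torso.y;
  const slot = Math.floor(relativeY / slotHeight);
  const octave = Math.floor(slot / NATURAL_MINOR.length);
  // Extra modulo keeps the scale degree in range for slots below the torso.
  const degree = ((slot % NATURAL_MINOR.length) + NATURAL_MINOR.length) % NATURAL_MINOR.length;
  const note = rootNote + 12 * octave + NATURAL_MINOR[degree];
  return Math.min(127, Math.max(0, note)); // clamp to the valid MIDI range
}

// Build a raw note-on message: status byte, note number, velocity.
function noteOnMessage(note, velocity = 100) {
  return [0x90, note, velocity];
}
```

For example, `noteOnMessage(handToMidiNote(leftHand, torso))` gives a three-byte array you could hand to whatever MIDI output you are using. Scaling `slotHeight` by the player's height would be one way to tackle the "ideally I should get the user's height" problem.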
Even after solving a few of these problems, the Upright Spass is really hard to play. My performance was pretty much a disaster – but maybe I can tweak and practice and get passable at it.
My code for this is up on github. I mentioned the link for my Node.js AddOn previously – that’s here:
I hate to do things twice, but sometimes it just needs to be done twice, three times, or more. Luckily after the first time of faking things through, you become a bit of an expert at the many things that can go wrong!
And so it goes with building out our Node.js OpenNI plugin.
There are a couple experiments I’d love to do, but I can’t QUITE do them on Ubuntu yet. The first is to check out Viim, a more robust middleware than NiTE. The middleware, in OpenNI land, bridges the gap between the depth and RGB data and the actual gestures and skeletal data.
Like I said, Viim seems MUCH more robust than NiTE offering a full suite of gestures and other goodies compared to NiTE’s lowly three gestures – though NiTE DOES offer skeletal data. It seems that Viim is on the cusp of being released for Ubuntu, but for now, we must make do with Windows and OSX.
Another little thing I wanted available in my experiments is speech interaction. The OpenNI project doesn’t seem to offer this like Microsoft’s Kinect SDK. Nevertheless speech interaction is important to anyone studying in the Natural User Interface dojo. Luckily, new to Chrome 25 is the Speech API! v25 isn’t quite out yet, but we can grab the Chrome Canary build – which DAMMIT, isn’t available on Ubuntu side by side with the production version of Chrome.
Oh well – it’s probably time to try things out on Windows. Even if these things are released tomorrow, I’m not wasting my time. It’s good to make sure all my experiments work cross-platform. I want YOU to try this stuff out, whether it be on Windows, Linux, or whatever!
Being the noob that I am – C++ compilation on Linux was brand new to me. So GCC and G++ were new and scary. But using Make wrapped things up into a nice little command line package. It was easy to just type “make” on the command line and have everything just……go. Likewise, with Node.js’ build tool: node-gyp. Once I had my binding.gyp file set up correctly, it was easy to just run “node-gyp configure build”.
Gyp would create the appropriate Make files with the configure command – and then use G++ to build the stuff that the “configure” command spits out.
Turns out that Windows was surprisingly similar, with one curveball! Node-gyp on Windows spits out “vcxproj” files (and friends). These files are actually Microsoft Visual Studio project files. So, you COULD open these right up in Visual Studio if you wanted to. I wanted to see if we could still run these on the command prompt – the same “node-gyp configure build” routine, you know?
Well, aside from making sure we have Node-gyp installed from the Node Package Manager (npm) and Python installed to complement Node-gyp, we’ll need some Windows tools:
Please note that I’m using Windows 8, so 2012 works for me! Your mileage may vary. And because I’m using Windows 8, I had trouble with my next dependency: OpenNI and NiTE!
With my old copy of OpenNI 2.0, I actually couldn’t get things compiling on Windows 8. One of the header files complained that my C++ compiler was too new. Luckily I didn’t have to put too much brainpower in here, because OpenNI 2.1 was just released, and that solves the problem. Visual Studio 2012 happily updated the project files provided by the samples, and I could create executables, so all was quite well there. Downloading and installing NiTE appeared not to have similar problems.
After some trial and error, I was able to figure out the secret sauce. I’ve included full instructions in my Readme file on Github. But, what I ended up doing, was taking the files from “C:/Program Files/OpenNI2/Redist” and dropping them at the root of my module. This included some DLLs, lib files, and more. Basically you just need the libs and DLLs, though. I also copied NiTE2.dll from my “C:/Program Files/Primesense/NiTE2/Redist/” folder to get the NiTE middleware working.
I also ended up changing the link paths and effectively ignoring one of the link options on Windows. While the “-Wl,-rpath ./” was the secret code to add to our Gyp file to make it build on Linux, this flag doesn’t work at all on Windows – we’ll just leave it in and Windows warns us and moves on. It seems that all Windows needs is the correct path in the “-l” flag. Linux needed a little love with those other extra options, but Windows performs like a champ with just -l./pathto/OpenNI. So, in my Gyp file, I just created some variables that are set depending on which OS you have, to point to the correct path.
Last step was compiling! Don’t use the normal DOS command prompt though – load up the Visual Studio SDK command prompt. Navigate to the source of the project and do “node-gyp configure build”.
Voila! You’ve built an OpenNI/Node.js plugin – on Windows this time!
My source for this module is here, and a simple usage example (which I describe in depth on my first post) is here.