Mozilla DeepSpeech vs Batman

No, I'm not a "Machine Learning" developer, but I am having fun feeling out what it can do. All that's to say, this isn't an article about the gory technical details of Mozilla's DeepSpeech. Rather, I'm writing up my experience after spending a couple of hours with it, coming away decently impressed and eagerly anticipating how this project improves over time.

I’ve been peripherally aware of voice input tech for a while. Calling Google’s Web Speech API “first” does a disservice to many others before it, but it was the first one I played with, and it’s likely the first one many web developers like myself have used. It’s also insanely good right now.

Part of why it's insanely good is that it can transcribe speech to text in essentially real time. Not only that, but to ensure results that make sense, it listens... and before it gives final results, it uses some natural language processing to figure out whether what it heard actually makes sense, and then improves it. Using the Web Speech API, you as a developer can even see the results rated and scored.

Google is constantly improving. And it should. Their speech recognition is used on the web, on Android phones, and even on Amazon Echo competitor Google Home. They absolutely need it to be as perfect as possible so you, the user, can interact with experiences using only your voice.

Of course, Google isn't the only one in the game. IBM's Watson also does an exceptional job at this. Even better, their demo recognizes different speakers on the fly and labels them as such in the text it sends back.


Multiple speakers? An option to get word timings? Fantastic! Watson is positioned as a really good voice recognition service for a variety of applications. Watson, of course, does tons of other things. It's actually used in "Star Trek: Bridge Crew" to fill in some AI when you just want to play a mission and don't have a real-life crew waiting in their VR headsets to play with you.

I'm also fairly confident that if I looked at Microsoft's Azure services I'd see the same, and in recent days you can see a similar cloud offering from Google.

As far as I’m concerned, these companies are doing good. Cloud services are obviously popular, and speech recognition that works is a great service. There’s a problem, though.

Early on, before Google had their paid cloud service in place, when their browser Chrome first started offering the Web Speech API, you could watch network traffic in your browser and see the endpoints they were using. For any application you wanted voice in that wasn't browser-based, you could kinda sorta mock up a client against their endpoint and shoot over chunks of audio data. It would do the same thing. I remember playing around with transcription of audio files via Node.js.

Honestly, this wasn’t kosher. It was Google’s service, and this is not what they intended it for. They even put a flag in their browser traffic to ensure it was coming from Chrome. Yes (sheepishly), I faked that too in my Node.js requests so I could continue playing.

Also, check out this Watson pricing page. It’s 2 cents per minute of audio uploaded. Yes, that seems super cheap. But it’s 2017 and we’re talking to our devices more than ever. Also, I have an idea for a project where I want to grab transcriptions for the entire Batman ’66 run.



Yeah, the show only ran for 3 seasons, but it was on basically every single night of the week. It clocks in at 120 episodes of around 25 minutes a pop. That's 3,000 minutes, or $60 for my stupid project idea, assuming I don't make mistakes. My stupid project idea might not even be all that stupid – I want to catalog and time speech. Video editors can spend a long time cataloging footage, or just searching for the right thing for the right cut. What if we could throw those 50 hours of footage at a speech and face recognizer overnight and have it ready for search in the morning?

Price aside, there are data costs. Yes, I have unlimited internet at home, but what if I wanted to make a mobile application? Or a non- or barely-connected Raspberry Pi project? Voice is just one of those things that's becoming super necessary, especially as we enter the new age of VR/AR. As inexpensive as Watson is at 2 cents per minute, it's also potentially a bit cost-prohibitive in large-scale use cases.

That's why I'm excited about Mozilla's DeepSpeech project. DeepSpeech is a speech transcription engine that runs locally using machine learning. The model they released is trained by way of Mozilla's Common Voice Project, essentially crowd-sourcing the training data for their model.

Mozilla states that a Raspberry Pi and/or mobile isn't in the cards yet (unless you'd like to fork the open source project and figure it out yourself), but it is on their roadmap. I'm guessing that to make it more mobile-ready, the model and associated data files will need to be cut down from the roughly 2GB they weigh in at now.

I did have some trouble getting started, but I'll walk you through it and show some results. Coming off of trying to get other ML libraries installed, this was comparatively a walk in the park and extremely straightforward. But, like I said, it's new and I hit a few snags.

First of all – I had Python 3 installed. Nope. Get yourself Python 2. It'll probably work someday on 3, but not today. Next, their instructions to get started are super easy – use the Python package manager, pip, and run "pip install deepspeech".

Unfortunately, pip couldn't find the package! Turns out Mozilla doesn't offer the package for Windows yet, and in fact, looking over the docs, Windows might not really be tested or supported at all. With my Mac back at work, I figured I was out of luck – but then I remembered that Windows 10 comes with Ubuntu now! Even as I gave it a shot, I thought it'd be futile.

Nope, worked like a charm! DeepSpeech installed quickly and easily. Next, I wanted to jump right in and give it a go. On their README, they list the command:

deepspeech output_model.pb my_audio_file.wav alphabet.txt lm.binary trie

This raises the question: where are those files? The model file, the binary, the txt? It's not at all obvious from the README, but you can find them on the Releases section of their repo.

Once I had these in place, my first attempt threw an error. It was vague….something about having 2 dimensions.

TypeError: Array must have 1 dimensions.  Given array has 2 dimensions

All it meant was that it doesn't support stereo WAV files, just mono ones. Somehow, dimensions == audio channels.

I used a YouTube downloader site to grab a few samples, then converted them with FFmpeg. On a couple of occasions, I used Adobe Audition to chop things shorter so clips would only be a few seconds. You've got to be very careful here, because your result can range from audio processing errors in your console to garbled, nonsensical output!

Some tips:

  • Use 16-bit, 16kHz, mono audio
  • Make sure not to include metadata in the file (Adobe Audition defaults to including it, but in the export settings you can uncheck the box for "markers and metadata")
  • Expect processing to take a bit over double the duration of the clip
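
To get files into that shape, an FFmpeg one-liner along these lines should do it (the input and output names are just placeholders):

ffmpeg -i input.wav -ac 1 -ar 16000 -acodec pcm_s16le -map_metadata -1 output.wav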

My very first try was a random example WAV file I found online:

It was pretty good! The result was "A tree has given us a you net opportunity to defend freedom and were going to seize the moment and do it". The mistakes were "a tree" instead of "history" and "you net" instead of "unique". Honestly, I wonder if these mistakes would exist if we applied some natural language processing as a filter like the cloud services do... and since we run it locally, we can easily insert this step and many others. It took 10 seconds to process this 4-second audio file.

Now the real test: a two-minute clip from Batman. Again, I ran this video through a downloader service. It saved to WAV, but I had to run it through Audition to make sure the bit depth and sample rate were correct.

The output was impressive, but there were long garbled stretches:

“o dascmissiur mister freeze wants what hello ill see if i can get a chief o here a moment commissoerdutchepfoherothisisbrusewiyistfreezeontswhatcommissionergardenisonlaealo with bat man mister wan and perhaps if we put the two force together and you could talk to him yourself all right chief i dont have much time oh that man yes mister wine i you heard mister freesesscirless demands just briefly if raidand i have is gobetweensare you prepared to make the telocacacminnightandpaytherensomisterwayei have no choice bad men that may i suggest you take the broadcaster the commissioners office in an hour earlier and we will have a dome package money a to me ackageyoumoney this sons risky risk is our business mister wine of course but an i have the same faginkyou that all of gottemcityaskihoperobinandi are deserving of that faith ill make the necessary arrangements a meetyouwithtaeconmster’s office at eleven in it you can altakrmisuerindeed i did that man will usunup take telltecastanlavethedammypackageemoneywaitingsewateleventoliktohsofindmensodissimilaringbatyrisbenganyooensosimilaranoherh”

What's weird is that these garbled stretches would look almost correct if they were spaced out.

So yes, it has a little way to go, but it's impressive for a launch. It will also only get better as Mozilla and the community improve the models, maybe create some NLP wrappers (or otherwise), and shrink it down for mobile. Congrats Mozilla, I'm impressed – this project is needed!

The Slow March of Web Component Progress

Almost two years ago, I made a hefty series of posts on the promise of Web Components. Things have changed and promises were broken, but on the whole, I don’t think MUCH has changed from an implementation perspective. These days, I’ve been sucked into the awesome world of the 3D web and WebVR and soon WebAR, but often I need some 2D UI around my 3D scene/canvas. When I do, it’s STILL all Web Component based HTML, CSS, and pure vanilla Javascript.

API Changes

You’d think the biggest change might be version 1 of the Web Components API, but actually not much has changed from an implementation perspective. Really, some method names have changed, but the API is pretty much accomplishing the same thing.

Here’s version 0:

class MyCustomClass extends HTMLElement {
    // Fires when an instance of the element is created.
    createdCallback() {}

    // Fires when an instance was inserted into the document.
    attachedCallback() {}

    // Fires when an instance was removed from the document.
    detachedCallback() {}

    // Fires when an attribute was added, removed, or updated.
    attributeChangedCallback(attr, oldVal, newVal) {}
}

Now, compare that to version 1:

class MyCustomClass extends HTMLElement {
    static get observedAttributes() { return [] }

    // Fires when an instance of the element is created.
    constructor() { super(); }

    // Fires when an instance was inserted into the document.
    connectedCallback() {}

    // Fires when an instance was removed from the document.
    disconnectedCallback() {}

    // Fires when an attribute was added, removed, or updated.
    attributeChangedCallback(attributeName, oldValue, newValue, namespace) {}

    // Fires when an instance was adopted into a new document.
    adoptedCallback(oldDocument, newDocument) {}
}

So pay attention here... what actually changed? The method names, for sure, but once you change the method names, the use is exactly the same. Bonus: we have a constructor! We didn't before, and it's just plain nice to have something here to use as a callback when this component is first instantiated. Prior to this, everything needed to be done when the element was created or attached to the document. To be fair, component creation vs class instantiation seems essentially the same from a usage standpoint, but it WAS weird not being able to have a constructor on a class in version zero.

Another small change is the observedAttributes getter. Previously, in version zero, the attributeChangedCallback handler worked on any attribute of your component. Changing <my-component someattribute="hi"></my-component> to <my-component someattribute="bye"></my-component> at runtime would trigger this handler and allow you to take action. Now, though, a developer needs to be more deliberate. If your code needs to watch for changes to "someattribute", this value needs to be added to the observedAttributes:

static get observedAttributes() { return ['someattribute'] }

Sure, it’s something extra to do, and yes, before I knew what this did, I spent several minutes trying to figure out why my attribute change method wasn’t being called, but it’s pretty minor and requires more deliberate intention. I can’t really complain, the change seems good overall.
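
As a quick sketch of how I react to those changes (render here is a hypothetical method on my component):

    attributeChangedCallback(attributeName, oldValue, newValue) {
        // only fires for attributes listed in observedAttributes
        if (attributeName === 'someattribute') {
            this.render(); // hypothetical: redraw using the new value
        }
    }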

From a class implementation perspective, this is all that changed! There is one other change outside the class, though. It used to be that the class would be attached to the HTML tag like this:

document.registerElement('my-component', MyCustomClass)

Now, in v1, it’s done like this:

customElements.define('my-component', MyCustomClass);

Unfortunately, while Chrome, Safari, and Opera support “customElements”, Firefox and Edge do not yet. Given that Firefox is listed as “under development”, and in Edge it’s “under consideration”, I’m OK with this. We’ll get there, but in the meantime, a polyfill works.
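
For what it's worth, here's roughly how I gate things on the polyfill – the script path is a placeholder for wherever your copy of the custom elements polyfill lives:

if (!window.customElements) {
    // pull in the polyfill only where it's needed, then register the component
    const script = document.createElement('script');
    script.src = 'lib/custom-elements.min.js'; // hypothetical path to a polyfill build
    script.onload = () => customElements.define('my-component', MyCustomClass);
    document.head.appendChild(script);
} else {
    customElements.define('my-component', MyCustomClass);
}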

Undelivered promises

One of the biggest points of excitement around Web Components for me was the elegance of working with three separate things in combination to create a component: Javascript, CSS, and HTML. If you had asked me 2 years ago what the biggest risk to this vision was, it was getting browsers to implement the Shadow DOM. To remind you, the Shadow DOM is a protective wall around your component. Components can have their own CSS associated with them, and the Shadow DOM protects those rules from outside styles seeping in and wrecking them. Likewise, your component's internal DOM can't be manipulated from the outside.

Unfortunately, browsers were slow to adopt this, and even worse, it was harder to polyfill. The Polymer project even invented the notion of a "Shady DOM". Given this confusion, churn, and uncertainty, I never really adopted the Shadow DOM. In all honesty, I personally don't really need it. I can see bigger applications and teams using it as a layer of protection against themselves, like how other languages might use private/protected/public variables in their classes as a way of allowing team members to use and call on only what's been exposed.

But this is the web! When this layer of protection isn't offered to us, we just use conventions instead. The biggest and easiest convention is to simply never tweak component DOM from the outside. If you need to do something like this, you're doing it wrong... just make a method part of your component's API to do what you need.
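
Here's a minimal sketch of that convention (the component and its addItem method are made-up names):

class MyListComponent extends HTMLElement {
    connectedCallback() { this.innerHTML = '<ul class="items"></ul>'; }

    // outside code calls this instead of reaching into our inner DOM
    addItem(label) {
        const item = document.createElement('li');
        item.textContent = label;
        this.querySelector('.items').appendChild(item);
    }
}
customElements.define('my-list-component', MyListComponent);

// elsewhere in the app:
// document.querySelector('my-list-component').addItem('hello');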

CSS is a bit trickier, but we've had the tools we've needed since the earliest days of CSS. Instead of relying on the Shadow DOM to stop outside rules from mucking with your component's style, simply namespace every single CSS rule relating to your component with the component's tag name, like so:

my-component .an-Inner-Class {
  background-color: black;
}
All that said, it appears there is a new version of the Shadow DOM shaping up. I haven’t followed the latest here at all, but I think I might wait until there’s a strong indication things will settle down before I bother with it.

Given that the Shadow DOM, for me, is so easy to ignore until I have more confidence, I'm not really bothered. What I AM bothered by is how "HTML Imports" have dropped from favor. To be fair, we've always been able to polyfill HTML Imports fairly easily. At the same time, though, when Webkit/Safari has no interest and Firefox has no plans to implement it, the whole notion seems dead in the water. I've seen some conversation that the web community doesn't want to adopt HTML Imports in favor of the Javascript "import" mechanism, but I'm not aware that this works in a meaningful way yet for HTML, nor is "import" supported in any browser except the most recent versions of Chrome and Safari.

This leaves us with a bit of a challenge. I really don't want to create my component's DOM entirely with code – every single tag created with document.createElement('div'), then assigning classes and innerText, and then appending the child to a parent.

Fortunately, I’ve found that for me at least, inlining HTML into my Javascript is not as bad as I thought it might be. Components themselves should be fairly small – if you want major complexity, you may want to architect your big component into smaller ones that work together. Therefore, the HTML that you inline shouldn’t be that complicated either. By convention, I can also use the constructor for my component as a decent place to put my HTML, because there isn’t much else I need to add here.

    constructor() {
        super();
        this.template = '\
            <h4>Objects\
                <select class="fileselector">\
                    <option value="default">box primitive</option>\
                </select>\
            </h4>\
            <ul></ul>';
    }

    connectedCallback() { this.innerHTML = this.template; }

The above component represents a simple list (ul tag) which has a header above containing some text and a file selection menu. Honestly, the example I pulled isn’t the prettiest thing in the world right now, and once I flesh out this simple component, I expect to have double or triple the lines of HTML potentially. But, all the same, it’s pretty manageable to inline this. It also introduces a couple simple things in the way I format my HTML. I properly indent and new-line everything here just like you would see it in an HTML document. The mechanism to accomplish this readability is simply with a backslash after every continuing line.

I've also been exposed to the concept of backticks: `. Backticks are another way to wrap your strings in Javascript that allows you to inject variables. This is more commonly known as "template literals". It's far from a new concept. Though I haven't really done anything with string templating in the past, I believe the practice is extremely common in React, Backbone, and Underscore. I haven't favored using this for HTML because I like to keep my markup and JS separate, but I think I'm caving now to get a decent flow for components.

There's one problem with templated HTML in this case, though. It's easy enough to inject a var like so:

   var inject = 'hi';
   var template = `<div>${inject}</div>`;

The problem is that in the simple example above, the “inject” variable is in the same scope as the template! Typically when I want to use this type of pattern, I prefer to store the template as a sort of variable I can access from elsewhere rather than having it inside my code logic when I’m constructing these elements.

Here’s a fake example to explain:

for (let c = 0; c < data.length; c++) {
   let myitem = document.createElement('li');
   myitem.innerHTML = `<div>${data[c]}</div>`;
   mylist.appendChild(myitem); // mylist being the ul element
}

In this example, I'm appending new list items (li elements) to an unordered list (ul element). Right inside my loop here, I'm declaring what my template looks like. Personally, I think this is some bad code smell! Ideally, I want to break out any HTML I have into a separate variable so that if I AM going to inline my HTML (which I still think is kinda smelly on its own), I at least have it separated out so I can easily track it down and change it. Putting it inside my application logic, especially inside a loop like this, just feels terrible.

Unfortunately, it's not possible to save a template literal like this as a variable and fill it in later. Instead, we can create a method that accommodates both this and the creation of the element:

    itemTemplate(data) {
        var template = document.createElement('template');
        template.innerHTML = `<li class="mesh">${data}</li>`;
        return template.content.firstChild;
    }
I use the “template” tag here so I don’t have to decide upfront which type of tag to create, and my tag (including the outer tag) can live entirely in this template string. Otherwise, for my outer tag I’d also have to have additional JS calls to set any attributes, classes, or IDs on it.
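
With that helper in place, the loop from my fake example boils down to something like this (mylist again being the ul element):

for (let c = 0; c < data.length; c++) {
    mylist.appendChild(this.itemTemplate(data[c]));
}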

Custom Events

Custom events haven’t changed, but there’s a little trick I like to use that’s worth mentioning. Here’s the creation and triggering of a custom event:

        let ce = new CustomEvent('onCustomThing', { detail: { data: data }});
        this.dispatchEvent(ce);

The above code is pretty simple, but there is one thing I don’t like about it, and that is the string ‘onCustomThing’. If you think about it, whoever consumes this event outside this class needs to spell ‘onCustomThing’ correctly AND use the correct capitalization. If we change this over the course of our project, we could break things and not know it.

That's why I like to assign a sort of static constant to the web component class. In practice I haven't been using any JS language features that dictate it is a static constant (though I probably could, by copying how observedAttributes is declared). Here's how I do it:

class MyComponent extends HTMLElement {
    disconnectedCallback() {}
    attributeChangedCallback(attributeName, oldValue, newValue, namespace) {}
    adoptedCallback(oldDocument, newDocument) {}
}

MyComponent.CUSTOM_THING = 'onCustomThing';
customElements.define('my-component', MyComponent);

So now, elsewhere, I can listen for the event like so:
mycomponent.addEventListener(MyComponent.CUSTOM_THING, e => this.onCustomThing(e));
Yesssssss, you could bungle the syntax here as well, making it as bad as a string, but it's easier for an IDE to refactor and predictively type as you code.
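
And on the dispatching side, inside the component, the same constant keeps the string in exactly one place:

        let ce = new CustomEvent(MyComponent.CUSTOM_THING, { detail: { data: data }});
        this.dispatchEvent(ce);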

What’s missing

This last point about what's missing is a minor one, and I think it's slowly being corrected. Web Components aside, I've been developing most of my projects using Javascript modules by way of the "import" statement. Chrome's latest version supports it, though I haven't properly tried it out yet. Instead, I've been relying on the "browser-es-module-loader" polyfill. It works amazingly well, and I use it as a way to give my application a class-based "controller" that can import modules as it needs to.

So you can import a “main entry point” Javascript file as a controller, and anything downstream can also import modules. It’s pretty awesome, but any Web Components you use in your application are NOT downstream of this controller and as a result cannot use imports. I haven’t put in any serious brainpower to overcome this, but instead when I run into this issue, I take it as a hint that my component could be getting a bit too complex, and I work around it. Honestly, though, once this polyfill is not needed anymore, I’ll be happy!
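
To make that concrete, the controller I'm describing looks roughly like this (the module names and paths are placeholders):

// controller.js – my "main entry point", loaded as a module (natively or via the polyfill)
import SceneManager from './scenemanager.js'; // hypothetical downstream module
import Menu from './ui/menu.js';              // hypothetical downstream module

class AppController {
    constructor() {
        // anything imported here (or by these modules) is "downstream" and can use import too
        this.scene = new SceneManager();
        this.menu = new Menu();
    }
}

new AppController();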

Final Thoughts

As a whole, I'm still happy writing web components after 2 years. I still have no desire to change. I think things are getting better and better, just a bit more slowly than I originally anticipated. I'm also a bit surprised that HTML Imports are on their last legs. As a workflow and architecture, I still think the approach holds up really well, even if we have to shuffle around some of the pieces that make it up.

Everybody is different, though, and there are many different tools for many different jobs. I still haven’t touched React or Angular 2-4 yet. I’m happy, but if you use those frameworks, you might be perfectly happy too! Consider this another tool to add to your belt (without all the bells and whistles of course).

Back From VRLA

I believe it was during a session called “Shooting VR for Post” that I found myself identifying heavily with one of the panelists who said something to the effect of “Before VR, my work was a bit mundane. We’d take a look at a shot we needed to do in a meeting, and we wouldn’t even have to talk, we’d instantly know what our roles were and break to get down to work. With VR now, it’s not that easy, we need to knock our heads against the wall and really come up with ways to get the job done.”

As a web developer, I share this sentiment completely. The speaker expounded, giving an example: when Houdini comes out with a new node (I can only vaguely guess what this means), there's a level of excitement, but it's short-lived. I feel similarly when a new Web API or Node.js-based front-end workflow enhancement comes out, or a new framework is released. It changes our workflow in a nifty way, but it doesn't necessarily change the work we create in a meaningful way.

It’s a big sentiment, and I feel it’s absolutely monumental that I happen to share this sentiment about the same new technology with a cinematographer…someone whom I might never even speak to in a professional capacity. I also seem to share this sentiment with sound engineers, game developers, VFX artists, hardware manufacturers, and more. I even had a fascinating conversation about depth being registered in your hypothalamus vs your visual cortex with a Game Developer/Designer/Cognitive Psychologist.

I’m silo-ing people a bit here because the more curious amongst us (including myself) have always enjoyed exploring the fringes of our craft. It’s not necessarily true that I wouldn’t talk to a cinematographer as a web developer, but it’s also not necessarily normal.

The point is that VR is bringing the best minds from all disciplines together and dissolving the fringes between these disciplines. Conferences like VRLA allow the stories of these boundaries breaking down to be told.


This is incredibly important, not only for getting acquainted with what skills are being injected into this new medium and why, but also because nobody knows the right way to do things. When there’s no right way to do things, there’s no book you can buy, nothing to Google, nothing we can do except hear about fleeting experiences from people that got their hands dirty. We need to hear about their pain and about their opinions formed from creating something new and unique. When we hear lots of such perspectives, we can assemble a big picture, which I’m sure will be shattered by the next VRLA. I’ll be waiting to learn about the hypothetical magician a panelist cited as a great collaborator for focusing attention in a 360-degree world.

Also interesting is the regionality of VR creators. I feel like I hear an entirely different story in San Francisco versus what I heard at VRLA. When I attend the (admittedly low number of, so far) meetups around the Bay Area, it’s mostly about hardware, platforms, new app ideas, prototypes, social experiences. In LA, I feel that it was overwhelmingly VFX, cinematography, sound design…a very heavy focus on well-produced content. I’m still uncertain about the regionality around game development, perhaps because it’s relatively regionless. Though, one memorable paraphrased line on that subject was “Game devs are now sitting in the same room as VFX artists and directors.”

Perhaps one of the more interesting things I picked up was the different stories from different creators on immersive video. Immersive or 360 video seems like a mainstay in VR. The cries of it not really being VR have been sufficiently drowned out with most, if not all, presenters acknowledging the sentiment but disagreeing with it. Andrew Schwarz of Radiant Images, for example, called immersive video the “killer app” of VR. I expected this sentiment, especially in a city with so much film talent.

Andrew Schwarz of Radiant Images showing the new completely modular camera mount (AXA 360 Camera System) for immersive media

What I did not expect was the nuance verging on disagreement from Dario Raciti of OMD Zero Code. His point of view seemed to be that the novelty of immersive video has waned. His interest lies in creating marketing campaigns that make brands like Nissan and Gatorade stand out from the rest. Answering my question of what kinds of projects he tries to sell to clients, he flat out says he tries to discourage pure 360 video. Instead, he prefers a more immersive experience mixed with 360 video.

An excellent example of this was his “Let Hawaii Happen” piece. The user begins on a parachute they can steer and navigate to various islands in Hawaii. Once they’ve landed, it switches to a non-interactive 360 video tour.

I think Dario’s take on advertising with VR is very much worth listening to. His team also created a car-shopping VR experience for Nissan in which the user is seated to get a feel for the interior of the car, much like what you would do car shopping in reality. Outside the windows, however, a much different scene plays out: the viewer is also part of a battle in the Star Wars universe.

That exemplifies Dario's notion of mixing real-time 3D content with immersive video, but it also touches on his point about advertising in general. To liberally paraphrase, Dario feels you should never beat the user over the head with branding. No logos, no mentioning of the brand unless it's subtle and integrated into the experience. The experience always comes first, and if it's memorable, it will sell the brand.

To me, this speaks to the larger issue of taking concepts we already employ en masse in traditional media and shoe-horning them into VR. Advertisers, I know you’re already thinking of this. You want to cut to commercial, put your logo on the bottom third of the screen, and include voice overs about how your brand is the best. Dario is saying to create good marketing experiences, let the content flow freely and be subtle about your brand. Consumers will respond better. He even cited “Pearl,” an Oscar-nominated VR short, as an example of something that could be a commercial with extremely limited changes.

The notion of shoe-horning brings another memorable notion to mind. To date, I’ve been thinking about VR like the jump from desktop to mobile. But the better analogy from one panelist was that “VR is like the jump from print to digital.” While stubbornness to hold on to the old ways can be detrimental, years of experience coupled with open-mindedness can be a huge asset.

In the Cinematographers' panel, it was mentioned that old 3D tricks born of limited processing power are coming back into fashion, the reason being that game engines like Unreal are coming into favor for doing real-time previews of scenes. Even traditional film equipment is being recreated in VR to help production. Hearing a cinematographer talk about replicating a camera crane in VR and then shrinking it down, scaling it up, putting it on a mountaintop... all within a day's shoot, was incredibly interesting.

Shooting VR for Post Panel

The panelists and presenters at VRLA shared so many recent and super fascinating experiences born of their experimentation. This was a bit unfortunate for me, because I found myself glued to the presentation rooms and off the expo floor. I saved my 2-hour lap through the expo hall until the very end. As expected, the lines for the more interesting experiences were either too long or closed. I can't fault VRLA or their exhibitors for this; it seems a standard downside of VR conferences. I would wager that the most popular experience was the Augmented Reality (Hololens) Easter Egg hunt. As I didn't experience it, I'll just leave you with a photo because it looks awesome.

Microsoft Hololens Augmented Reality Easter Egg Hunt

Of course, like Microsoft, a bunch of big vendors were there: Facebook, HTC, Intel. Although I don't own a Vive, their talk of the multi-platform subscription service and their wireless headset was exciting. So was hearing how dedicated Intel, HTC, and HP are to VR developers. Yes, Facebook and MS are dedicated to Mixed Reality as well, but for me, that message was well received a while ago, so it's awesome to see everyone pile on.

Being that there were around 170 exhibitors at VRLA, there were tons of smaller ones showing games, hardware, new experiences, and new creative tools. One notable company, Mindshow, offers creative tools for recording animated characters with your body and voice in real time. Watching from the expo floor, I was a bit disappointed as it felt too scripted. However, a booth attendant assured me it was that way for the 10-minute quick demo for conference-goers. It makes sense that you'd probably not want to start users with a blank slate if you only have a short window to impress them. So, if Mindshow is what I think it is, I can imagine having so much fun myself, and I can see many people creating awesome animated content extremely easily... but I've been known to overhype things in my own head.

Though it was my first time, VRLA has been going on for 3 years now and they’ve grown exponentially. The conference-going experience was not as seamless as others I’ve been to. The Friday keynote was delayed by at least 30 minutes because the speaker had no slide notes, which set off a cascade of presentation time pushbacks. There were constant audio issues, and the light field talk I was really looking forward to was cancelled with no explanation. This is all forgivable and probably par for the course given how many people from different disciplines are coming in and bringing their passions and experiences. There’s an amazing energy in VR. Organizations and conferences like VRLA focus it. It might not be laserlike as VR grows exponentially, but with a medium so young and with so many stories still to be told from creators about their experimentation, everything is appreciated.

A Week at the Hololens Academy

Ahhhhh, the Hololens. I finally get to check it off my list. When I'd express my disappointment at not having tried it to friends and co-workers who are interested in VR, it was kinda like talking about going to Hawaii. "Ohhhh, you haven't been? You really should, it's an enjoyable experience." (Said, of course, with a knowing smirk and possibly a wink.)

There's a good reason for that knowing wink. It's a massively cool device, and despite being publicly available now to early adopters, there's a waiting list and it's $3k. Someone mentioned to me that they are in the "5th wave" of the wait list. So, right now, it's hard to get your hands on one. And that's IF you're willing to shell out the money.

Should you buy it if you get the chance? Maybe. For me, there are lots of parallels to Google Glass from a few years ago, but also lots of reasons it might break free from technological oddity into the mainstream.

In terms of sheer impressiveness of hardware, hell yes it's worth $3k. Though it can be tethered via USB for the purposes of big deployments of your project, it's completely wireless and independent. The computer to run it is built right into the device. It packs WiFi, 64GB of storage, a camera (both RGB and depth), and other sensors for head tracking (probably an accelerometer and gyroscope). Even the casing of the device is impressive. It looks slick, true, but the rotatable, expandable band that makes every effort to custom-fit your head is practically perfect. I didn't put it on my head completely correctly at first, and the display was resting on my nose a bit, which would have been uncomfortable after a while. Turns out, if you balance it on your head correctly, it barely touches your nose and almost floats on your face.

Compare the hardware to something like the Oculus Rift or the HTC Vive, which are just displays that you tether to your own computer (and aren't augmented reality). They run $600-800 plus at least a $1k desktop computer. I can't recall who, but someone with me made the almost cruel observation of how the size of an NVIDIA GTX 970 graphics card compares to the size of the entire Hololens headset.

The display is another massively cool hardware piece and makes the entire system come together as one. It has its problems, which I'll get into (cough cough, field of view), but I'll talk about that in a second when I get to usability. And make no mistake... usability is why you should or should not run right out and purchase one of these devices. The Hololens isn't so much a tool as it is an experience. It's not a hammer and nail. It's more of a workbench. A beautiful workbench can be amazing, but if you can't open the drawer to get to your hammer and nails when you want to create something, it's worthless.


Training at Microsoft HQ

Awful analogies aside, and usability aside, let me say a quick word about the training. Microsoft calls it "The Hololens Academy". It occurs to me just now that this might be a thinly veiled Star Trek reference. In fact, ALL of the training assets were space-themed: from a floating astronaut, to a virtual futuristic tabletop projector, to a mid-air representation of our Solar System.

My company, Adobe, was kind enough to send me to Redmond at the last minute to do some learning. I honestly didn't know what to expect because it was so last minute. Was it super secret stuff? No... but considering I hadn't seen the non-secret stuff yet, it really didn't make too much difference. In fact, it was SO not secret that our class followed along with well-developed training material that MS has published online.

In fact, in a testament to how well developed it is…I was weirded out a bit on the first day to be honest. It had that theme park feel. Or that historical city tour feel. You know, where every word and joke your guide says is rehearsed and feels forced? But I got over that real fast, you know why? Because the sessions went like clockwork. The instructors kept exact time to an eerie degree, and the assistants WERE psychic. Virtually every time I had trouble, an instructor was behind me within a few seconds helping me out. I didn’t raise my hand, look confused, nothing. And there wasn’t a single time where I felt like they were annoyingly hovering. They just showed up out of the blue being insanely helpful.

The room itself was laid out extremely well for training. An open workspace with large-screen TVs on the walls facing every which way, with the instructor in the center on a headset, made a great training space. The instructor didn't even drive the software. He or she (they changed out every 3 hours) would have someone else driving the presentation machine while they spoke. This kind of coordination takes practice, no doubt.

The walls and tables were decorated for the event too, along with coffee tables specifically for placing your virtual assets on (holograms). The room is probably a permanent fixture specifically for this.

This all means one thing to me. We've got publicly available training materials with tons of care put into creating them, extremely well-staffed and smart trainers, and a training room just for the Hololens. Add to this the hundreds of engineers working on the Hololens and the fact that MS is just now offering developer support for it... and the message is loud and clear. Microsoft is placing a HUGE bet on the Hololens. They aren't half-assing this like a lot of companies in their position might for a product that is so different and whose adoption is so hard to predict.

Training style aside – I found another thing extremely interesting about the training. It’s all about Unity.


Authoring with Unity

Unity seems like kind of an underdog at the moment. It's essentially a 3D authoring environment/player. It doesn't have nearly the reach of something like Flash or Quicktime, which at one point or another have been ubiquitous. Yet it's a favorite of 3D creators (designers and devs) who want to easily make 3D interactive experiences. The reach of Unity alone (browser plugin, WebGL, Android, iOS, desktop application, Oculus, Vive, Gear, and now Hololens, as well as others) puts it right in the middle of being THE tool for creating VR/AR/Mixed Reality content.

I was naive not to expect MS to use Unity for experience creation. But the fact is, it's one of the ONLY tools for easy interactive 3D scene creation. I honestly expected Microsoft to push us into code-only experience creation. Instead, they steered us into a combo of 3D scene building with Unity and code editing (C#) with Visual Studio. To be honest, I'm a little resistant to Unity. It's not that it isn't an excellent tool, but I've gone through too many authoring tools that have fallen out of favor. This training is a wake-up call, though. If Oculus, Gear, and the HTC Vive weren't enough to knock me over the head, a major company like MS (who has a great history of building dev tools) using a third-party tool like this... well, consider me knocked over the head and kicked in the shins.

The exercises themselves were a mix of wiring things up in Unity and copying/pasting/pretending to code in Visual Studio. It's a hard thing to build a course around, especially when offering it to everyone with no prerequisites, but MS certainly did a good job. I struggled a bit with C# syntax, not having used it in years, but easily fell back to the published online material when I couldn't get something.


Usability and VR/AR Comparisons

OK, so the Hololens has the sweet, sweet hardware. It has the training and developer support. All good, right? Well, no, there's another huge consideration. The hugest consideration of all: how usable is it, and what can end users do with it?

You might guess that what end users do with it is up to you as a developer, and that's partially right. Everything has limitations that enable or inhibit potential. Here's the thing, though – take the iPhone or iPad, for example. When it came out, it WAS groundbreaking. But it wasn't SO different that you had to experience it to imagine what it could do. Steve Jobs could simply show you a picture of it. Yep, it had a screen. Jobs could show you interaction through a video: yep, you can swipe and tap and stuff. People were imaginative enough to put 2 and 2 together and imagine the types of things you could do without ever having used the device. Sure, people are doing amazing things with touch devices that would never have been imagined without using them – but you can certainly get the gist of the simplest interactions by seeing the device used, without using it yourself.

VR is somewhat harder to pin down, but again, it's somewhat easy to imagine. The promise is that you are thrown into another world. With VR, your imagination can certainly get ahead of itself. You might believe, without donning a headset, that you can be teleported to another world and feel like you're there.

Well, yes and no, and it's all due to current limitations. VR can have a bit of a screen-door effect, meaning if you focus hard enough you feel like you're in front of a screen. With VR, you are currently body-less. When you look down, you'll probably see no body, no hands – or even if it's a great experience, it won't look like YOUR body. This is a bit disconcerting. Also, you DEFINITELY feel like you're wearing a headset. So yes... with VR, you ARE transported to a different and immersive space; however, you need to suspend disbelief a bit (as amazing as it is).

AR is similar, but a little worse. I can only comment on the Hololens, but it's not the magical mixed reality fairy tale you might be led to believe. Even worse, MS's published videos and photos show the user being completely immersed in holograms. I can't really fault them for this, because how do you sell and show a device like this that really must be worn to be experienced?


Field of View and other Visual Oddities

The biggest roadblock to achieving this vision is field of view. From what I've heard, it's the single biggest complaint about the Hololens. I heard this going in, and it was in the back of my head before I put the device on, but it took me an embarrassingly long time to realize what was happening. A limited field of view means that the virtual objects, or Holograms, only take up a limited portion of the "screen". Obviously. But in practice, this looks totally weird, especially without some design trick to sweep it under the rug and integrate the limitation into the experience.

When you start viewing a 3D scene, if things are far away, they look fantastic! Well integrated with your environment and even interacting with it. Get closer, though, and things start falling out of your field of view. It's as if you're holding a mobile screen up fairly close to your face, but the screen has no edges and doesn't require your hand to hold it up. Well, what happens to things off screen? They simply disappear, or worse, they are partially on screen but clipped to the window.

I took this image from an article about the field of view (here's their take on it), but for our sake right now, it's a great approximation of what you would see:


People also use peripheral vision to find things in a large space, but unfortunately in this scenario you have no periphery – so it can be easy to not have a good understanding of the space you’re in right away.

There are a couple of other visual limitations that make your holograms a bit less believable. For one, you can certainly see your headset. The best way to describe it is that it's like wearing sunglasses and a baseball cap (though the Hololens doesn't protrude as far as a cap rim). You can also see the tinted projection area and some of the contours of that area in your periphery. It's easy to ignore to an extent, but definitely still there. Also, you can see through the Holograms for sure. They're pretty darn opaque, but they come across as a layer with maybe 90% opacity.

Another point is that in all the demo materials, if you get suspiciously close, the object starts disappearing, or occluding. This is directly due to a camera setting in Unity (the near clipping plane). You can certainly decrease this value; however, even the lowest setting is still a bit far and does occlude, and even then, the Hololens makes you go a bit cross-eyed at something so close. You might say this is unfair because it's simply a casualty of 3D scenes. To that, I say check out the Oculus Rift Dreamdeck and use the cartoony city demo. You can put your head right up next to a virtual object, EXTREMELY close, and just feel like you can touch it with your cheek.

Lastly, overhead lights can cause some light separation and occasionally push some rainbow streaks through your view, especially on bright white objects like the Unity splash screen. On this point, I can directly compare it to the flare of white objects on the Oculus Rift due to longer eyelashes.

For these reasons, I don't think the Hololens can be considered an immersive device yet like VR is. VR is really good at transporting you to a different place. I thought the Hololens would be similar in that it would convincingly augment your real world. But it doesn't for me. It's not believable. And that's why, for now (at least 10-15 years), I'm convinced that AR is NOT the next generation after VR. They will happily live together.

If VR is the immersion vehicle – something that transports you, what’s AR? Or more specifically, the Hololens? Well, just because something isn’t immersive, doesn’t mean it can’t be incredibly useful. And I think that’s where the Hololens lies for the near term. It’s a productivity tool. I’m not sure I think games or storytelling or anything like that will catch on with the hardware as it is now (as cool as they are demo-wise until the immersion factor improves). No – I think it can extend your physical screen and digital world to an exceptional degree. Creating art, making music, even just reviewing documents can all be augmented. Your creation or productivity process doesn’t have to be immersive, just the content you create.

I think this point is where AR really shines over VR. In VR, we’re clumsily bringing our physical world into the virtual world so we can assist in creation using things modeled after both our real tools and 2D GUI tools. And usually this doesn’t work out. We have to remove our headset constantly to properly do a job. With AR, the physical world is already there. Do you have a task that needs to be done on your computer or tablet? Don’t even worry about removing your Hololens. Interact with both simultaneously…whatever. In fact, I think one HUGE area for the Hololens to venture into is the creation of immersive VR content itself. One for the immersive, one for the productive.

That’s not to say I don’t think casual consumers or others will eventually adopt it. It certainly could be useful for training, aid in hands free industrial work, anything that augments your world but doesn’t require suspension of disbelief.


Spatial Awareness

Hololens immersion isn’t all doom and gloom though. Spatial awareness is, in fact, AMAZING. The 3D sensor is constantly scanning your environment and mapping everything as a (not fantastically accurate but damn good) mesh. Since it uses infrared light like the Kinect to sense depth, it does have its limitations. It can’t see too far away, nor super close. The sun’s infrared light can also flood the sensor leaving it blind. One fun fact that I’ve learned is that leather seems to not reflect the light too well, so leather couches are completely invisible!

We did a really simple demo of spatial mapping. It looked amazing to see the real walls lined with a custom texture of blue lines. My Adobe colleague decided to make the lines flash and animate, which was super mesmerizing. Unfortunately, I didn't find the mixed reality video capture feature until after that, so here's a nice demo I found on YouTube of something similar (though a bit more exceptional and interactive).

As scattered IR light tends to be sort of…well…scattered, meshes certainly don’t scan perfectly. That’s fine because MS has built some pre-packaged DLLs for smoothing the meshes out to flat planes and even offers advice on wall, ceiling, floor, and table finding.

Of course, once you've found the floor or surfaces to interact with, you can place objects, introduce physics to make your Hologram interact with real surfaces (thanks, Unity, for simple collisions and rigid bodies!), and even have your Holograms hidden behind real things. The trainers seemed most eager to show us punching holes in real objects like walls and tables to reveal incredible and expansive virtual worlds underneath. Again... though... the incredible and expansive can't be immersive with the field of view the way it is.

Here's a good time to show our group lobbing pellets at each other and hitting our real-world bodies. The hole at the end SHOULD have been on the table, but I somehow screwed up the transformation of the 3D object in Unity, so it didn't appear in the right spot. It does show some great spatial mapping, avatars that followed us around, and punching a hole through reality!


Spatial Audio

Spatial audio is another thing I'm on the fence about. It's a bit weird on the Hololens. I give Microsoft major props for making the audio hardware AUGMENTED but not immersive. In VR systems, especially the Oculus Rift, you'd likely have over-the-ear headphones. Simple spatial audio (and not crazy advanced rocket-science spatial audio) is limited to the horizontal plane around you. Meaning, it matches your home stereo: maybe a few front sources (left, right, and center) and a couple of back sources on your left and right. With these sources, you fade the audio between them and get some pretty awesome positional sound.

On the Hololens, however, the hardware speakers are positioned above your ears on the headband. They aren’t covering your ear like headphones.


So yes, you can hear the real world as easily as you could without the headband on, but being positioned above your ears makes it sound like the audio is always coming from above. One of our exercises included a Hologram astronaut. You'd click on the astronaut and he'd disappear, but he'd talk to you and you were supposed to find him. Everyone near me, myself included, kept looking up to find him, but he was never up high – and I'm sure this is a direct result of the Hololens speaker placement. I asked the instructor about positional audio that included vertical orientation as well, and he said it was computationally hard. I know there are some cool solutions for VR (very mathy), but I'm skeptical on the Hololens. The instructors did say to make sure that objects you'd expect higher up (like birds) appear higher up in your world. I personally think this was a design cop-out to overcome the hardware.



The last thing I want to cover is input. Frankly, I'm disappointed with EVERYONE here (except for the HTC Vive). It seems mighty trendy for AR and VR headsets to make everyone do gaze input, but I hate it and it needs to die. The Hololens is no exception; gaze is used in all the training material and all of the OS interactions. The same goes for casual interactions on the Oculus Rift (gaming interactions use an XBOX controller, still dumb IMO) and Google Cardboard. The HTC Vive, and soon the Oculus Rift, have touch controllers. Google Cardboard will soon be supplanted by Daydream, which features a more expressive controller (though not a positional one). I've heard the Hololens might have some kind of pointer like Daydream's, but I've only heard that offhand.

Gaze input is simply using the direction of your gaze to control a cursor on screen. Actually, it's not even your eyes, since your eyes can look around... gaze input is really using the center of your forehead as a cursor. The experience feels super rigid to me; I'd really prefer something more natural that lets you point at something you aren't looking at. With the Oculus Rift, despite having gaze input, you also have a remote control. So to interact with something, you gaze at it and click the remote.

The Hololens, on the other hand – well, it SEEMS cool, but it's a bit clunky. You're supposed to make an L with your thumb and index finger and drop the index finger in front of you (don't bend your finger, or it may not recognize the action). You also have to do this in front of the 3D sensor, which doesn't sound bad, but it would be way more comfortable to do it casually at your side or with your hand pointed down. To be fair, spoken keywords like "select" can be used instead. We also played with exercises that tracked your hand's position to move and rotate a Hologram. All the same, I really think AR/VR requires something more expressive, more tactile, and less clunky for input.



All that said, the Hololens is an amazing device with enormous potential. Given that Microsoft's CEO calls it a "5 year journey", what we have right now is really a developer preview of the device. For hardware, software, and support that feel so polished despite the interaction roadblocks, it will most likely be amazing what consumers get in their hands 5 years from now. So should you shove wads of cash at MS to get a device? Well, me... I'm excited about what's to come, but I see more potential for VR growth right now. I'm interested not just in new interaction patterns with AR/VR, but also in exploring how immersiveness makes you feel and react to your surroundings. The Hololens just doesn't feel immersive yet. Additionally, it seems like the AR/VR community is really converging on the same tools, so lessons learned in VR can be easily translated to AR (adjusting for the real-world aspect). The trainers made sure to point this out – the experiences you build with Unity should be easily built for other platforms. It will also be interesting to see in the next 5 years where Google takes Tango (AR without the head-mounted display) and potentially pairs it with their Daydream project.

All that said, it’s all about use cases and ideas and making prototypes. If a killer idea comes along that makes sound business sense and specifically requires AR, the Hololens is pretty much the only game in town right now, so if that happens I’ll be sure to run out and (try to) get one. But in terms of adopting the Hololens because of perceived inevitability and coolness factor? I might wait.

But if you don't own any AR/VR devices, can't wait to put something in the Windows Store, can live with the limitations, and are already an MS junkie – maybe the Hololens is for you!

I’d like to give a big thanks to Microsoft for having us in their HQ and having such fantastic trainers and training material. I expect big things from this platform, and their level of commitment to developers for such a new paradigm is practically unheard of.

Adventure Time: Magic Man’s Head Games…

… and other platformers.

In my last post, I was really psyched over the suspension of disbelief in a cartoony, fantasy-world-like experience. It's fitting that my first purchased content would be this game.

To be honest, I bought this for two reasons.

  1. It’s cheap at $4.99
  2. I freaking love Adventure Time

The result was that I was blown away. And this is odd, because if it was a normal 3D game release, it would be INCREDIBLY underwhelming. Even a bit underwhelming for $4.99.

Why? Well, gameplay won’t last more than an hour or so. Maybe two. The enemies aren’t that good (you’re mostly fighting sandwiches that don’t do much). The story isn’t deep at all, and the graphics are “meh”.

You too, reading this post, can be pretty ambivalent just by looking at a screen capture:



Like I said, the graphics are "meh". But allow me to say the first good thing about it, and it's that the graphics it does have capture the cartoony nature of the show pretty well.

In VR though? Wow.

In a year or two, I think this game will be as underwhelming in VR as it appears. But props to Turbo Button for making you feel like a part of the game. Right off the bat, it's just crazy cute to live virtually inside this admittedly sparse world and see Finn the Human and Jake the Dog interacting with you.

Also right away, the story is very cleverly set up for the medium. I do think content creators should take note of the way you're ingrained into the story and how that becomes a mechanism for playing the game.

Lemme explain…

The game starts with you approaching Finn and Jake in a field as a tiny person/thing/whatever. You're instantly accepted as buds with them. You never see who/what you are because it's all first-person view. Unfortunately, Magic Man pops in randomly from out of nowhere and starts wreaking havoc (sounds weird, but actually very in character for the show). Magic Man uses his magic and makes you, the player, incredibly huge. This story mechanism effectively turns you into the camera.

Finn and Jake plod on as you control Finn with your XBOX controller. But you – well, huge you/the game camera – follow them around in hopes Magic Man can be found and subdued into turning you back to normal size. The perspective/size change alone is something very interesting and ripe to explore in VR. This game only touches it briefly as its story intro, but all the same, I'd love to see more of it in other experiences.

Now that you’re the game camera, very interesting things can be found and, well NOT found.

Go back to the first set of popular 3D platformers. Say….Mario 64:


Because it was 3D, there was a camera. The camera would awkwardly follow you around, and when it was exceptionally awkward, you’d use your joystick to move it.

With Adventure Time, the camera still follows Finn around…but only loosely. Remember that you are the camera, and peeling the onion skin back, you’re wearing a headset on your face that you control as naturally as you would looking around in real life.

The game doesn't have very intricate levels, but there are some hiddenish paths to explore. Free movement of your head, as well as the ability to physically lean, duck, or stand on your tiptoes in real life, adds a VERY interesting element to the old 3D platformer. In some ways I can liken it to controlling a character within a dollhouse in good old-fashioned meat space. It's a very unique perspective. I only wish there were other ways to control it besides the XBOX controller, because that feel in your hands pulls you back to thinking it's fake again.

It's so hard to get this point across without experiencing it for yourself. Just imagine being able to stand up and look around this environment while your character hangs tight:


In a further nod to keeping you part of the game, both Finn and Jake will interact with you and talk to you regularly, sometimes exploiting the infamous cheesy "I'm watching a 3D movie" gag by throwing something in your face. But yah, here's Finn chatting you up:


All in all, it's so worth $4.99. Probably not worth an extra zero, but I'm really glad I purchased this one as my first VR game. The original voices, and sticking to an albeit simplish Adventure Time plot with very Finn and Jake-ish dialog, make me smile.

I should also toss a nod to a game called "Lucky's Tale". This game comes free with the Rift, but I didn't try it until after Adventure Time. It's obviously more geared towards kids, as adult me didn't care about the story. It was also a bit boring and cheap, just capturing coins as I plodded through the levels. Use of the camera in this 3D platformer has the same gameplay mechanic as Adventure Time, but without you getting written into the story. I think by the time I got to Lucky's Tale, my awe and wonder for the re-invention of the 3D platformer was used up, so this fell flat for me. That said, if you're shy about trying it and you have a Rift, it certainly won't cost you anything! And to be fair, I do think the art direction, style, and level design surpass Adventure Time by a fair bit.


What to cover next? I just recently bought Subnautica and The Climb. Both are pretty fascinating, and I'll write them up later. As you can tell, I'm not so concerned with telling you about core gameplay or how fun it is. I even thought I might just analyze user interaction in VR – how it's done in this brave new world – but it turns out that the kinds of feelings this content evokes are a major part of the user experience.