Mozilla DeepSpeech vs Batman

No, I’m not a “Machine Learning” developer, but I am having fun feeling out what it can do. All that’s to say, this isn’t an article about the gory technical details of Mozilla’s DeepSpeech. Rather, it’s my experience after spending a couple of hours with it: I came away decently impressed and eagerly anticipating how this project improves over time.

I’ve been peripherally aware of voice input tech for a while. Calling Google’s Web Speech API “first” does a disservice to many others before it, but it was the first one I played with, and it’s likely the first one many web developers like myself have used. It’s also insanely good right now.

Part of why it’s insanely good is that it can translate speech to text in essentially real time. Not only that, but to ensure results that make sense, it listens…and before it gives final results, it uses some natural language processing to figure out whether what it heard actually makes sense…and then improves it. Using the Web Speech API, you as a developer can even see the results rated and scored.
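If you’ve never tried it, here’s a rough sketch of what listening for those scored results looks like in the browser. This is just a minimal example; Chrome still prefixes the constructor and error handling is omitted:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.interimResults = true; // stream results while the user is still talking

recognition.onresult = (event) => {
    const result = event.results[event.results.length - 1]; // latest result
    const best = result[0];                                 // alternatives come back ranked
    console.log(best.transcript, best.confidence, result.isFinal);
};

recognition.start();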

Google is constantly improving. And it should. Their speech recognition is used on the web, on Android phones, and even on Amazon Echo competitor Google Home. They absolutely need it to be as perfect as possible so you, the user, can interact with experiences using only your voice.

Of course, Google isn’t the only one in the game. IBM’s Watson also does an exceptional job at this. Even better, their demo recognizes different speakers on the fly and labels them as such in the text it sends back.


Multiple speakers? An option to get word timings? Fantastic! Watson is positioned as a really good service for voice recognition for a variety of applications. Watson, of course, does tons of other things. It’s actually used in “Star Trek: Bridge Crew” to fill in some AI when you just want to play a mission and don’t have a real-life crew waiting in their VR headsets to play with you.

I’m also fairly confident that if I looked at Microsoft’s Azure services I’d see the same, and in recent days you can see a similar cloud offering from Google.

As far as I’m concerned, these companies are doing good. Cloud services are obviously popular, and speech recognition that works is a great service. There’s a problem, though.

Early on, before Google had their paid cloud service in place, when their browser Chrome first started offering the Web Speech API, you could watch network traffic in your browser and see the endpoints they were using. For any application you wanted voice in that wasn’t browser based – you could kinda sorta mock a service to their endpoint and shoot over chunks of audio data. It would do the same thing. I remember playing around with transcription of audio files via Node.js.

Honestly, this wasn’t kosher. It was Google’s service, and this is not what they intended it for. They even put a flag in their browser traffic to ensure it was coming from Chrome. Yes (sheepishly), I faked that too in my Node.js requests so I could continue playing.

Also, check out this Watson pricing page. It’s 2 cents per minute of audio uploaded. Yes, that seems super cheap. But it’s 2017 and we’re talking to our devices more than ever. Also, I have an idea for a project where I want to grab transcriptions for the entire Batman ’66 run.


Yeah, the show only ran for 3 seasons, but it was on basically every single night of the week. It clocks in at 120 episodes of around 25 minutes a pop. That’s 3,000 minutes, or $60 for my stupid project idea, assuming I don’t make mistakes. My stupid project idea might not even be all that stupid – I want to catalog and time speech. Video editors can spend a long time cataloging footage, or just searching for the right thing for the right cut. What if we could throw those 50 hours of footage at a speech and face recognizer overnight and have it ready for search in the morning?

Price aside, there are data costs. Yes, I have unlimited internet at home, but what if I wanted to make a mobile application? Or an offline or barely connected Raspberry Pi project? Voice is just one of those things that’s becoming super necessary, especially as we enter the new age of VR/AR. As inexpensive as Watson is at 2 cents per minute, it’s also potentially cost-prohibitive in large-scale use cases.

That’s why I’m excited about Mozilla’s DeepSpeech project. DeepSpeech is a speech transcription engine that runs locally using machine learning. The model they released is trained by way of Mozilla’s Common Voice Project, essentially crowdsourcing the training for their model.

Mozilla states that a Raspberry Pi and/or mobile isn’t in the cards yet (unless you’d like to fork the open source project and figure it out yourself), but it is on their roadmap. I’m guessing that to make it more mobile ready, the model and associated data files will need to be cut down from the roughly 2GB they are today.

I did have some trouble getting started, but I’ll walk you through it and show some results. Coming off of trying to get other ML libraries installed, this was a walk in the park and extremely straightforward. But, like I said, it’s new, and I hit a few bumps.

First of all – I had Python 3 installed. Nope. Get yourself Python 2. It’ll probably work someday on 3, but not today. Beyond that, their instructions to get started are super easy: run the Python package manager, pip, and do “pip install deepspeech”.

Unfortunately, pip couldn’t find the package! It turns out Mozilla doesn’t offer the package for Windows yet, and in fact, looking over the docs, Windows might not really be tested or supported at all. With my Mac at work, I figured I was out of luck – but then I remembered that Windows 10 comes with Ubuntu now! I gave it a shot, even though I thought it’d be futile.

Nope, worked like a charm! DeepSpeech installed quickly and easily. Next, I wanted to jump right in and give it a go. On their README, they list the command:

deepspeech output_model.pb my_audio_file.wav alphabet.txt lm.binary trie

This raises the question: where are those files? That model file, the binary, the txt? It’s not at all obvious from the README, but you can find them on the Releases page of their repo.

Once I had these in place, my first attempt threw an error. It was vague….something about having 2 dimensions.

TypeError: Array must have 1 dimensions.  Given array has 2 dimensions

All it meant was that it doesn’t support stereo WAV files, just mono ones. In this case, dimensions == channels.

I used a YouTube downloader site to grab a few samples, then converted them with FFMPEG. On a couple of occasions, I used Adobe Audition to chop things shorter so the clips would only be a few seconds. You’ve got to be very careful here, because your result can range from audio processing errors in your console to garbled, nonsensical output!

Some tips:

  • Use 16-bit, 16 kHz, mono audio (a conversion sketch follows this list)
  • Make sure not to include metadata in the file (Adobe Audition defaults this to on, but in the export settings you can uncheck the box for “markers and metadata”)
  • Expect processing to take a bit over double the duration of the clip
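Since I’d already been poking at this kind of thing from Node.js, here’s a minimal sketch of shelling out to FFMPEG to produce a file DeepSpeech will accept. It assumes ffmpeg is installed and on your PATH, and the file names are just placeholders:

// Convert an arbitrary clip to 16-bit, 16 kHz, mono WAV with no metadata.
// Assumes ffmpeg is installed and on the PATH; file names are placeholders.
const { execFile } = require('child_process');

execFile('ffmpeg', [
    '-i', 'clip.wav',          // source clip (an mp4 works too)
    '-ac', '1',                // mono
    '-ar', '16000',            // 16 kHz sample rate
    '-acodec', 'pcm_s16le',    // 16-bit PCM
    '-map_metadata', '-1',     // strip markers and metadata
    'clip-mono16k.wav'
], (err) => {
    if (err) throw err;
    console.log('ready for deepspeech');
});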

My very first try was a random example WAV file I found online:

It was pretty good! The result was “A tree has given us a you net opportunity to defend freedom and were going to seize the moment and do it”. The mistakes were “a tree” instead of “history” and “you net” instead of “unique”. Honestly, I wonder if these mistakes would survive if we applied some natural language processing as a filter like the cloud services do…and since we run it locally, we can easily insert this step and many others. It took 10 seconds to process this 4 second audio file.

Now the real test: a two minute clip from Batman. Again, I ran this video through a downloader service. It saved to WAV, but I had to run it through Audition to make sure the bit depth and sample rate were correct.

The output was impressive, but there were long garbled stretches:

“o dascmissiur mister freeze wants what hello ill see if i can get a chief o here a moment commissoerdutchepfoherothisisbrusewiyistfreezeontswhatcommissionergardenisonlaealo with bat man mister wan and perhaps if we put the two force together and you could talk to him yourself all right chief i dont have much time oh that man yes mister wine i you heard mister freesesscirless demands just briefly if raidand i have is gobetweensare you prepared to make the telocacacminnightandpaytherensomisterwayei have no choice bad men that may i suggest you take the broadcaster the commissioners office in an hour earlier and we will have a dome package money a to me ackageyoumoney this sons risky risk is our business mister wine of course but an i have the same faginkyou that all of gottemcityaskihoperobinandi are deserving of that faith ill make the necessary arrangements a meetyouwithtaeconmster’s office at eleven in it you can altakrmisuerindeed i did that man will usunup take telltecastanlavethedammypackageemoneywaitingsewateleventoliktohsofindmensodissimilaringbatyrisbenganyooensosimilaranoherh”

What’s weird is that these garbled stretches look almost correct if they were spaced out.

So yes, it has a little way to go, but it’s impressive for a launch. It’ll also only get better as Mozilla and the community improve the models, maybe create some NLP wrappers (or otherwise), and shrink it down for mobile. Congrats Mozilla, I’m impressed – this project is needed!

The Slow March of Web Component Progress

Almost two years ago, I made a hefty series of posts on the promise of Web Components. Things have changed and promises were broken, but on the whole, I don’t think MUCH has changed from an implementation perspective. These days, I’ve been sucked into the awesome world of the 3D web and WebVR and soon WebAR, but often I need some 2D UI around my 3D scene/canvas. When I do, it’s STILL all Web Component based HTML, CSS, and pure vanilla Javascript.

API Changes

You’d think the biggest change might be version 1 of the Web Components API, but actually not much has changed from an implementation perspective. Really, some method names have changed, but the API is pretty much accomplishing the same thing.

Here’s version 0:

class MyCustomClass extends HTMLElement {
    // Fires when an instance was removed from the document.
    detachedCallback() {};

    // Fires when an attribute was added, removed, or updated.
    attributeChangedCallback(attr, oldVal, newVal) {};
    
    // Fires when an instance was inserted into the document.
    attachedCallback() {};

    // Fires when an instance of the element is created.
    createdCallback() {};
}

Now, compare that to version 1:

class MyCustomClass extends HTMLElement {
    static get observedAttributes() { return [] }
    constructor() {
        super();
    }
    // Fires when an instance is inserted into the document.
    connectedCallback() {}

    // Fires when an instance was removed from the document.
    disconnectedCallback() {}

    // Fires when an attribute was added, removed, or updated.
    attributeChangedCallback(attributeName, oldValue, newValue, namespace) {}
   
    // Fires when an instance is adopted into a new document.
    adoptedCallback(oldDocument, newDocument) {}
}

So pay attention here…what actually changed? The method names, for sure, but once you change the method names, the use is exactly the same. Bonus: we have a constructor! We didn’t before, and it’s just plain nice to have something here to use as a callback when this component is first instantiated. Prior to this, everything needed to be done when the element was created or attached to the document. To be fair, component creation vs class instantiation seems essentially the same from a usage standpoint, but it WAS weird not being able to have a constructor on a class in version zero.

Another small change is the observedAttributes getter. Previously, in version zero, the attributeChangedCallback handler worked on any attribute of your component. Changing <my-component someattribute="hi"></my-component> to <my-component someattribute="bye"></my-component> at runtime would trigger this handler and allow you to take action. Now, though, a developer needs to be more deliberate. If your code needs to watch for changes to “someattribute”, that value needs to be added to observedAttributes:

static get observedAttributes() { return ['someattribute'] }

Sure, it’s something extra to do, and yes, before I knew what this did, I spent several minutes trying to figure out why my attribute change method wasn’t being called, but it’s pretty minor and just requires more deliberate intention. I can’t really complain; the change seems good overall.
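Put together, a minimal sketch of watching a single attribute might look like this (the class, element name, and attribute are the same placeholders used above):

class MyCustomClass extends HTMLElement {
    // only attributes listed here will trigger attributeChangedCallback
    static get observedAttributes() { return ['someattribute']; }

    attributeChangedCallback(attributeName, oldValue, newValue) {
        if (attributeName === 'someattribute') {
            // react to the change, for example by re-rendering a label
            this.textContent = newValue;
        }
    }
}

customElements.define('my-component', MyCustomClass);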

From a class implementation perspective, this is all that changed! There is one other change outside the class, though. It used to be that the class would be attached to the HTML tag like this:

document.registerElement('my-component', MyCustomClass)

Now, in v1, it’s done like this:

customElements.define('my-component', MyCustomClass);

Unfortunately, while Chrome, Safari, and Opera support “customElements”, Firefox and Edge do not yet. Given that Firefox is listed as “under development”, and in Edge it’s “under consideration”, I’m OK with this. We’ll get there, but in the meantime, a polyfill works.

Undelivered promises

One of the biggest points of excitement for Web Components, for me, was the elegance of working with three separate things in combination to create a component: Javascript, CSS, and HTML. If you had asked me 2 years ago what the biggest risk to this vision was, it was getting browsers to implement the Shadow DOM. To remind you, the Shadow DOM is a protective wall around your component. Components can have their own CSS associated with them, and the Shadow DOM protects those rules from outside styles seeping in and wrecking them. Likewise, your component’s internal DOM can’t be manipulated from the outside.

Unfortunately, browsers were slow to adopt this, and even worse, it was harder to polyfill. The Polymer project even invented the notion of a “Shady DOM”. Given this confusion, churn, and uncertainty, I never really adopted the Shadow DOM. In all honesty, I personally don’t really need it. I can see bigger applications and teams using it as a layer of protection against themselves, like how other languages might use private/protected/public variables in their classes as a way of allowing team members to use and call on only what’s been exposed.

But this is the web! When this layer of protection isn’t offered to us, we just use conventions instead. The biggest and easiest convention is to simply never tweak a component’s DOM from the outside. If you need to do something like this, you’re doing it wrong…just make a method as part of your component’s API to do what you need.
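To make that concrete, here’s a hedged sketch of the convention; the component, its markup, and the setLabel method are all made up for illustration:

class MyComponent extends HTMLElement {
    connectedCallback() {
        this.innerHTML = '<span class="label"></span>';
    }

    // outsiders call this instead of reaching in and poking at .label directly
    setLabel(text) {
        this.querySelector('.label').textContent = text;
    }
}
customElements.define('my-component', MyComponent);

// elsewhere in the application
document.querySelector('my-component').setLabel('hello');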

CSS is a bit trickier, but we’ve had the tools we’ve needed since the earliest days of CSS. Instead of relying on the Shadow DOM to keep outsiders from mucking with your component’s style, simply namespace every single CSS rule relating to your component with the component’s name, like so:

my-component .an-Inner-Class {
  background-color: black;
}

All that said, it appears there is a new version of the Shadow DOM shaping up. I haven’t followed the latest here at all, but I think I might wait until there’s a strong indication things will settle down before I bother with it.

Given that the Shadow DOM, for me, is so easy to ignore until I have more confidence, I’m not really bothered. What I AM bothered by is how “HTML Imports” have dropped from favor. To be fair, we’ve always been able to polyfill HTML Imports fairly easily. At the same time, though, when Webkit/Safari has no interest and Firefox has no plans to implement it, the whole notion seems dead in the water. I’ve seen some conversation that the web community doesn’t want to adopt HTML Imports in favor of the Javascript “import” mechanism, but I’m not aware that this works in a meaningful way yet for HTML, nor is “import” supported in any browser except the most recent versions of Chrome and Safari.

This leaves us with a bit of a challenge. I really don’t want to create my component’s DOM entirely with code – every single tag created with document.createElement('div'), then assigning classes and innerText, and then appending the child to a parent.

Fortunately, I’ve found that for me at least, inlining HTML into my Javascript is not as bad as I thought it might be. Components themselves should be fairly small – if you want major complexity, you may want to architect your big component into smaller ones that work together. Therefore, the HTML that you inline shouldn’t be that complicated either. By convention, I can also use the constructor for my component as a decent place to put my HTML, because there isn’t much else I need to add here.


    constructor() {
        super();
        this.template = '\
            <h4>Objects\
                <select class="fileselector">\
                    <option value="default">box primitive</option>\
                </select>\
            </h4>\
            <ul></ul>';
    }

    connectedCallback() { this.innerHTML = this.template; }

The above component represents a simple list (ul tag) which has a header above containing some text and a file selection menu. Honestly, the example I pulled isn’t the prettiest thing in the world right now, and once I flesh out this simple component, I expect to have double or triple the lines of HTML potentially. But, all the same, it’s pretty manageable to inline this. It also introduces a couple simple things in the way I format my HTML. I properly indent and new-line everything here just like you would see it in an HTML document. The mechanism to accomplish this readability is simply with a backslash after every continuing line.

I’ve also been exposed to the concept of backticks: `. Backticks are another way to wrap your strings in Javascript that allow you to inject variables. This is more commonly known as “template literals”. It’s not a new concept by any means. Though I haven’t really done anything with string templating in the past, I believe the practice is extremely common in React, Backbone, and Underscore. I haven’t favored using this for HTML because I like to keep my markup and JS separate, but I think I’m caving now to get a decent flow for components.

One problem with templated HTML in this case, though. It’s easy enough to inject a var like so:


   var inject = 'hi';
   var template = `<div>${inject}</div>`;

The problem is that in the simple example above, the “inject” variable is in the same scope as the template! Typically when I want to use this type of pattern, I prefer to store the template as a sort of variable I can access from elsewhere rather than having it inside my code logic when I’m constructing these elements.

Here’s a fake example to explain:


for (let c = 0; c < data.length; c++) {
   let myitem = document.createElement('li');
   myitem.innerHTML = `<div>${data[c]}</div>`;
   mylist.appendChild(myitem);
}

In this example, I’m appending new list items (li elements) to an unordered list (ul element). Right inside my loop here, I’m declaring what my template looks like. Personally, I think this is some bad code smell! Ideally, I want to break out any HTML I have into a separate variable so that if I AM going to inline my HTML (which I still think is kinda smelly on its own), I at least have it separated out so I can easily track it down and change it. Putting it inside my application logic, especially inside a loop like this, just feels terrible.

Unfortunately, it’s not possible to save a template literal like this in a variable ahead of time – the interpolation happens the moment it’s defined. Instead, we can create a method that handles both the templating and the creation of the element:


    itemTemplate(data) {
        var template = document.createElement('template');
        template.innerHTML = `<li class="mesh">${data}</li>`;
        return template.content.firstChild;
    }

I use the “template” tag here so I don’t have to decide upfront which type of tag to create, and my markup (including the outer tag) can live entirely in this template string. Otherwise, for the outer tag, I’d also need additional JS calls to set any attributes, classes, or IDs on it.
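With that helper in place, the loop from earlier tightens up to something like this (same data and mylist variables as before):

for (let c = 0; c < data.length; c++) {
    mylist.appendChild(this.itemTemplate(data[c]));
}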

Custom Events

Custom events haven’t changed, but there’s a little trick I like to use that’s worth mentioning. Here’s the creation and triggering of a custom event:


        let ce = new CustomEvent('onCustomThing', { detail: { data: data }});
        this.dispatchEvent(ce);

The above code is pretty simple, but there is one thing I don’t like about it, and that is the string ‘onCustomThing’. If you think about it, whoever consumes this event outside this class needs to spell ‘onCustomThing’ correctly AND use the correct capitalization. If we change this over the course of our project, we could break things and not know it.

That’s why I like to assign a sort of static constant to the web component class. In practice I haven’t been using any JS language features that dictate it is a static constant (though I probably could, copying how observedAttributes is declared). Here’s how I do it:


class MyComponent extends HTMLElement {
    ...
    disconnectedCallback() {}
    attributeChangedCallback(attributeName, oldValue, newValue, namespace) {}
    adoptedCallback(oldDocument, newDocument) {}
}
MyComponent.CUSTOM_THING = 'onCustomThing';
customElements.define('my-component', MyComponent);

So now, elsewhere, I can listen for the event like so:
mycomponent.addEventListener(MyComponent.CUSTOM_THING, e => this.onCustomThing(e));
Yesssssss, you could bungle the syntax here as well, making it just as bad as a string, but it’s easier for an IDE to refactor and predictively type as you code.
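And if I ever did want to lean on class syntax for it, a static getter would work the same way the observedAttributes getter does. A quick sketch – the somethingHappened method is hypothetical:

class MyComponent extends HTMLElement {
    // same idea as observedAttributes: the event name lives on the class itself
    static get CUSTOM_THING() { return 'onCustomThing'; }

    somethingHappened(data) {
        // dispatch using the same constant the listeners reference
        this.dispatchEvent(new CustomEvent(MyComponent.CUSTOM_THING, { detail: { data: data } }));
    }
}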

What’s missing

This last point about what’s missing is a minor one, and I think it’s slowly being corrected. Web Components aside, I’ve been developing most of my projects using Javascript modules by way of the “import” statement. Chrome’s latest version supports it, though I haven’t properly tried it out yet. Instead, I’ve been relying on the “browser-es-module-loader” polyfill. It works amazingly well, and I use it as a way to give my application a class-based “controller” that can import modules as it needs to.

So you can import a “main entry point” Javascript file as a controller, and anything downstream can also import modules. It’s pretty awesome, but any Web Components you use in your application are NOT downstream of this controller and as a result cannot use imports. I haven’t put in any serious brainpower to overcome this, but instead when I run into this issue, I take it as a hint that my component could be getting a bit too complex, and I work around it. Honestly, though, once this polyfill is not needed anymore, I’ll be happy!

Final Thoughts

As a whole, I’m still happy with writing web components after 2 years. I still have no desire to change. I think things are getting better and better, just a bit more slowly than I originally anticipated. I’m also a bit surprised at HTML Imports being on their last legs. As a workflow and architecture, I still think the approach holds up really well, even if we have to shuffle around some of the pieces that make it up.

Everybody is different, though, and there are many different tools for many different jobs. I still haven’t touched React or Angular 2-4 yet. I’m happy, but if you use those frameworks, you might be perfectly happy too! Consider this another tool to add to your belt (without all the bells and whistles of course).

A-Bad attitude about A-Frame (I was wrong)

I haven’t written many posts lately, especially tech related. The reason why is that I’ve been chasing the mixed reality train and learning. 3D development in general has had a number of false starts with me, and I never went in the deep end until now. This past year or so, I’ve been using both Unity and WebVR.

My failed 3D career

Prior to this, I went through a Shockwave 3D phase in the early 2000s. People forgot about that REALLLLLLL quick. Likewise, when I got all excited about Papervision, Flash’s CPU-based 3D engine (not made by Macromedia, but built for Flash), I remember learning it and then the hype died down to zero within a few months. And then of course, there was Adobe Flash’s Stage3D. But as you might recall, that was about the time Steve Jobs took the wind out of Flash’s sails and it was knocked down several pegs in the public eye.

Whatever your opinion on Director or Flash, it doesn’t matter. Approachable 3D development never felt like it had a fair shot (sorry, OpenGL/C++ just isn’t approachable to me or to lots of people). In my opinion, there were two prime reasons for this. The first is that GPUs are really only now standard on everyone’s machine. I remember having to worry about how spectacularly crappy things would look with CPU rendering as a fallback. Secondly, and this is huge: visual tooling.

Don’t believe me? I think the huge success of Unity proves my point. Yes, C# is approachable, but also, being able to place objects in a 3D viewport and wire up behaviors via a visual property panel is huge. Maybe seasoned developers will eventually lay off this approach, but it introduces a learning curve that isn’t a 50ft-high rock face in front of you.

Unity is fantastic, for sure, but after being a Flash developer in 2010 and watching my career seemingly crumble around me, I’m not going to be super quick to sign up for the same situation with a different company.

Hello, WebVR

So, enter WebVR. It’s Javascript and WebGL based, so I can continue being a web developer and using existing skills. It’s also bleeding edge, so it’s not like I have to worry about IE (hint: VR will never work on IE; in fact, Edge’s Mixed Reality support is currently the only publicly released implementation of WebVR, with Firefox and Chrome still keeping theirs in experimental builds). Point being, all those new ES6 features I dig, I can use them freely without worrying about polyfills (I do polyfill for import, though…but that’s another article for another time).

Those of us who were excited about WebVR early on, probably used Three.js with some extensions. As excitement picked up steam, folks started packaging up Three.js with polyfills to support everything from regular desktop mouse interaction, to Google Cardboard, to the Oculus Rift and Vive, all with the same experience with little effort from the developer.

I found that object oriented development with ES6 classes driving Three.js made a lot of sense. If you take a peek at any of the examples in Three.js, they are short, but the code is kind of an unorganized mess. This is certainly forgivable for small examples, but not for big efforts that I might want to try.

So, I was pretty happy here for a while. Having a nice workflow that you ironed out doesn’t necessarily make you the most receptive to even newer ways, especially those that are super early and rough around the edges.

Enter A-Frame

I believe it was early last year (2016), when I was attending some meetups and conference sessions for WebVR, that Mozilla made a splash with A-Frame. Showing such an early release of A-Frame was a double-edged sword. On the plus side, Mozilla was showing leadership in the WebVR space and getting web devs and designers interested in the promise of approachable, tag-based 3D and VR. The downside was that people like me who were ALREADY interested in WebVR and already had a decent approach for prototyping were shown an alpha-quality release with a barely functional inspection and visual editing tool that didn’t seem to offer anything better than the Three.js editor.

I wasn’t excited about it at all, and I reasoned that the whole premise of A-Frame was silly. Why would a sane person find value in HTML tags for 3D elements?

Months passed, and I was happy doing my own thing without A-Frame. I even made a little prototyping library based on existing WebVR polyfills with an ES6 class based approach for 3D object and lifecycle management. It was fairly lightweight, but it worked for a couple prototypes I was working on.

A-Frame Round 2

Right around when A-Frame released 0.4 or 0.5, the San Francisco HTML5 meetup group invited them back for another session at another WebVR event. The A-Frame community had grown. There were a crazy number of components that the community had built because…hey, A-Frame is extensible now (I didn’t know that!). The A-Frame visual inspector/editor is now really nice and accessible as a debug tool from any A-Frame scene as you develop it. Based on the community momentum alone, I knew I had to take a second look.

To overcome my bad A-Frame attitude, I carved out a weekend to accomplish two goals:

  • Work out an organized and scalable workflow that doesn’t look like something someone did in 2007 with jQuery
  • Have a workflow where tags are completely optional

I actually thought these might be unreasonable goals and that I was just going to end up proving failure.

A-Scene

As I mentioned briefly, I had my own library I was using for prototyping. Like I said, it was basically a package of some polyfills that had already been created for WebVR, with some nice ES6 class-based organization around it.

I knew that A-Frame was built much the same way – on top of Three.js with the same polyfills (though slightly modified). What I didn’t count on was that our approach to everything was so similar that it took me just a few hours to completely swap my entire foundational scene out for their <a-scene> tag, and…. it…worked.

This blew my mind, because I had my own 3D objects and groups created with Three.js and the only tag I put on my HTML page was that <a-scene> tag.

Actually, there were a few hiccups along the way, but given that I was shoving what I thought was a square peg into a round hole, two minor code changes are nothing.

My approach is like so:

Have a “BaseApplication” ES6 class. This class would be extended for your application. It used to be that I’d create the underlying Three.js scene here in the class, but with A-Frame, I simply pass the <a-scene> element to the constructor and go from there. One important application or 3D object lifecycle event is to get a render call every frame so you can take action and do animation, interaction, etc. Prior to A-Frame, I just picked this event up from Three.js.

Like I said, two hiccups: my application wasn’t rendering its children, and I didn’t know how to pick up the render event every frame. Both were easy to fix. First, pretend the class is an element by assigning an “el” property and setting it to playing:

this.el = { isPlaying: true };

Next, simply register this class with the A-Frame scene behaviors like this:

this._ascene.addBehavior(this);

Once this behavior is added, if your class has a “tick” method, it will be fired:

/**
* a-frame tick
* @param time
*/
tick(time) {
...
}

Likewise, for any objects you add to the scene that you want to have these tick methods, simply add them to the behavior system in the same way.

In the end, my hefty BaseApplication.js class that instantiated a 3D scene, plugins, and polyfills was chopped down to something 50 lines long (and I DO use block comments):

export default class BaseApplication {
    constructor(ascene, cfg) {
        if (!cfg) {
            cfg = {};
        }
        this._ascene = ascene;
        this._ascene.appConfig = cfg;
        this._ascene.addBehavior(this);
        this.el = { isPlaying: true };
        this.onCreate(ascene);
    }

    get config() {
        return this._ascene.appConfig;
    }

    /**
     * a-frame tick
     * @param time
     */
    tick(time) {
        this.onRender(time)
    }

    /**
     * add objects to scene
     * @param grouplist
     */
    add(grouplist) {
        if (grouplist.length === undefined) {
            grouplist = [grouplist];
        }
        for (var c in grouplist) {
            grouplist[c].addedToScene(this._ascene);

            if (grouplist[c].group) {
                this._ascene.appendChild(grouplist[c].group);
                this._ascene.addBehavior(grouplist[c]);
            } else {
                this._ascene.appendChild(grouplist[c]);
            }
        }
    }
    // meant to be overridden with your app
    onCreate(ascene) {}
    onRender(time) {}
}

As you might be able to tell, the only verbose part is the method to add children, where I determine what kind of children they are: A-Frame elements or my custom ES6 class-based object groups.
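For context, a hypothetical app built on top of this might look like the sketch below; the class name, the box, and the rotation logic are all made up for illustration:

import BaseApplication from './BaseApplication.js';

export default class DemoApp extends BaseApplication {
    onCreate(ascene) {
        // build a placeholder object and add it to the <a-scene>
        this.box = document.createElement('a-box');
        this.box.setAttribute('position', '0 1 -3');
        ascene.appendChild(this.box);
    }

    onRender(time) {
        // fired every frame via the a-frame behavior/tick system
        this.box.setAttribute('rotation', `0 ${time * 0.01} 0`);
    }
}

// elsewhere: new DemoApp(document.querySelector('a-scene'));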

How I learned to Love Markup

So, at this point I said to myself…”Great! I still really think markup is silly, but A-Frame has a team of people that will keep up with WebVR and will update the basics as browsers and the spec evolve, so I should just use their <a-scene> and ignore most everything else.”

Then, I hit ctrl-alt-i.

For those that don’t know, this loads the A-Frame visual inspector and editor. Though, of course, it won’t save your changes back into your code. Let me say first, the inspector got reallllllly nice and is indispensable for debugging your scene. The A-Frame team is forging ahead with more amazing features, like recording your interactions in VR so you can replay them at your desk and do development without constantly running around.

So, when I loaded that inspector for the first time, I was disappointed that I didn’t see any of my objects. I can’t fault A-Frame for this, I completely bypassed their tags.

That alone roped me in. We have this perfectly nice visual inspector, and I’m not going to deny myself the use of it just because I can’t be convinced to write some HTML.

Using Tags with Code

At this point, me and A-Frame are BFF’s. But I still want to avoid a 2008-era jQuery mess. Well, it turns out 3D object instantiation is about as easy in A-Frame as it is with code. It’s easier, actually, because tags are concise, whereas instantiating materials, primitives, textures, etc. can get pretty verbose.

My whole perspective has been flipped now.

  • Step 1: Create the element – either set innerHTML to whatever markup you need, or use createElement and set attributes individually
  • Step 2: appendChild it to the scene (or to another A-Frame element)

That’s it. I’m actually amazed how responsive the 3D scene is for appending and removing elements. There’s no “refresh” call, nothing. It just works.

I actually created a little utility method to parse JSON into an element that you could append to your scene:

sceneElement.appendChild(AFrameGroup.utils.createNode('a-entity', {
    'scale': '3 3 3',
    'obj_model': 'obj: ./assets/flag/model.obj; mtl: ./assets/flag/materials.mtl',
    'position': '0 -13 0'
}));

AFrameGroup.utils.createNode = function(tagname, attributes) {
    var el = document.createElement(tagname);
    for (var c in attributes) {
        var key = c.replace(/_/g, '-'); // underscores in the JSON keys become hyphens here (e.g. obj_model -> obj-model)
        el.setAttribute(key, attributes[c]);
    }
    return el;
};

Yeah, there’s some stuff I don’t like all that much, like if I want to change the position of an object, I have to go through element.setAttribute('position', '0 50 0'). Seems a bit verbose, but I’ll take it.

A-Happy Prototyper

Overall, the markup aspect, the early versions, and the lack of organization/cleanliness in the code examples made me sad. But examples are just examples. I can’t fault them for highlighting simple examples that don’t scale well as an application when they intend to showcase quick experiences. A-Frame wants to be approachable, and if I yammer on to people about my ES6 class-based approach with extendable BaseApplication and BaseGroup/Object classes, yes, I might appeal to some folks, but the real draw of A-Frame right now is for newcomers to fall in love with markup that easily gets them up and running and experiencing their own VR creations.

All that said, I did want to share my experience for the more seasoned web dev crowd because if you peel back the layers in A-Frame you’ll find an extremely well made library that proves to be very flexible for however you might choose to use it.

I’m not sure whether I want to link you folks to my library that’s in progress yet, but I’ll do it anyway. It’s majorly in flux, and I keep tearing out stuff as I see better ways to do things in A-Frame. But it’s helping me prototype and keep a clear distinction between helper code and my prototype logic, so maybe it’ll help you (just don’t rely on it!)

Back From VRLA

I believe it was during a session called “Shooting VR for Post” that I found myself identifying heavily with one of the panelists who said something to the effect of “Before VR, my work was a bit mundane. We’d take a look at a shot we needed to do in a meeting, and we wouldn’t even have to talk, we’d instantly know what our roles were and break to get down to work. With VR now, it’s not that easy, we need to knock our heads against the wall and really come up with ways to get the job done.”

As a web developer, I share this sentiment completely. The speaker expounded, giving an example like when Houdini comes out with a new node (I can only vaguely guess what this means), there’s a level of excitement, but it’s short lived. I feel similarly when a new Web API or Node.js based front-end workflow enhancement comes out, or a new framework is released. It changes our workflow in a nifty way, but it doesn’t necessarily change the work we create in a meaningful way.

It’s a big sentiment, and I feel it’s absolutely monumental that I happen to share this sentiment about the same new technology with a cinematographer…someone whom I might never even speak to in a professional capacity. I also seem to share this sentiment with sound engineers, game developers, VFX artists, hardware manufacturers, and more. I even had a fascinating conversation about depth being registered in your hypothalamus vs your visual cortex with a Game Developer/Designer/Cognitive Psychologist.

I’m silo-ing people a bit here because the more curious amongst us (including myself) have always enjoyed exploring the fringes of our craft. It’s not necessarily true that I wouldn’t talk to a cinematographer as a web developer, but it’s also not necessarily normal.

The point is that VR is bringing the best minds from all disciplines together and dissolving the fringes between these disciplines. Conferences like VRLA allow the stories of these boundaries breaking down to be told.


This is incredibly important, not only for getting acquainted with what skills are being injected into this new medium and why, but also because nobody knows the right way to do things. When there’s no right way to do things, there’s no book you can buy, nothing to Google, nothing we can do except hear about fleeting experiences from people that got their hands dirty. We need to hear about their pain and about their opinions formed from creating something new and unique. When we hear lots of such perspectives, we can assemble a big picture, which I’m sure will be shattered by the next VRLA. I’ll be waiting to learn about the hypothetical magician a panelist cited as a great collaborator for focusing attention in a 360-degree world.

Also interesting is the regionality of VR creators. I feel like I hear an entirely different story in San Francisco versus what I heard at VRLA. When I attend the (admittedly low number of, so far) meetups around the Bay Area, it’s mostly about hardware, platforms, new app ideas, prototypes, social experiences. In LA, I feel that it was overwhelmingly VFX, cinematography, sound design…a very heavy focus on well-produced content. I’m still uncertain about the regionality around game development, perhaps because it’s relatively regionless. Though, one memorable paraphrased line on that subject was “Game devs are now sitting in the same room as VFX artists and directors.”

Perhaps one of the more interesting things I picked up was the different stories from different creators on immersive video. Immersive or 360 video seems like a mainstay in VR. The cries of it not really being VR have been sufficiently drowned out with most, if not all, presenters acknowledging the sentiment but disagreeing with it. Andrew Schwarz of Radiant Images, for example, called immersive video the “killer app” of VR. I expected this sentiment, especially in a city with so much film talent.

Andrew Schwarz of Radiant Images showing the new completely modular camera mount (AXA 360 Camera System) for immersive media

What I did not expect was the nuance verging on disagreement from Dario Raciti of OMD Zero Code. His point of view seemed to be that the novelty of immersive video has waned. His interest lies in creating marketing campaigns that make brands like Nissan and Gatorade stand out from the rest. Answering my question of what kinds of projects he tries to sell to clients, he flat out says he tries to discourage pure 360 video. Instead, he prefers a more immersive experience mixed with 360 video.

An excellent example of this was his “Let Hawaii Happen” piece. The user begins on a parachute they can steer and navigate to various islands in Hawaii. Once they’ve landed, it switches to a non-interactive 360 video tour.

I think Dario’s take on advertising with VR is very much worth listening to. His team also created a car-shopping VR experience for Nissan in which the user is seated to get a feel for the interior of the car, much like what you would do car shopping in reality. Outside the windows, however, a much different scene plays out: the viewer is also part of a battle in the Star Wars universe.

That exemplifies Dario’s notion of mixing real-time 3D content with immersive video, but it also touches on his point about advertising in general. To liberally paraphrase, Dario feels you should never beat the user over the head with branding. No logos, no mentioning of the brand unless it’s subtle and integrated into the experience. The experience always comes first, and if it’s memorable, it will sell the brand.

To me, this speaks to the larger issue of taking concepts we already employ en masse in traditional media and shoe-horning them into VR. Advertisers, I know you’re already thinking of this. You want to cut to commercial, put your logo on the bottom third of the screen, and include voice overs about how your brand is the best. Dario is saying to create good marketing experiences, let the content flow freely and be subtle about your brand. Consumers will respond better. He even cited “Pearl,” an Oscar-nominated VR short, as an example of something that could be a commercial with extremely limited changes.

The notion of shoe-horning brings another memorable notion to mind. To date, I’ve been thinking about VR like the jump from desktop to mobile. But the better analogy from one panelist was that “VR is like the jump from print to digital.” While stubbornness to hold on to the old ways can be detrimental, years of experience coupled with open-mindedness can be a huge asset.

In the Cinematographers’ panel, it was mentioned that old 3D tricks, because of limited processing power, are now coming back into fashion. The reason being that game engines like Unreal are coming into favor for doing real-time previews of scenes. Even traditional film equipment is being recreated in VR to help production. To hear a cinematographer talk about replicating a camera crane in VR and then shrinking it down, scaling it up, putting it on a mountain-top…. all within a day’s shoot was incredibly interesting.

Shooting VR for Post Panel

The panelists and presenters at VRLA shared so much of their recent, and super fascinating, experiences based on their experimentation. This was a bit unfortunate, because I found myself glued to the presentation rooms and out of the expo floor. I saved my 2-hour lap through the expo hall until the very end. As expected, the lines for the more interesting experiences were either too long or closed. I can’t fault VRLA or their exhibitors for this; it seems a standard downside of VR conferences. I would wager that the most popular experience was the Augmented Reality (Hololens) Easter Egg hunt. As I didn’t experience it, I’ll just leave you with a photo because it looks awesome.

Microsoft Hololens Augmented Reality Easter Egg Hunt

Of course, like Microsoft, a bunch of big vendors were there: Facebook, HTC, Intel. Although I don’t own a Vive, their talk of the multi-platform subscription service and their wireless headset was exciting. So was hearing how dedicated Intel, HTC, and HP are to VR developers. Yes, Facebook and MS are dedicated to Mixed Reality as well, but for me, that message was well received a while ago, so it’s awesome to see the pile on.

Being that there were around 170 exhibitors at VRLA, there were tons of smaller ones showing games, hardware, new experiences, and new creative tools. One notable company, Mindshow (http://mindshow.com), offers creative tools for recording animated characters with your body and voice in real time. Watching from the expo floor, I was a bit disappointed as it felt too scripted. However, a booth attendant assured me it was that way for the 10-minute quick demo for conference-goers. It makes sense that you’d probably not want to start users with a blank slate if you only have a short window to impress them. So, if Mindshow is what I think it is, I can imagine having so much fun myself, and I can see many people creating awesome animated content extremely easily…but I’ve been known to overhype things in my own head.

Though it was my first time, VRLA has been going on for 3 years now and they’ve grown exponentially. The conference-going experience was not as seamless as others I’ve been to. The Friday keynote was delayed by at least 30 minutes because the speaker had no slide notes, which set off a cascade of presentation time pushbacks. There were constant audio issues, and the light field talk I was really looking forward to was cancelled with no explanation. This is all forgivable and probably par for the course given how many people from different disciplines are coming in and bringing their passions and experiences. There’s an amazing energy in VR. Organizations and conferences like VRLA focus it. It might not be laserlike as VR grows exponentially, but with a medium so young and with so many stories still to be told from creators about their experimentation, everything is appreciated.

360 Video: An Afternoon Within

With the Oculus store continuing to get some interesting titles while I wait with bated breath for my very own Oculus Touch controllers to be released, it’s easy to forget about 360 Video. Some, like Vive engineer Alan Yates, say 360 Video is not VR at all.


The problem Mr. Yates seems to have is that 360 Video drops some key VR immersion factors, like being able to move your body (not just your head) and being able to interact with the scene and have the scene interact with you.

In addition to lacking some of the things that make VR so immersive when you think about content like games, it can also box content creators into doing things that are less than ideal in VR. A big one: an experience can induce a little motion sickness by moving the user against their will. Another is dropping a user into a space and leaving them disoriented and confused while they figure out where they are. 360 Video continues to suffer from these as traditional video creators choose to pan their cameras, or don’t pay enough attention to their viewer when cutting between scenes.

All that said, whether it is or isn’t VR, it’s certainly an emerging art form. It’s enough like traditional video to be approachable for longtime video creators, but also deceptive in that these same creators need to question fundamental techniques or risk creating bad or even vomit-inducing experiences.

It had been a while since I last spent a couple hours enjoying 360 Video, so yesterday afternoon I decided to go back in. Things have improved, and there are some fascinating shorts being produced as well as not so fascinating shorts with gimmicks that may or may not work. Let’s talk about them.

Mr. Robot – Written and Directed by Sam Esmail

Runtime: 13:03

Filesize: 586MB

Kicking things off with Mr. Robot is a bit unfair. Not only do I love the TV show, but of the shorts I’ve seen so far I like this one the best. It uses a number of gimmicks that don’t feel gimmicky, and breaks some other rules in a way that feels OK.

Also interesting is that many of the best videos I’ve seen and want to talk about are directed or co-directed by Chris Milk, who runs a VR storytelling company called Within (the videos are posted on the Within Oculus channel). Despite Mr. Milk making some compelling shorts, Mr. Robot shines brighter than any of them for me, AND it’s directed by original Mr. Robot creator Sam Esmail.

BTW…only read the rest of this one if you can handle Season 1 spoilers.

Elliot, the main character, turns out to have a major split personality disorder. Throughout season 1, Elliot routinely expresses his thoughts to you, the viewer. It would be a bit gimmicky to break the fourth wall like this all the time, except when you consider the possibility that the viewer is another personality or just a remnant of Elliot’s troubled mind.

Elliot acknowledging the user

The 360 short uses this setup to its advantage, and as you might expect, it just so happens to be perfectly suited for VR. You enter the experience next to Elliot on his couch while listening to his internal thoughts (expressed as a voice-over) lamenting some past memories. In true Mr. Robot fashion, Elliot occasionally looks at you to acknowledge your presence. It turns out this works great for VR, too. The user having a presence (even a passive one) does wonders for immersion.

An interesting thing that happens early on is that the director wants to focus on the laptop in the kitchen. It’s a bit weird, in that it feels like a throwaway story point that never really matters. That said, with traditional video, a director might edit the shot and cut over to the laptop. However, with 360 video we can’t have hard edits like this that disorient the viewer, so instead the lights flicker in the kitchen around the laptop and the user’s attention is drawn there.

Elliot also happens to be smoking a joint, which presents an interesting opportunity for a gimmick. Elliot takes a big puff and exhales at the camera, which offers an opportunity to transition to a past memory. While this isn’t necessarily a 360-specific gimmick, what follows is him sitting in the exact same spot in his past memory. In fact, the scene looks no different, which is important so as not to disorient the viewer. Whiting out the scene with smoke serves to transition the story but not necessarily the set.

The marijuana use also provides a convenient way for the camera/viewer to get “high”. As the marijuana starts taking effect, the camera starts floating to the ceiling, offering a wider view of the shot and allowing Elliot’s past love interest to enter. He even calls out “Oh wow, I got really high…look at us up here”. It’s very important to reiterate here that Esmail coupled the camera transition with talking to the user about it while SIMULTANEOUSLY pretending it’s part of the story.

Camera floats up as Elliot gets high

To further layer on this storytelling feat, the camera view slightly shifts the user to look at the door in preparation for his former love interest Shayla to walk in.

As Shayla knocks on the door, something a bit awkward happens which is that with each knock, the camera cuts slightly in every direction a few times. It feels like a mistake, but perhaps it was an opportunity to cover up a production mistake where the shots weren’t aligned.

Shayla enters and wanders a bit around the room talking to Elliot. As she enters the bedroom, a light turns on and illuminates the room for a moment. To me it was a bit awkward and I couldn’t find any purpose to it, but it’s over quickly.

As she wanders around, the camera pans a bit, which breaks a bit of a VR “rule” since you have no control over it – but it’s done gently, and coming after the initial camera float and other movements, it doesn’t feel wrong in the least. Here the full 360 comes into effect as Elliot stays on one side of you and Shayla walks behind you. You, the viewer, are in the middle of this conversation, and it feels like being in the middle of a group, turning your head each way to see each person speak.

Shayla and Elliot talking

After the two leave, there are some establishing shots of the beach and the shut-down amusement park that would later be used as a base of operations. In these shots there is some gentle movement. Again, it seems to break a bit of the VR rule of not forcing the user to move in a way they aren’t actually moving their body – but here it feels right, and I think making it a short establishing shot that’s not integral to on-screen dialog is the right way to go.

More of the story happens when the video cuts to the inside of a ferris wheel car. As the video goes on, Esmail seems to limit the storytelling to slow-paced, enclosed areas, with the dialog being a bit slow paced as well – more like what you’d find in real life, not fast-moving heated dialog with fast cuts. Again, in the ferris wheel scene, you must turn to each character as they talk, much like you would as the third and silent wheel in a real conversation, sitting behind two people.

Shayla and Elliot in the Ferris Wheel

What’s interesting here is that I had previously watched this 360 video on a monitor, using my mouse to pan around. I thought it was a bit boring, and I didn’t make it past the opening title, judging it as another property jumping on the 360 bandwagon. But here in VR, what didn’t work on a screen is a great storytelling experience.

In the next scene, Elliot and Shayla are strolling along the boardwalk. It’s important to note that the camera is following them again, moving the user. Esmail didn’t put any important dialog in this scene, only using the tone and mood to convey a story point (that Elliot and Shayla have made up after some cold-shouldering and are having a happy, memorable time). I find this interesting to contrast with the slow pacing and slow conversations that are placed in static scenes. To get in Esmail’s head a bit, I’m inclined to think he believes the camera shouldn’t be moving at all when you need the viewer to focus on an important bit of story. This scene itself transitions to an interesting, colorful montage.

Happy montage

For sure, Esmail did lots of interesting things here. I’m sure I could rewatch it again and again and find more, but I do want to move on to other videos, as interesting as Mr. Robot is. That said, I DO want to end Mr. Robot with one scene that really stood out for me, and that’s when Shayla and Elliot are relaxing in bed. I’m interested in 360 Video offering perspectives that aren’t seen in real life, and this bed scene touches on that desire without going overboard. Check out the following shot, with the camera positioned above the bed making them look upright. This is often used in TV/film and looks completely normal. In 360, however, it takes some getting used to. There’s some dissonance while you figure out your orientation – but once you do, there’s a bit of an aha moment that really adds to the experience. Other than the camera orientation, this scene is more of the slow, enclosed, conversational scenes that make the rest of the piece work well.

Shayla and Elliot in bed

Saturday Night Live: Jerry Seinfeld hosts a Q&A with Surprise Guests – Directed by Chris Milk

Runtime: 8:00

Filesize: 787MB

To be clear, I really did like this video, which is surprising because I’m one of those people that think SNL has dropped way off in quality since <insert when viewer first watched it>. For me, those were the days of Phil Hartman, Chris Farley, Mike Myers, etc. Whether I’m right or wrong doesn’t matter. My point is that I thought this video was well done, but after the endless commentary on Mr. Robot, I don’t have too much to say about this one. That’s only because there’s not much going on from a storytelling standpoint here.

Since there’s not much to say, it’s the perfect opportunity to note the opening credits of most 360 Video I’ve seen. It’s really no more than noteworthy, but opening and closing credits all seem to have standardized on displaying what they need to display in a narrow field of view, much like what you’d see on a screen/TV, and then repeating it 3 or 4 times around your head so you can see the text and intro graphics regardless of where you look.

SNLtitle
Opening credits and corner of repeated title screen
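Purely as a sketch of how that replication could be wired up (this is my own guess at an implementation, not how any of these pieces were actually built), here's what placing the same title card at a few yaw angles around the viewer might look like in three.js. The texture URL, card size, and copy count are placeholders:

```typescript
import * as THREE from 'three';

// Place the same title card at several yaw angles around the viewer
// so the text is readable no matter where they happen to be looking.
function addRepeatedTitle(scene: THREE.Scene, textureUrl: string, copies = 4): void {
  const texture = new THREE.TextureLoader().load(textureUrl); // hypothetical title image
  const geometry = new THREE.PlaneGeometry(2, 1);
  const material = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
  const radius = 4; // meters from the viewer sitting at the origin

  for (let i = 0; i < copies; i++) {
    const yaw = (i / copies) * Math.PI * 2;              // evenly spaced around 360 degrees
    const card = new THREE.Mesh(geometry, material);
    card.position.set(Math.sin(yaw) * radius, 0, -Math.cos(yaw) * radius);
    card.lookAt(0, 0, 0);                                // face the viewer at the origin
    scene.add(card);
  }
}
```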

With that out of the way, we can talk about the actual video. In this experience, the 360 camera looks like it's positioned just above the traditional TV camera. You are facing the host, Jerry Seinfeld, and can turn to look at the crew and the audience…the entire room.

Seinfeld Hosting

If you've never been an audience member for SNL before (I haven't), it's interesting to experience what it's like behind the camera. You can see camera operators, a photographer, the boom mic crane, what the set looks like, etc. It's a fairly fascinating vantage point.

Unfortunately, the action starts almost immediately and you have to start paying attention. Contrast this with other VR stories and 360 video, where you'd typically want to give the user some time to acclimate to the scene before starting the action. Here, being in the SNL audience is interesting, but Jerry Seinfeld is on stage in just 15 seconds and starts delivering his monologue, so I was a bit torn about what I wanted to pay attention to.

If it were JUST watching a monologue intended for TV, this would be a disappointing experience. However, it turns into a sketch: Jerry launches into a Q&A with the audience, who all just happen to be celebrities and/or past hosts.

Yes, it's funny. And YES, it makes use of the 360 experience. The viewer is immersed in the audience here, because you watch Seinfeld call on somebody in the audience and then turn to see if you can find them. In this way, the video sort of acknowledges that you're there by making you do what you would have to do in real life as an audience member.

Here's where things break down, though, and it's purely technical. Check out this shot of James Franco asking his question:

James Franco asks a question

Can you even tell that's James Franco? More importantly, do you think it would be easy to identify his facial expressions and body language? Recognition is so important to a bit built around celebrity, and facial expressions and body language are key to comedy and acting. You might think this is an anomaly because he's a bit far away. After all, John Goodman is also featured, and he's fairly recognizable but also fairly close (hint: he's just past the lady at the bottom center). It's a fair point if you're just looking at this image, but in the experience Franco feels fairly close…just blurry and not crisp enough in the encoding. As a viewer you feel like you SHOULD be able to see him better, and it's imperative to the experience, but the detail of the capture and/or the encoding prevents this.

Oddly enough, Mr. Robot didn't suffer from this despite being longer and having a smaller overall file size. This point is exactly why I'm prefacing each writeup with the duration and file size. This SNL video is closer to what you might expect from live 360 shooting, without the benefit of planning and a script to overcome these types of issues.
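For the curious, a back-of-the-envelope bitrate from the runtime and file size is a decent proxy for how much detail the encoding can hold, though it ignores audio tracks, container overhead, and codec differences:

```typescript
// Rough effective bitrate from runtime and file size (ignores audio/container overhead).
function approxMbps(fileSizeMB: number, runtimeSeconds: number): number {
  return (fileSizeMB * 8) / runtimeSeconds; // megabits per second
}

// SNL Q&A: 787MB over 8:00 works out to roughly 13 Mbps spread across the whole
// sphere, so any one face in the frame only gets a small slice of that detail budget.
console.log(approxMbps(787, 8 * 60).toFixed(1)); // "13.1"
```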

The last disappointing bit is that while it's interesting to see camera operators, the boom mic, etc., you can ALSO see the folks holding the cue cards. It really detracts from the spontaneous and hilarious premise of an off-the-cuff Q&A to have cue cards right in front of you.

cuecards
Center: Assistant holding a cue card for Larry David’s question

All in all, this is a fairly successful 360 video. I quite enjoyed it, but where it falls down, it suffers because 360 isn’t really the medium they intended to target with this bit.

 

Vice News VR: Millions March in NYC – Directed by Chris Milk & Spike Jonze

Runtime: 8:08

Filesize: 1.07GB

Vice News, from my understanding, is a bit more gritty and risky/risqué than traditional news media. When I come across Vice stories and videos, I seem to recall reporters inserting themselves into dangerous scenes and doing raw reporting. Even if I've put Vice in the wrong nutshell, that seems to be exactly what they're doing in this 360 video (though to be fair, a protest, especially in NYC, is probably not going to be dangerous at all). This protest in particular is to highlight the unfair treatment of black men and women at the hands of the US justice system and police.

One interesting thing done right off the bat is the opening title. I criticized SNL's 360 video for not giving the viewer enough time to get acclimated to the scene. Here it's just right, and they've incorporated the title of the piece in an interesting way. It's wrapped around your head, with most of the title outside your field of view at all times. So, to read it, you must look all the way to the left and pan your head from left to right. Meanwhile, a crowd of protesters appears below you.

Vice News Title Screen

Initially I panned this as a bad choice. But after thinking about it, having to work across 270 degrees to read the text doubles as a mechanism to take in the scene around you. Given that an Oculus user already had to click on the video's thumbnail and download it, having the title be legible again in the video is not as necessary as one might think. So even if the user struggles to take in all the text in one go, it's still OK.

After the opening scene, we cut to being eye level right next to a demonstrator. This group of men chants “I can’t breathe with y’all on my neck”, of course a reference to the killing of Eric Garner.

Demonstrators chanting “Can’t breathe”

What was interesting for me is my reaction to being right up next to the demonstrator. In real life, I tend to stay far away from demonstrations like this, whether it be Black Lives Matter, a subway musician, or someone in a city park on a microphone calling for sinners to repent. Reflecting, I think it comes down to one thing for me: how to react and what kind of body language I'm using in response. For example, someone standing on a soapbox saying the world will end tomorrow is super interesting. I don't take them seriously, of course, but I would love to hear their whole speech for the entertainment value. On the opposite end of the spectrum – a demonstrator like this, representing a real cause – I might like to hear what they are saying, but especially with myself being white, and someone who historically COULD look down on them, I may be a bit self-conscious of what type of message I'm sending to them with my reactions (or lack thereof) as I stand there and observe them.

I talked before about how 360 videos do well to acknowledge the viewer as part of the scene. Mr. Robot does this exceptionally well, and SNL with Seinfeld did this to a small extent. In a scene like this, NOT acknowledging the viewer seems to work exceptionally well. I can observe and actually listen to the message without worrying about my reactions or being self conscious.

In addition to watching demonstrators, I've never been part of a march, so it was interesting to be swept down the street past the chanting, banners, stalled traffic, etc. A camera, 360 or not, needs an operator if it's being carried somewhere. While the cameraman stays almost perfectly behind the viewer through most of the video, he's a bit close for comfort in this scene:

cameraman
Cameraman, a bit too close for comfort

 

Like I said, it's a bit disconcerting, but it's 360 footage being captured in an uncontrolled environment. He can hardly be blamed!

In the next scene, we follow a reporter down the street to a “Die In”, where a few dozen people are lying on the ground to represent dying in the streets. Unfortunately, the technology – more specifically the capture resolution/encoding – failed here much like it did in the Saturday Night Live video. For starters it was nighttime, so visibility wasn't great, and well…can you tell what is actually happening in this scene?

diein
A “Die In” demonstration

This image, as in VR, is extremely hard to comprehend. It's actually worse in VR, because you feel like you're there, and because of that you think you SHOULD be able to pick out the shapes on the ground as people and bodies. I was actually a little surprised when they started getting up, because some of those people were white. I had convinced myself that part of the reason I couldn't make heads or tails of the scene was that a black man or woman's face, lying on the ground at night, would surely be hard to see. But no, in fact there were many white men and women as well.

I'll end this video's analysis with an interesting reaction I had to one of the protestors. In the same “Die In”, as people were starting to get up, one lady got up halfway, raising her hands, and ended up on her knees. The reporter we were following crouched down next to her to interview her and get her thoughts.

crouched
Reporter crouching to interview a demonstrator

What was interesting was my posture as this happened. Previously, I was sitting upright in my office chair as I watched this video. However, when the reporter crouched down and the camera followed, my entire posture in my chair went lower, into a seated squat. I took note that with enough direction from on-screen cues, my body would follow!

 

Catatonic – Directed by Guy Shelmerdine

Runtime: 8:09

Filesize: 481MB

Catatonic was a fun little horror experience, akin to what you'd get when you and your friends drive a couple hours on Halloween to a small rural town where someone has turned their barn or farm into a spooky experience, complete with actors jumping out to scare you.

This 360 video takes place in a run-down insane asylum. Despite the fact that I thought it worked pretty well, it did what contemporary VR creators dictate you should not do: put the camera on wheels and roll around. I alluded to this before when some of the videos above did this to a lesser extent, and it harkens back to early VR tests when lots of people experimented with putting you on some kind of track, like a rollercoaster. The movement depicted in your vision, contrasted with the lack of movement felt by your body, was named as the prime cause of motion sickness. So, of course, content creators name this as a prime thing not to do.

All that said, I personally felt fine despite being on a moving track the entire time. In the story, you are a patient being wheeled through the asylum in a wheelchair. In addition to the movement being slow, you can also look down and see your body and the chair. Perhaps this addition of a “reference object” – something that persists in the same place in your field of view – cancels out or minimizes the motion sickness.

In a wheelchair (reference object to reduce motion sickness?)
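If I were recreating that effect myself, the simplest trick I know of is to parent a mesh to the camera so it stays put in your field of view. Here's a minimal three.js sketch, with a made-up box standing in for the wheelchair:

```typescript
import * as THREE from 'three';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, 16 / 9, 0.1, 100);
scene.add(camera); // children of the camera only render if the camera itself is in the scene

// A crude stand-in for the wheelchair: parented to the camera, it stays in the
// same spot in your view even as the rig is wheeled through the asylum.
const chair = new THREE.Mesh(
  new THREE.BoxGeometry(0.6, 0.1, 0.8),
  new THREE.MeshBasicMaterial({ color: 0x333333 })
);
chair.position.set(0, -0.6, -0.5); // just below and in front of the viewer's eyes
camera.add(chair);
```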

Remember I talked about those spooky barns? Well, some people get scared by things jumping out at them. Not me (or my wife, for that matter); we see things coming, maybe get a little surprised, but really just end up giggling at the attempt. Same here. The first thing you encounter is a zombie-looking girl who kinda snarls and lunges at you as you roll past. I had the same reaction. Ironically, I was much more concerned as I was wheeled into the room that my knees would smash into the doorway (no, seriously, it made me a bit uncomfortable).

Scary girl

All in all, it was more of the same. Interesting and fun, no doubt, but not TOO much more that's noteworthy. I was wheeled past disturbing patients, and tricks of light and time dilation made things creepier as well.

One thing I really took notice of, after I experienced it, was the use of time to make me impatient enough to look around. There is a quite normal-looking man who wheels you around for the first half of the experience. He even tries to soothe you in the beginning. But he's behind you, and it's an effort to look backward and up to take note that there's someone there. I think I only did it once, out of curiosity.

However, an interesting thing happened. After a particularly fast-paced stretch where the lighting suddenly changed, time sped up for a second, and things got creepy, there were a few seconds of complete inaction. I was left sitting in the stationary chair with nothing happening. The scene made me impatient enough to look behind me to figure out why I wasn't moving. It turned out the nice man was gone, and a scary hooded figure lurched out of a door and took over. If I hadn't been given time to get impatient (possibly only effective after such an action-packed few seconds), I would not have looked backwards (again, it's awkward to do so) to see what I was supposed to see.

From there, the cheesy horror and effects picked up!

In real life I’ve been getting bloodwork lately, and I JUST CAN’T look at the needle going into my arm. It’s too much…freaks me out. However, when presented with the following:

needles

…I can't look away! I know it's not real, so I guess I feel empowered to watch it and laugh off my completely irrational freak-out over needles.

And from then on it's more good, cheesy horror, with some personal-bubble invasion thrown in to try to make you uncomfortable.

Invading personal space

So that’s Catatonic! I figure if those cheap horror movies and Halloween barns scare you, this might as well. For me, it was good B-Movie fun.

 

New Wave – Directed by Samir Mallal & Aron Hjartarson

Runtime: 2:17

Filesize: 159MB

This is a quick one! It has a bit of a novel approach, though I'm not sure how well it works, to be honest with you. The video opens on a beach. In front of you is a washed-up boat – really just a nice, relaxing scene – and it holds there for around 40 seconds for the viewer to get acclimated. Again, this seems to be fairly standard practice for VR storytelling.

Prior to the narrative starting, a lady walks her dog directly in front of you. My first couple of times through, the dog walking seemed a bit meaningless and odd; I ignored it, waiting for something ELSE to start up. It turns out, though – I noticed on my 3rd viewing – that it's a guiding action, a bit of movement meant to make you turn your head around to where the actual narrative starts with the two main characters.

Walking the dog. A guiding action to the main story

So obviously this bit of direction missed the mark for me. Luckily, I heard a voice-over narrative and knew to look around for what was going on.

The interesting bit about this experience is the spatial audio. The setup is that this couple is fighting, and they go off to different areas of the beach. You can see each of them by turning your head, but when you turn your head you can also hear each of their thoughts…a narration of their anger toward the other, from their own perspective.

Split View

Unfortunately, I didn't think this worked so well, because it took a long time in the short span of this video to figure out that the audio was different depending on where I looked. When I did figure it out, I got a bit frustrated because I couldn't listen to both monologues at once and felt like I was missing things.

All that said, it was an interesting device to tell the story!
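For anyone wondering how gaze-dependent audio like this might be wired up, here's a minimal Web Audio sketch with one HRTF-panned source per character. The element IDs and positions are invented placeholders, I'm assuming a modern browser that exposes the positionX/forwardX AudioParams (older ones used setPosition/setOrientation), and a real 360 player would normally handle all of this for you:

```typescript
// Two spatialized voice-over tracks, one per character, panned with HRTF so the
// one you're facing dominates as you turn your head.
const ctx = new AudioContext();

function addVoice(elementId: string, x: number, z: number): PannerNode {
  const el = document.getElementById(elementId) as HTMLAudioElement; // hypothetical <audio> tags
  const source = ctx.createMediaElementSource(el);
  const panner = ctx.createPanner();
  panner.panningModel = 'HRTF';
  panner.positionX.value = x;
  panner.positionZ.value = z;
  source.connect(panner).connect(ctx.destination);
  return panner;
}

addVoice('her-voice', -3, -1); // she's off to the left on the beach
addVoice('his-voice', 3, -1);  // he's off to the right

// On each head-pose update, point the listener where the viewer is looking so
// each monologue gets louder and clearer as you face that character.
function onHeadTurn(forwardX: number, forwardZ: number): void {
  ctx.listener.forwardX.value = forwardX;
  ctx.listener.forwardZ.value = forwardZ;
}
```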

 

LoVR – Created by Aaron Bradbury

Runtime: 5:20

Filesize: 273MB

LoVR is an interesting concept. It's all computer-generated data visualization that you fly through, and it's about love. Aaron's description, verbatim, is this:

A story of love, told through neural activity. Chemicals are released, neurons are activated and a form of poetry is revealed within the data.

You know, to be perfectly honest, I’m not quite sure this needs to be in VR. I dig the concept of measuring the brain’s neural activity and pinpointing the moment that falling in love happens. At that moment the music picks up and the dataviz starts getting extreme.

I'd guess that this experience was done with VR in mind, but that the creator wanted to expand its reach to flat screens as well, so he made an experience that could encompass both. Flying through the visuals is a nifty experience, but at the same time, not much of your periphery – or anything behind you – matters.

All that said, it's a nifty concept and video!

Baseline reading, not in love yet
Flying through, noticing beauty and starting to sweat a bit – music is picking up
Looking back on the chaos at the moment of love

 

Lowes Home Improvement

Runtime: 4:56

Filesize: ???

I'll end this post with a weird one. Despite my negative comments on various aspects of all the 360 videos I talked about, the criticism is just meant to point out interesting decisions. Overall, the videos I watched were pretty damn great, and 2016 is just the tip of the iceberg. 360 video will continue to evolve as an art form, and I think we're mostly in the experimental stage right now. All of the above videos were from Within, and it's certainly no accident that a company founded on great VR storytelling would produce and highlight great 360 video.

What I’m about to mention next isn’t anything like that, but it has a unique take on instructional videos!

I've been to Lowes Home Improvement stores before for various projects, and they really do try to help guide you through them. Their website is no different. Having owned a home, I've gone through some of their instructional videos and tutorials a few times to make or fix something. It does help, for sure.

However, when your hands are full and you're trying to fix something while also trying to scrub a video back or page back through the instructions because you missed something…well, it's a pain!

This video attempts to address that problem. I question the effectiveness, since I wonder how unwieldy wearing a headset (even a wireless one like the GearVR) would be while trying to do home repair. All the same, it's a bright idea!

This instructional video shows how to make a quick DIY cement table with wooden legs. Instead of introducing the steps over time, the steps are laid out in space. Step #1 is right in front of you, and as you turn your head through 360 degrees you can see the rest of the steps. This makes it easy to go back and forth between steps you might not understand…just look the other way! Each step's video is on a continuous loop, so the action is incredibly easy to revisit.

Making a table
Finished!
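The spatial layout itself is simple enough to sketch: park each looping step video at its own yaw angle around the viewer. Here's a rough three.js version with made-up file names, not anything from the actual Lowes piece:

```typescript
import * as THREE from 'three';

// Park each looping step video at its own yaw angle so "going back a step"
// is just a matter of turning your head, no scrubbing required.
function layoutSteps(scene: THREE.Scene, stepVideoUrls: string[]): void {
  const radius = 3; // meters from the viewer at the origin
  stepVideoUrls.forEach((url, i) => {
    const video = document.createElement('video');
    video.src = url;        // e.g. 'step1.mp4' -- placeholder file names
    video.loop = true;      // each step loops continuously, as in the Lowes piece
    video.muted = true;     // required for autoplay in most browsers
    video.play();

    const yaw = (i / stepVideoUrls.length) * Math.PI * 2;
    const panel = new THREE.Mesh(
      new THREE.PlaneGeometry(1.6, 0.9),
      new THREE.MeshBasicMaterial({ map: new THREE.VideoTexture(video) })
    );
    panel.position.set(Math.sin(yaw) * radius, 0, -Math.cos(yaw) * radius);
    panel.lookAt(0, 0, 0); // face the viewer
    scene.add(panel);
  });
}
```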

 

And that’s just a few….

This was a bigger blog post than usual, but it forced me to think through some very interesting pieces and evaluate what's good, what's bad, and just where we are, technically and experience-wise, in 2016 for 360 video. I picked the few that I thought were most interesting, so everything here I enjoyed, and I send my kudos to the creators. There are, of course, even more that are just as interesting – but lots that fall flat as well. The most important thing to note is that everyone is experimenting with what works, and we're at the beginning of a new way of thinking about video. It will be quite interesting to see the caliber of stuff that 2017 brings!