360 Video: An Afternoon Within

With the Oculus store continuing to get some interesting titles while I wait with bated breath for my very own Oculus Touch controllers to be released, it's easy to forget about 360 Video. Some, like Vive engineer Alan Yates, say 360 Video is not VR at all:


The problem Mr. Yates seems to have is that 360 Video chops off some key VR immersion factors like being able to move your body (not just your head) as well as being able to interact with the scene and have the scene interact with you.

In addition to lacking some key things that make VR the immersive medium it is when you think about content like games, it can also box content creators into doing things that are less than ideal in VR. A big one is that an experience can induce a little motion sickness by moving the user against their will. Another is dropping a user into a space, leaving them disoriented and confused while they figure out where they are. 360 Video continues to suffer from this as traditional video creators choose to pan their cameras, or don't pay enough attention to their viewer when cutting between scenes.

All that said, whether it is or isn’t VR, it’s certainly an emerging art form. It’s enough like traditional video to be approachable for longtime video creators, but also deceptive in that these same creators need to question fundamental techniques or risk creating bad or even vomit-inducing experiences.

It had been a while since I last spent a couple of hours enjoying 360 Video, so yesterday afternoon I decided to go back in. Things have improved, and there are some fascinating shorts being produced, as well as not-so-fascinating shorts with gimmicks that may or may not work. Let's talk about them.

Mr. Robot – Written and Directed by Sam Esmail

Runtime: 13:03

Filesize: 586MB

Kicking things off with Mr. Robot is a bit unfair. Not only do I love the TV show, but of the shorts I’ve seen so far I like this one the best. It uses a number of gimmicks that don’t feel gimmicky, and breaks some other rules in a way that feels OK.

Also interesting is that many of the best videos I've seen and want to talk about are directed or co-directed by Chris Milk, who runs a VR storytelling company called Within (and then posted on the Within Oculus channel). Despite Mr. Milk making some compelling shorts, Mr. Robot shines brighter than any of them for me, AND it is directed by original Mr. Robot creator Sam Esmail.

BTW…only read the rest of this one if you can handle Season 1 spoilers.

Elliot, the main character, turns out to have a major split personality disorder. Throughout season 1, Elliot routinely expresses his thoughts to you, the viewer. It would be a bit gimmicky to break the fourth wall like this all the time, except when you consider the possibility that the viewer is another personality or just a remnant of Elliot's troubled mind.

Elliot acknowledging the user

The 360 short uses this setup to its advantage, and as you might expect, it just so happens to be perfectly suited for VR. You enter the experience next to Elliot on his couch while listening to his internal thoughts (expressed as a voice-over) lamenting some past memories. In true Mr. Robot fashion, Elliot occasionally looks at you to acknowledge your presence. It turns out this works great for VR, too. The user having a presence (even if just an inactionable one) does wonders for immersion.

An interesting thing that happens early on is that the director wants to focus on the laptop in the kitchen. It’s a bit weird, in that it feels like a throwaway story point that never really matters. That said, with traditional video, a director might edit the shot and cut over to the laptop. However, with 360 video we can’t have hard edits like this that disorient the viewer, so instead the lights flicker in the kitchen around the laptop and the user’s attention is drawn there.

Elliot also happens to be smoking a joint, which presents some interesting opportunities for gimmicks. Elliot takes a big puff and exhales at the camera, which offers an opportunity to transition to a past memory. While this isn't necessarily a 360 video gimmick, what follows is him sitting in the exact same spot in his past memory. In fact, the scene looks no different, which is important so as not to disorient the user. Whiting out the scene with smoke serves to transition the story but not necessarily the set.

The marijuana use also provides a convenient way for the camera/viewer to get "high". As the marijuana starts taking effect, the camera starts floating to the ceiling, offering a wider view of the shot and allowing Elliot's past love interest to enter. He even calls out, "Oh wow, I got really high…look at us up here."  It's very important to reiterate here that Esmail coupled the camera transition with talking to the user about it while SIMULTANEOUSLY pretending it's part of the story.

Camera floats up as Elliot gets high

To further layer on this storytelling feat, the camera view slightly shifts the user to look at the door in preparation for his former love interest Shayla to walk in.

As Shayla knocks on the door, something a bit awkward happens: with each knock, the camera cuts slightly in every direction a few times. It feels like a mistake, but perhaps it was an opportunity to cover up a production issue where the shots weren't aligned.

Shayla enters and wanders a bit around the room talking to Elliot. As she enters the bedroom, a light turns on and illuminates the room for a moment. To me it was a bit awkward and I couldn't find any purpose to it, but it's over quickly.

As she wanders around, the camera pans a bit, which breaks a bit of a VR "rule" since you have no control over it – but it's done gently, and after the initial camera floating and other movements, it doesn't feel wrong in the least. Here the full 360 comes into effect as Elliot stays on one side of you and Shayla walks behind you. You, the viewer, are in the middle of this conversation, and it feels like being in the middle of a group, turning your head each way to see each person speak.

Shayla and Elliot talking


After the two leave, there are some establishing shots of the beach and the shut-down amusement park that would later be used as a base of operations. In these shots there is some gentle movement. Again, it seems to break the VR rule of not forcing the user to move in a way they aren't actually moving their body – but here it feels right, and I think making it a short establishing shot that's not integral to on-screen dialog is the right way to go.

More of the story happens when the video cuts to inside a Ferris wheel car. As the video goes on, Esmail seems to limit the storytelling to slow-paced enclosed areas, with the dialog being a bit slow-paced as well – more like what you'd find in real life, not fast-moving heated dialog with fast cuts. Again, in the Ferris wheel scene, you must turn to each character as they talk, much like you would as the third and silent wheel in a real conversation, sitting behind two people.

Shayla and Elliot in the Ferris Wheel

What's interesting here is that I had previously watched this 360 video on a monitor, using my mouse to pan around. I thought it was a bit boring, and I didn't make it past the opening title, judging it as another property jumping on the 360 bandwagon. But here in VR, what didn't work on a screen is a great storytelling experience.

In the next scene, Elliot and Shayla are strolling along the boardwalk. Important to note here is that the camera is following them, again moving the user. Esmail didn't put any important dialog in this scene, only using the tone and mood to convey a story point (that Elliot and Shayla have made up after some cold-shouldering and are having a happy, memorable time). I find this interesting to contrast with the slow pacing and slow conversations that are placed in static scenes. To get into Esmail's head a bit, I might be inclined to think he believes the camera shouldn't be moving at all when you need the viewer to focus on an important bit of story. The scene itself transitions to an interesting, colorful montage.

Happy montage

For sure, Esmail did lots of interesting things here. I'm sure I could rewatch it again and again and find more. As interesting as Mr. Robot is, I do want to move on to other videos. That said, I DO want to end Mr. Robot with one scene that really stood out for me: when Shayla and Elliot are relaxing in bed. I am interested in 360 Video offering perspectives that aren't seen in real life, and this bed scene touches on that desire without going overboard. Check out the following shot with the camera positioned above the bed, making them look upright. This is often used in TV/film and looks completely normal. In 360, however, it takes some getting used to. There's some dissonance while you figure out your orientation – but once you do, there's a bit of an aha moment that really adds to the experience. Other than the camera orientation, this is another of the slow, enclosed, conversational scenes that make the rest of the piece work well.

Shayla and Elliot in bed

Saturday Night Live: Jerry Seinfeld hosts a Q&A with Surprise Guests – Directed by Chris Milk

Runtime: 8:00

Filesize: 787MB

To be clear, I really did like this video, which is surprising because I'm one of those people who think SNL has dropped way off in quality since <insert when viewer first watched it>. For me those were the days of Phil Hartman, Chris Farley, Mike Myers, etc. Whether I'm right or wrong doesn't matter. My point is that I thought this video was well done, but after the endless commentary on Mr. Robot, I don't have too much to say about this one. That's only because there's not much going on from a storytelling standpoint here.

Since there's not much to say, it's the perfect opportunity to note the opening credits in most 360 Video I've seen. It's no more than a footnote, but opening and closing credits all seem to have standardized on displaying what they need in a narrow field of view, much like what you'd see on a screen/TV, and then replicating it 3 or 4 times around your head so you can see the text and intro graphics regardless of where you look.

Opening credits and corner of repeated title screen

With that out of the way, we can talk about the actual video. In this experience, the 360 camera looks like it's positioned just above the traditional TV camera. You are facing the host, Jerry Seinfeld, and can turn to look at the crew and the audience…the entire room.

Seinfeld Hosting

If you've never been an audience member at SNL (I haven't), it's interesting to experience what it's like behind the camera. You can see camera operators, a photographer, the boom mic crane, what the set looks like, etc. It's fairly interesting.

Unfortunately, the action starts right away and you have to begin paying attention. Contrast this with other VR stories and 360 video, where typically you give the user some time to acclimate to the scene before starting the action. Here, being in the SNL audience is interesting, but Jerry Seinfeld is on stage in just 15 seconds and starts delivering his monologue, so I was a bit torn about what to pay attention to.

If it were JUST watching a monologue intended for TV, this would be a disappointing experience. However, it turns into a sketch. Jerry launches into a Q&A with the audience, who just happen to all be celebrities and/or past hosts.

Yes, it's funny. And YES, it makes use of the 360 experience. The viewer is immersed in the audience here, because you watch Seinfeld call on somebody in the audience, and you turn to see if you can find them. In this way, the video sort of acknowledges that you're there by making you do what you would have to do in real life as an audience member.

Here's where things break down, though, and it's purely technical. Check out this shot of James Franco asking his question:

James Franco asks a question

Can you even tell that's James Franco? More importantly, do you think it would be easy to identify his facial expressions and body language? Recognition is so important to a bit involving celebrity, and facial expressions and body language are key to comedy and acting. You might think this is an anomaly because he's a bit far away. After all, John Goodman is another featured guest, and he's fairly recognizable but also fairly close (hint: he's just past the lady at the bottom center). It's a fair point if you're just looking at this image, but in the experience Franco looks fairly close…just blurry and not crisp enough in the encoding. As a viewer you feel like you SHOULD be able to see him better, and it's imperative to the experience, but the detail of the capture and/or the encoding prevents this.

Oddly enough, Mr. Robot didn't suffer from this despite being longer and having a smaller file size. This point is exactly why I'm prefacing each writeup with the duration and file size. This SNL video is closer to what you might expect from live 360 shooting, without the benefit of planning and a script to overcome these types of issues.

The last disappointing bit is that while it's interesting to see camera operators, the boom mic, etc., you can ALSO see the folks holding the cue cards. It really detracts from the spontaneous and hilarious premise of this off-the-cuff Q&A to have cue cards right in front of you.

Center: Assistant holding a cue card for Larry David’s question

All in all, this is a fairly successful 360 video. I quite enjoyed it, but where it falls down, it suffers because 360 isn’t really the medium they intended to target with this bit.


Vice News VR: Millions March in NYC – Directed by Chris Milk & Spike Jonze

Runtime: 8:08

Filesize: 1.07GB

Vice News, from my understanding, is a bit more gritty and risky/risqué than traditional news media. When I come across Vice stories and videos, I seem to recall reporters inserting themselves into dangerous scenes and doing raw reporting. Even if I've put Vice in the wrong nutshell, that seems to be exactly what they are doing in this 360 video (though to be fair, a protest, especially in NYC, is probably not going to be dangerous at all). This protest in particular highlights the unfair treatment of black men and women at the hands of the US justice system and police.

One interesting thing done right off the bat is the opening title. I criticized SNL's 360 video for not giving the viewer enough time to get acclimated to the scene. Here it's just right, and they've incorporated the title of the piece in an interesting way. It's wrapped around your head, with most of the title outside your field of view at all times. So, to read it, you must look all the way to the left and pan your head from left to right. Meanwhile, a crowd of protesters appears below you.

Vice News Title Screen

Initially I panned this as a bad choice. But after thinking about it, having to work across 270 degrees to read the text doubles as a mechanism to take in the scene around you. Given that an Oculus user already had to click on the video thumbnail and download it, having the title be legible again in the video is not as necessary as one might think. So, even if the user fails to read it and struggles to take all the text in in one go, it's still OK.

After the opening scene, we cut to being eye level right next to a demonstrator. This group of men chants “I can’t breathe with y’all on my neck”, of course a reference to the killing of Eric Garner.

Demonstrators chanting "Can't breathe"

What was interesting for me was my reaction to being right up next to the demonstrator. In real life, I tend to stay far away from demonstrations like this, whether it be Black Lives Matter, a subway musician, or someone in a city park on a microphone calling for sinners to repent. Reflecting, I think it comes down to one thing for me: how to react, and what kind of body language to use in response. For example, someone standing on a soapbox saying the world will end tomorrow is super interesting. I don't take them seriously, of course, but I would love to hear their whole speech for the entertainment value. On the opposite end of the spectrum – with a demonstrator like this, representing a real cause, I might like to hear what they are saying, but especially being white, and someone who historically COULD look down on them, I may be a bit self-conscious of what type of message I'm sending with my reactions (or lack thereof) as I stand there and observe.

I talked before about how 360 videos do well to acknowledge the viewer as part of the scene. Mr. Robot does this exceptionally well, and SNL with Seinfeld did it to a small extent. In a scene like this, NOT acknowledging the viewer works exceptionally well. I can observe and actually listen to the message without worrying about my reactions or being self-conscious.

In addition to watching demonstrators up close, I've never been part of a march, so it was interesting to be swept down the street, passing chanting, banners, stalled traffic, etc. As this is a camera, regardless of whether it's 360, if it's being carried somewhere, there needs to be an operator. While the cameraman usually stays perfectly behind the viewer through most of the video, he's a bit close for comfort in this scene:

Cameraman, a bit too close for comfort


Like I said, it's a bit disconcerting, but it's 360 footage being captured in an uncontrolled environment. He can hardly be blamed!

In the next scene, we follow a reporter down the street to a "Die In". A few dozen people are lying on the ground to represent dying in the streets. Unfortunately, the technology – more specifically, the capture resolution/encoding – failed here, much like it did in the Saturday Night Live video. For starters it was nighttime, so the visibility wasn't great, and well…can you tell what is actually happening in this scene?

A “Die In” demonstration

This image, as in VR, is extremely hard to comprehend. It's actually worse in VR, because you feel like you're there, and because of that you think you SHOULD be able to pick out the shapes on the ground as people and bodies. I was actually a little surprised when they started getting up, because some of those people were white. I had convinced myself that part of the reason I couldn't make heads or tails of the scene was that a black man or woman's face would surely be hard to see lying on the ground at night. But no, in fact there were many white men and women as well.

I'll end this video's analysis with an interesting reaction I had to one of the protestors. In the same "Die In", as people were starting to get up, one lady got up halfway, raising her hands, and ended up on her knees. The reporter we were following crouched down next to her to interview her and get her thoughts.

Reporter crouching to interview a demonstrator

What was interesting for me was my posture as this happened. Previously, I was sitting upright in my office chair as I watched this video. However, when the reporter crouched down and the camera followed, my entire posture in my chair went lower into a seated squat. I took note that with enough direction from on-screen cues, my body would follow!


Catatonic – Directed by Guy Shelmerdine

Runtime: 8:09

Filesize: 481MB

Catatonic was a fun little horror experience, akin to what you'd get when you and your friends drive a couple of hours on Halloween to a small rural town where someone has redone their barn or farm as a spooky experience, complete with actors jumping out to scare you.

This 360 video takes place in a run-down insane asylum. Despite my thinking it worked pretty well, it does what contemporary VR creators dictate you should not do: put the camera on wheels and roll around. I alluded to this before when some of the videos above did it to a lesser extent, and it harkens back to early VR tests when lots of people experimented with putting you on some kind of track, like a rollercoaster. The movement depicted in your vision, contrasted with the lack of movement felt in your body, was named as the prime reason for feeling motion sick. So, of course, content creators name this as a prime thing not to do.

All that said, I personally felt fine despite being on a moving track the entire time. In the story, you are a patient being wheeled through the asylum in a wheelchair. In addition to the movement being slow, you can also look down and see your body and chair. Perhaps this addition of a "reference object", or something that persists in the same place in your field of view, cancels out or minimizes the motion sickness.

In a wheelchair (reference object to reduce motion sickness?)

Remember I talked about those spooky barns? Well, some people get scared by the things jumping out at them. Not me (or my wife, for that matter) – we see things coming, maybe get a little surprised, but really just end up giggling at the attempt. Same here. The first thing you encounter is a zombie-looking girl that kind of snarls and lunges at you as you roll past. I had the same reaction. Ironically, I was much more concerned as I was wheeled into the room that my knees would smash into the doorway (no, seriously, it made me a bit uncomfortable).

Scary girl

All in all, it was more of the same. Interesting and fun, no doubt, but not TOO much more noteworthy to speak of. I was wheeled past disturbing patients. Tricks of light and time dilation made things creepier as well.

One thing I really took notice of after experiencing it was the use of time, and of making me impatient enough to look around. There is a quite normal-looking man who wheels you around for the first half of the experience. He even tries to soothe you in the beginning. But he's behind you. It's an effort to look backward and up to take note that there's someone there. I think I only did it once, out of curiosity.

However, an interesting thing happened. After a particularly fast-paced period when the lighting suddenly changed, time sped up for a second, and things got creepy, there were a few seconds of complete inaction. I was left sitting in the chair, not moving, with nothing happening. The scene had the effect of making me impatient enough to look behind me to figure out why I wasn't moving. It turned out the nice man was gone, and a scary hooded figure lurched out of a door and took over. If I hadn't been given time to get impatient (possibly only effective after such an action-packed few seconds), I would not have looked backwards (again, it's awkward to do so) to see what I was supposed to see.

From there, the cheesy horror and effects picked up!

In real life I’ve been getting bloodwork lately, and I JUST CAN’T look at the needle going into my arm. It’s too much…freaks me out. However, when presented with the following:


…I can't look away! I know it's not real, so I guess I feel empowered to watch it and laugh off my completely irrational freak-out over needles.

And from then on it's more good, cheesy horror, with some personal-bubble invasion thrown in to try to make you uncomfortable.

Invading personal space

So that's Catatonic! I figure if those cheap horror movies and Halloween barns scare you, this might as well. For me, it was good B-movie fun.


New Wave – Directed by Samir Mallal & Aron Hjartarson

Runtime: 2:17

Filesize: 159MB

This is a quick one! It has a bit of a novel approach, though – I'm not sure how well it works, to be honest. The video opens on a beach. In front of you is a washed-up boat. Really just a nice, relaxing scene, and it holds there for around 40 seconds for the viewer to get acclimated. Again, this seems to be fairly standard practice for VR storytelling.

Prior to the narrative starting, a lady walks her dog directly in front of your view. My first couple of times through, the dog walking seemed a bit meaningless and odd. I ignored it, waiting for something ELSE to start up. On my third viewing, though, I noticed it's a guiding action: a bit of movement meant to make you turn your head behind you, where the actual narrative starts with the two main characters.

Walking the dog. A guiding action to the main story

So obviously this bit of direction missed the mark for me. Luckily, I heard a voice-over narrative and knew to look around for what was going on.

The interesting bit about this experience is the spatial audio. The setup is that a couple is fighting, and the two go off to different areas of the beach. You can see each by turning your head, but when you turn your head you can also hear each of their thoughts…a narrative of their anger towards the other, from their own perspective.

Split View

Unfortunately, I didn't think this worked so well, because it took a long time in the short span of this video to figure out that there was different audio depending on where I looked. When I figured it out, I got a bit frustrated because I couldn't listen to both sides of the dialog at once and felt like I was missing things.

All that said, it was an interesting device to tell the story!


LoVR – Created by Aaron Bradbury

Runtime: 5:20

Filesize: 273MB

LoVR is an interesting concept. It's all computer-generated data visualization that you fly through, and it's about love. Aaron's description, verbatim, is this:

A story of love, told through neural activity. Chemicals are released, neurons are activated and a form of poetry is revealed within the data.

You know, to be perfectly honest, I’m not quite sure this needs to be in VR. I dig the concept of measuring the brain’s neural activity and pinpointing the moment that falling in love happens. At that moment the music picks up and the dataviz starts getting extreme.

My guess is that this experience was done with VR in mind, but the creator wanted to expand its reach to flat screens as well, so he made an experience that could encompass both. Flying through the visuals is a nifty experience, but at the same time, not much of your periphery, or even what's behind you, matters.

All that said, it's a nifty concept and video!

Baseline reading, not in love yet
Flying through, noticing beauty and starting to sweat a bit - music is picking up
Looking back on the chaos at the moment of love


Lowe's Home Improvement


Filesize: ???

I'll end this post with a weird one. Despite my negative comments on various aspects of all the 360 videos I talked about, the criticism is just to point out interesting decisions and aspects. Overall, the videos I watched were pretty damn great, and 2016 is just the tip of the iceberg. 360 video will continue to evolve as an art form, and I think we're mostly in the experimental stage right now. All of the above videos were from Within, and it's certainly no mistake that a company founded on great VR storytelling would produce and highlight great 360 video.

What I’m about to mention next isn’t anything like that, but it has a unique take on instructional videos!

I've been to Lowe's Home Improvement stores before for various projects, and they really do try to help guide you through your projects. Their website is no different. Having owned a home, I've gone through some of their instructional videos or tutorials a few times to make or fix something. It does help, for sure.

However, when your hands are full and you're trying to fix something, while at the same time trying to scrub a video back or paginate the instructions back because you missed something…well, it's a pain!

This video attempts to address that problem. I question the effectiveness, as I wonder how unwieldy wearing a headset (even a wireless one like the Gear VR) would be while trying to do home repair. All the same, it's a bright idea!

This instructional video shows how to make a quick DIY cement table with wooden legs. Instead of introducing the steps over time, the steps are laid out in space. Step #1 is right in front of you, and as you turn your head through 360 degrees you can see the rest of the steps. This makes it easy to go back and forth between steps you might not understand…just look the other way! Each step's video is on a continuous loop, so the action is incredibly easy to revisit.

Making a table


And that’s just a few….

This was a bigger blog post than usual, but it forced me to think through some very interesting pieces and evaluate what's good, what's bad, and just where we are, technically and experience-wise, in 2016 for 360 video. I picked the few that I thought were most interesting – so everything here I enjoyed, and I send my kudos to the creators. There are, of course, even more that are just as interesting – but lots that fall flat as well. The most important thing to note is that everyone is experimenting with what works, and we are at the beginning of a new way of thinking about video. It will be quite interesting to see the caliber of stuff that 2017 brings!



Android MediaPlayer, You are too Co-Dependent!

I actually really do enjoy Android development – it's pretty smooth.  Got me some IntelliJ IDEA as my work environment, and I feel like a damn Viking captain…or an astronaut…or something else really cool.  I like Java as well.  It's very reminiscent of my old Flash career, and the XML-based layout markup is ever so much like Flex.  And HOLY HELL, the design mode of the layout editors actually works great in IntelliJ IDEA.  I could never say the same for Flex!

Contrast that to iOS, which is still nice, but you're stuck with visual editors where you can't edit the underlying stuff, and Objective-C, where I don't have garbage collection…

Anyway, I digress – I came here to complain.

And what I'm complaining about doesn't seem to be any one platform or technology or language; it's about how messed up video playback is in that one platform, technology, or language.  I've been through it with Flash (which is pretty nice, actually), iOS, HTML5, and lately: Android.

What's messed up in Android isn't missing documentation or missing APIs to get things working – it's that they throw everything at you, and it only works sometimes on various devices!  Some would say the F-word (f-r-a-g-m-e-n-t-a-t-i-o-n).  I dare not speak it here because I don't buy into the end-of-the-world naysayers who say "Android is sooooooo fragmented, use a different platform".  Unfortunately, though, in regards to video, it seems to be that way.

It's fragmented in a big way from Gingerbread to Honeycomb to Ice Cream Sandwich in regards to native streaming video via HLS.  Gingerbread doesn't support it at all, Honeycomb probably will, and yes, it's there in ICS.  So Apple's streaming video tech, which it has successfully pushed across our entire industry, may or may not be available on your device…natively, that is.   Some smart folks, though, have made C++ libraries to handle it across all versions, and I'm sure they get paid handsomely for it (by you, the developer).
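If you want to gate on this yourself, the check boils down to comparing API levels (Gingerbread is 9–10, Honeycomb is 11–13, Ice Cream Sandwich is 14+). Here's a minimal sketch; the class and method names are my own, and on a device you'd pass in `android.os.Build.VERSION.SDK_INT`:

```java
// Hypothetical helper: decide whether to trust native HLS playback.
// Takes the SDK int as a parameter so the logic itself is plain Java.
public class HlsSupport {
    // Honeycomb (API 11) *might* play HLS; ICS (API 14) is the first
    // version where it's reliably there, so treat 14 as the safe floor.
    public static boolean supportsNativeHls(int sdkInt) {
        return sdkInt >= 14;
    }

    public static void main(String[] args) {
        System.out.println(supportsNativeHls(10)); // Gingerbread: false
        System.out.println(supportsNativeHls(15)); // ICS and up: true
    }
}
```

Anything below that floor, and you're shopping for one of those third-party HLS libraries (or falling back to progressive download).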

That's cool, though – maybe you don't need to stream.  Maybe you just want to play a short, 5-minute-or-so piece of progressive content.  That's actually FINE, but be careful how you do it.



At the very base level of media playback in Android, we have the MediaPlayer.  This little guy plays video, audio, whatever.  The interesting thing, though, is that it doesn't SHOW you the video.  To actually see what's playing, you need to call mymediaplayer.setDisplay(mysurface).  So I've been using a "SurfaceView".  You can probably imagine what normal views are – they hold graphics and stuff for display.   You could use a Layout (like a FrameLayout or a LinearLayout) to bring order to your chaotic mess of graphics in your view – but that's really what a view does: it holds graphics.

A SurfaceView is weird, though.  Sometimes you need hardware-accelerated graphics power at your disposal.  So what SurfaceView does is punch a hole in your view hierarchy right down to the GPU and use that area to accelerate your graphics, instead of the normal layering that usually happens.  Graphics layered with it still look like they’re handled pretty nicely; I’d chalk that up to Android/Google engineering attention to detail (or something).

Anyway, that’s what the SurfaceView is: a hole punched through your graphic layers right down to the GPU.  It’s commonly used for 3D and video.  Since we want our videos running smooth as butter, we are strongarmed into using the SurfaceView.

So, there it is: you set your video path or URL, attach the player to the SurfaceView with setDisplay, set the audio stream with mymediaplayer.setAudioStreamType(AudioManager.STREAM_MUSIC), then prepare.

Actually, don’t call prepare(); call mymediaplayer.prepareAsync().  prepare() blocks the UI thread until your media is ready to play.  You can then listen for the MediaPlayer’s onPrepared event and run mymediaplayer.start() to kick off playback.
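Put together, the whole dance looks something like this. This is only a sketch (the layout id, URL, and error handling here are mine, not from any real project), but it shows the order of operations:

```java
// Sketch of MediaPlayer + SurfaceView wiring inside an Activity.
// Assumes your layout has a SurfaceView with the (hypothetical) id "videoSurface".
SurfaceView surfaceView = (SurfaceView) findViewById(R.id.videoSurface);
surfaceView.getHolder().addCallback(new SurfaceHolder.Callback() {
    @Override
    public void surfaceCreated(SurfaceHolder holder) {
        MediaPlayer player = new MediaPlayer();
        try {
            player.setDataSource("http://example.com/video.mp4");
            player.setDisplay(holder); // show the video on our surface
            player.setAudioStreamType(AudioManager.STREAM_MUSIC);
            player.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {
                @Override
                public void onPrepared(MediaPlayer mp) {
                    mp.start(); // safe now: we're in the Prepared state
                }
            });
            player.prepareAsync(); // prepare() would block the UI thread
        } catch (IOException e) {
            Log.e("Video", "Could not set data source", e);
        }
    }

    @Override public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {}
    @Override public void surfaceDestroyed(SurfaceHolder holder) {}
});
```

Waiting for surfaceCreated before touching the player matters: setDisplay on a surface that doesn’t exist yet is one of the easier ways to end up in a bad state.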

Meanwhile, there’s a WHOLE other situation going on: you have to be EXTREMELY CAREFUL about what you do with the MediaPlayer.   It’s so complicated that Google has given us a state diagram….

It’s not that I think diagrams are too complicated – but this one is just really important.  If you call getCurrentPosition, or pause, or start on the player while it’s in the wrong state……wellllllll, your app crashes (with an IllegalStateException, if you’re lucky enough to see one).  Even worse, the media player won’t actually tell you what state it’s in.  There’s no “getState” method or property.  Nope…so how do you know?  Well, Google suggests you keep track of it your DAMN self with the appropriate listeners.



Think the above is complicated?  Well, good thing there’s VideoView.  This View is a MediaPlayer wrapped in a SurfaceView for the ultimate convenience.  It passes on some of the more important MediaPlayer events like onPrepared, onCompletion, and more.  And if you want more control, grab a reference to the MediaPlayer from the onPrepared callback; it’s the first parameter.   But again, be very careful not to call something on your player when it’s not in the correct state!

Anyway, it’s pretty simple to use: call the setVideoURI() and start() methods to get things going.  Just like any other view, it’ll size itself nicely and dynamically along with your layout.  You can even call setMediaController() on it to attach a native Android UI media controller.
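In code, the VideoView version of everything above collapses to just a few lines (again a sketch; the id and URL are made up):

```java
// VideoView: MediaPlayer + SurfaceView, pre-wired for you.
final VideoView videoView = (VideoView) findViewById(R.id.videoView);
videoView.setVideoURI(Uri.parse("http://example.com/video.mp4"));
videoView.setMediaController(new MediaController(this)); // native controls UI
videoView.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {
    @Override
    public void onPrepared(MediaPlayer mp) {
        // mp is the wrapped MediaPlayer, if you need lower-level access
        videoView.start();
    }
});
```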

So, yah – MediaPlayer is pretty co-dependent.  But that’s co-dependent in a good way, right?  Why bother with MediaPlayer alone when we can just use VideoView?


No Tweakin the View

So, we actually had a use case in our app where we need to hide the video momentarily before bringing it back up again.  You can do some pretty sweet things with the SurfaceHolder callbacks, listening for when the SurfaceView is created, destroyed, or changed.  Unfortunately, those seem to do NOTHING in the VideoView.  Yah, you still get the events, and you can listen for whatever you want – but your video always shuts down when the VideoView is made to go away.

Also unfortunate: when you look at the source code, the methods are mostly private.  So there’s not even any hope of overriding something and doing your own thing.

OK – well, I need to hide and bring the video back.  VideoView can’t do that.  Well, let’s rip that co-dependent bastard called MediaPlayer right out of there and implement it ourselves.

Yah – I had problems with that too.  To play multiple videos sequentially with my custom View, I had to set the view to gone before loading the video, and then set it to visible on prepare.  Without this, I’d get some pretty crazy errors most of the time.

I’d occasionally get the same crazy errors when running my video, removing its view, and then reattaching it.  Most of the time it worked perfectly – I just listened for surfaceCreated, then attached the running media player again with setDisplay.  Other times, it would error out when I did that, and I’d get that crazy error.

What’s the crazy error?  The error is 1, which means it’s unknown, and the extra is -2147483648 (that’s Integer.MIN_VALUE, for what it’s worth).  I can’t find any information whatsoever on that error online that makes sense.  Some folks on Stack Overflow say the file has a corrupted header size on the server, others claim it’s an invalid format, and more.  None of them complain about this happening in the freaking middle of playback.

Ok, well, whatever – it happens rarely, and we can catch it and move on.  Fun fact: if you catch the error, you can return false from your custom error handler to have the MediaPlayer mark itself complete and have the onCompletion handler called.
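A sketch of that error handler (hooked to whatever player you’re using; the log tag is mine) might look like:

```java
// Returning false means "I didn't handle this", which makes the framework
// invoke the onCompletion listener so playback can move on to the next video.
player.setOnErrorListener(new MediaPlayer.OnErrorListener() {
    @Override
    public boolean onError(MediaPlayer mp, int what, int extra) {
        Log.w("Video", "MediaPlayer error what=" + what + " extra=" + extra);
        return false; // fall through to onCompletion
    }
});
```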

So I have a custom media player that seems to work well MOST of the time on my two Honeycomb tablets.  When it fails, it’s rare, and we can move on….PROBLEM SOLVED!

….that is, until I try it on my Kindle Fire.  I can hear the audio but not see the video, while on the other devices it works perfectly.  Seriously, what the hell?  I spent mucho amounts of time comparing my code to the Android VideoView source and didn’t see anything different – yet still, here we are.   And now I’m willing to bet I know why VideoView doesn’t give you much option to keep playback alive in the background: because it’s just a quirky mess on some devices.


Just Use the VideoView, it’s Safer

I never did figure out my problem.  In the end, I figured we’d better just use the VideoView and not have it removed from the screen in mid-playback.  Don’t do anything fancy with your surfaces because it’ll probably fail on some device somewhere.  Perhaps someone who is smarter than I knows where I fell short.  Or perhaps, this is an insurmountable problem.  It sure seems quirky as hell to me, though!

Other issues included some MP4s not playing at all on some devices.   Read up on the supported media formats.  Most devices seem to be very forgiving, allowing you to encode a bit differently as long as it’s in wide practice, but some older Android models don’t like some of these encodings – so it’s safer to stick to the spec.

And so, that is why MediaPlayer is way too co-dependent on VideoView.  You’d think MediaPlayer could operate without its buddy like that, but it tricks you….lulls you into a false sense of security, and always bites you in the end!

WTF HTML5 Video?

Just thought I’d put up a quick little post this morning as we talked a little bit about one of our HTML5 video projects in our morning meeting.

HTML5 video is pretty rad, but it’s not all that, though. Microsoft’s Smooth Streaming + Silverlight and Adobe’s Flash Media Server + AS3 give you the ultimate control (despite maybe being a little too complex for beginners). The AVPlayer framework in iOS is pretty nice too, but out of the box it doesn’t give you nearly the same control as MS or Adobe.

Android is pretty weird.  It’s similar to iOS, but the HUGE catch is that you can only play one video at a time with its native media player.  Let’s say you start up a stream, then pause it.  You then start up another stream while the first one is paused.  All is well.  Go back to the first and unpause…..BOMB.  It’s broken and won’t really explain why.  Boo.  It’s because you can’t have two media players running on Android at the same time – something that any other platform lets you do.

HTML5 video looks fairly straightforward as you first get into it.  On mobile, though, it gets pretty funky; you could pretty much say it gets WTF.  One such example?


It will steal your taps on iOS!

So you’re humming along doing your awesome HTML5 video experience.  Out of laziness or whatever, you don’t CSSify your own player controls.  You justify it by saying “I want that native experience on any platform – if it’s on Android it needs to look like Android dammit, and god forbid I tweak Steve Jobs’ master vision and override the Quicktime look and feel”.  So you’re using the native player controls – quite easy to do by adding the controls="controls" attribute to your video tag.

The day your project is due, some idiot marketer comes along and says “Hey, let’s launch a little survey after they complete the video!  We’ll put a popup box over the video, and they have to click something to continue to the next”.  Your response: “Yah, ummm, whatever…sounds craptastic, but I’m not paying the bills”.  So you make a nice little CSS popup box, give it some absolute positioning, and pop it right over the video.

Your marketing friend is all happy until they get out the corporate iPad.  They finish the video, watch the popup, try to click a button…..and nothing.  The buttons don’t work.  “You must be mistaken,” you say, “for this works in Chrome”.

They aren’t mistaken – HTML5 video on iOS WILL STEAL YOUR DAMN TOUCH EVENTS.   Anything intersecting with the video, whether it’s under or over it, will not be tappable.

Let’s rewind back to the beginning of our story – if you hadn’t used the native controls and made your own CSS controls in the first place, this problem wouldn’t exist.  WTF HTML5 video?
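The fix, then, is to skip the controls attribute and wire up your own CSS-styled button. The core of a hand-rolled control bar is tiny; here is a minimal sketch (all the ids and names are mine), written so the toggle works with anything exposing paused, play(), and pause() – which an HTMLVideoElement does:

```javascript
// Core of a hand-rolled control bar: one function that toggles playback.
// Works with any object exposing paused, play(), and pause()
// (an HTMLVideoElement qualifies). Wire it to your own CSS-styled
// button instead of relying on the controls attribute.
function togglePlayback(video) {
  if (video.paused) {
    video.play();
    return "playing";
  }
  video.pause();
  return "paused";
}

// In the page, something like this (hypothetical element ids):
//   var video = document.getElementById("myVideo");
//   document.getElementById("playBtn").onclick = function () {
//     this.textContent =
//       togglePlayback(video) === "playing" ? "Pause" : "Play";
//   };
```

Since your button is just a regular element outside the video, iOS has nothing to steal, and your marketer’s popup buttons keep working.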


There is, of course, some more wonkiness on mobile – but surely not on our last bastion of browser freedom, Chrome?  Yes, even Chrome.


Yes Even Chrome

It’s a little unfair to blame Chrome, since it’s like the only freaking browser capable of playing video.  “But but but Firefox and OGG”, you say.   To that I say: find me an OGG file and I’ll try it out; until then, we’re talking about Chrome.  And oh yeah, IE9 and Safari, I guess…..whatevs.

Unlike its Android brethren, Chrome on the desktop can play 2 videos at the same time. Rock!  However, this just leads to extreme confusion when it can’t.  And your confusion is extremely likely when you’re a developer.  See, Chrome CAN play 2 videos at the same time – just not 2 copies of the SAME video at the same time.

You can imagine this can lead to extremely confusing times as a developer, when you leave one tab open with your video loaded, and then try to launch another tab thinking you closed the first.  You get nothing….a black screen.  After pulling your hair out for 10 minutes trying to wrestle file paths in your code, you find the open tab and close it.  Wheeeeeeeeeeeeeee it works again.  WTF HTML5 Video?


No Sound No Worky

Lots of us work in an office, and usually this means being kind to your cubicle neighbors.  So either you wear headphones or you just develop without sound until you actually need to test it.  I do this, and occasionally it means long stretches of time where I don’t have headphones plugged into my desktop.  I’m occasionally burned by this.

Guess what: if your machine doesn’t have audio output set up in some fashion, the damn video won’t even play.  Seriously.  I just tried this again in Chrome 21.  If you want to play video, at least leave your headphones plugged in with the volume down.  I haven’t had this problem on my laptop, since the speakers are built in and always there.  But on a desktop, it can be pretty likely that you won’t have any hardware audio device at the time.

WTF HTML5 Video?


There are lots more WTFs, but I will leave you to wrap your mind around these.




Injecting Metadata with FFMpeg and Node.js

In a previous post, I told you guys how awesome FFMpeg was.  I shared how to use it to convert video files using the Node.js FFMpeg wrapper.

Given all of the awesome things that FFMpeg does, injecting metadata into a file doesn’t sound that impressive – but it’s an important task…and one that I struggled with a bit!  Why the struggle?  Well, it’s the syntax!

Using FFMpeg on the command line for this actually is a piece of cake:

ffmpeg -i in.mp3 -metadata title="my title" out.mp3

What’s happening here isn’t hard to see: we’re taking an mp3 called "in.mp3", setting the title field in our metadata, and writing our "out.mp3" file.  Really easy, right?

Well, let’s put this in Node.js terms.  Previously, to transcode/convert my mp4 video to mp3 audio, I simply executed the following in Node.js:

ffmpeg.exec(["-i", infile, outfile], callback);

Simple right? Well it’s pretty easy to add a flag to make it do something extra:

ffmpeg.exec(["-i", infile, "-myflagtodosomething", outfile], callback);

Indeed it is easy – flags go between the infile and outfile array elements, and you can have as many flags as you want. So let’s revisit our metadata flag. We see that the flag is called "-metadata". But there’s more to this flag, of course: there’s a secondary option called "title", AND I have to use this as a key, setting the value of this key to the actual title I want.

How to do this? Well, the documentation calls everything you put in that array a "flag". So I’m thinking either non-flags belong outside the array, or everything goes into one flag as a big string, like "-metadata title='my title'". Turns out: no. The way to do it is to treat the option like a separate flag, like so:

ffmpeg.exec(["-i", infile, "-metadata", "title=mytitle", outfile], callback);

So this works great! Though I have another problem: it seems to be re-encoding the whole file. My formerly 128kbps MP3 file has been reduced to 64kbps in my output! Lots of lost quality there, unfortunately. However, we can tell FFMpeg not to process the audio and to leave it as is – to simply copy the audio data to the new file:

ffmpeg.exec(["-i", infile, "-acodec", "copy", "-metadata", "title=mytitle", outfile], callback);

Perfect! The audio is intact. Our "-acodec" flag specifically targets our audio track. I’m not worried about video here, since it’s an MP3 audio file.

Last thing: since I’m simply adding metadata to an existing file, can’t we just use the same file? Can’t our in.mp3 be the same as our out.mp3? Yes, but it’s tricky. If you execute the command-line version of FFMpeg, you can use the same file for in and out. However, partway through the process, FFMpeg will ask if you’d like to overwrite the file. You hit "Y" and it continues. Try to do the same thing in Node.js, and…well, it hangs. The process locks up waiting for you to hit "Y", with no way to continue. Bad news….

Good news is that you can use a “-y” flag!

ffmpeg.exec(["-i", "myfile.mp3", "-y", "-acodec", "copy", "-metadata", "title=mytitle", "myfile.mp3"], callback);

The "-y" flag forces the process to choose yes and overwrite the file.
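To keep those argument arrays straight, it helps to build them with a little helper. This function is mine, not part of the FFMpeg wrapper – just a sketch of composing the flags we’ve covered:

```javascript
// Hypothetical helper (not part of the node FFMpeg wrapper): builds the
// argument array for a metadata-injection pass that copies the audio
// stream untouched. Adds -y automatically when writing in place, so the
// process never hangs waiting for a "Y" we can't type.
function buildMetadataArgs(infile, outfile, title) {
  var args = ["-i", infile];
  if (infile === outfile) {
    args.push("-y"); // overwrite without prompting
  }
  args.push("-acodec", "copy", "-metadata", "title=" + title, outfile);
  return args;
}

// Then, with the wrapper from before:
//   ffmpeg.exec(buildMetadataArgs("in.mp3", "out.mp3", "my title"), callback);
```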

So there it is: adding metadata with Node.js/FFMpeg. And since it’s simply a spawned process, I imagine this applies equally well to any other language – be it Python, PHP, Ruby, whatever.

Good luck with your metadata!

Showing an Image for your Audio Only HLS Stream on iOS

Now that I have a handle on this, it seems like a sick joke how many false paths I was led down with this task.

So what am I trying to do here, what’s the use case?

Well – one of the unique things about the Apple marketplace is its restrictions on streaming network video. If I have an Apple HTTP Live Stream hosted on my site and I’d like my iOS device to stream/play it, Apple has some restrictions. If I don’t follow these restrictions, they will reject my app!

One of these restrictions is to have an audio-only component to my stream that is 64kbps or less. In the world of streaming, you’d typically switch back and forth between different quality streams depending on your bandwidth. In this case, Apple forces you to have a low-quality, no-video stream. The reasoning is that if the user has a bad connection, they’ll sit through this instead of a video that keeps pausing to buffer.

This presents an interesting user experience problem. Does my app feel broken if my stream keeps stuttering and buffering, or does my app feel broken if I see no video, just a black screen, and only hear audio? Personally, as someone who is tech savvy, I know a bad connection when I see it, so I’d rather have it pause to buffer.  But ANYWAY….

One of the ways to enhance this unfortunate user experience is to show some sort of text to indicate to the user that “No, this app isn’t broken – we’re just not showing you video because your connection sucks”. How to do that, though? Well, it was suggested to me on the Apple forums to embed an image into the audio-only stream, so that when it switches over, the user is presented with the messaging.

How to embed it in the stream isn’t so hard, and it’s outlined here.  Basically, you set a couple of flags to indicate that your metadata is a “picture” and where the file you’d like to embed lives.  Just be careful, though, because the image is embedded into every segment and counts against your 64kbps allotment.

Showing the image is the thing that I had a hard time with.  The first thing that led me astray is this line from the HTTP Live Streaming Overview:

If an audio-only stream includes an image as metadata, the Apple client software automatically displays it. Currently, the only metadata that is automatically displayed by the Apple-supplied client software is a still image accompanying an audio-only stream.

Well, FANTASTIC, right?  If I include an image in my stream as metadata, it will display automatically!  Um, no.  And let me tell you, it was pure hell finding this out.  I was trying to test my newly minted stream outside of my app.  I used Safari, Quicktime, and VLC to try to see my awesome image – no luck.  I even opened the AAC file in Adobe Audition – it wasn’t in the metadata section there either.

Then I started my hunt for example streams online that use this image.  No luck – either my Google-Fu isn’t strong enough, or nobody does this (or advertises doing it).  A co-worker pointed out that he DID see the image data in the stream, and yes, my AAC audio segments were the right size to have the image in them – so what gives?

Turns out that this magical image only works when using the video tag in HTML on iOS Safari, or in your own iOS app.  So there’s no hope of verifying it on the desktop.  But wait, it gets worse!

In your own iOS app, it will only display automatically if you are using the older MediaPlayer framework (MPMoviePlayerController).  I checked it out, and yup, it worked (finally)!  The problem is that I was using the newer AVPlayer.  And from here on in, it’s COMPLETELY manual.  What does manual mean?  Well, you’re in charge of all your metadata.  This means that if you’d like to show that image you embedded, you need to grab the raw binary data, convert it to an image, and display it your DAMN self.

Fine, then – we have a “timedMetadata” property, let’s use it:

for (AVMetadataItem* metadata in [self._avplayer currentItem].timedMetadata) {
    if ([[metadata commonKey] isEqualToString:@"artwork"]) {
        self._overlayImage = [UIImage imageWithData:metadata.dataValue];
        self._overlayImageView = [[UIImageView alloc] initWithFrame:CGRectMake(self.frame.origin.x, self.frame.origin.y, self._overlayImage.size.width, self._overlayImage.size.height)];
        [self._overlayImageView setImage:self._overlayImage];
        [self addSubview:self._overlayImageView];
    }
}

That’s actually not too bad, right? The timedMetadata property is pretty handy. There’s one mind-boggling catch, though. You must add an AVPlayerItem observer for timedMetadata, like so (where item is an AVPlayerItem):

[item addObserver:self forKeyPath:@"timedMetadata" options:0 context:nil];

If you don’t do this, your timedMetadata will be null. So it’s like that old riddle: if a tree falls in the woods and nobody is around to observe it, did it really fall? Apple says no. Actually, they didn’t say no – they just assume you’ll arrive at that conclusion.
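For completeness, once the observer is registered, the notifications land in observeValueForKeyPath. A minimal sketch (showOverlayFromMetadata is a made-up name for wherever you put the artwork loop):

```objc
// Hypothetical KVO callback: after addObserver, timedMetadata changes
// arrive here and the property is actually populated.
- (void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object
                        change:(NSDictionary *)change context:(void *)context {
    if ([keyPath isEqualToString:@"timedMetadata"]) {
        [self showOverlayFromMetadata]; // e.g. the artwork loop above
    }
}
```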

When you do add that observer, you’d think you would have an event to trigger showing this image. That would be true…..if you don’t care that the image won’t go away when your stream switches back to video+audio. It’s kind of maddening.

So, when you get the timedMetadata event, all seems well. You have the image data available, and you can go ahead and show the image. After around 10 seconds pass and you get to your next segment, you’ll get another timedMetadata event. If the stream has switched to video+audio, this will be the last one you get. It’s kind of late to let us know that “for this past segment, we should not have been showing the poster image”.

“But don’t worry”, you might say, “we’ll just check the timedMetadata property of the AVPlayerItem”. And I would say you’re smart to try that, but no: metadata will persist on the AVPlayerItem whether it belongs to the active segment or not. This means that with the timedMetadata property or timedMetadata events, there seems to be absolutely no way to tell whether the segment you are currently playing has metadata on it and whether it is an audio-only segment.

Ick. Well, what the hell is the point of the image metadata on an audio-only stream if it’s all manual and this hard to control?  But I needed to persist to get this task done…how can we know when to show this image and when not to?

I tried AVPlayerItem.tracks. This exposes a track listing for the asset. It seemed pretty good at first – it was showing me that I had video, audio, and metadata tracks. Occasionally the video track would be dropped, and this seemed to coincide with the audio-only stream – however, this wasn’t always the case. It seemed very flaky, so in the end I couldn’t base things off of the tracks listing.

FINALLY, I found AVPlayerItem.presentationSize. When the stream is audio only, presentationSize.width and presentationSize.height are 0. So I can ping my video every second with this to figure out whether the stream at this very moment is audio only, and whether I should be showing my image to the user.
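The check itself is tiny. A sketch, using my own property names (_avplayer and _overlayImageView are mine; fire this once a second with an NSTimer):

```objc
// Hypothetical per-second poll: when presentationSize collapses to 0x0,
// the current stream is audio only, so show the overlay image.
- (void)checkAudioOnly {
    CGSize size = [self._avplayer currentItem].presentationSize;
    BOOL audioOnly = (size.width == 0 && size.height == 0);
    self._overlayImageView.hidden = !audioOnly;
}
```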

What an experience. We’ve gone from documentation indicating that the feature was automatic to wrangling our own bytes, managing everything ourselves, and dealing with several weird quirks of iOS. I’m glad I’m drinking tonight.

The worst part of it is, I got no help from Google searches and some limited* help from the Apple dev forums. So I hope this helps YOU!

*My limited help from the Apple dev forums consisted of a very nice developer hailing from Cupertino and another from Maryland saying he was having a hard time too. The Cupertino dev helped out immensely, but not enough because I don’t think I was asking the right questions to get to my inevitable conclusion of suckiness.