About a decade ago, at the dawn of on-demand printing services like CafePress, I had a colleague who wore a t-shirt that read "VFX R Hard", which riffed on the famous Toys "R" Us logo to poke fun at the common misconception that VFX work simply involved "pushing buttons to make the dinosaurs look real".
The fact is, visual effects and VR360 production have a great deal in common: file-based workflows, stereoscopic pipeline considerations, specialized hardware and software, technical integration challenges, and a reliance on teams of talented artists capable of leveraging all these resources to fulfill aesthetic and messaging requirements. VR360 production goes further still, adding the complexity of specialized audio formats as well as UI and UX considerations (although this post is dedicated specifically to 360 video). To make matters worse, the nascent industry lacks a common delivery format, which represents a critical roadblock to mass adoption.
Enter the format wars, where there are no fewer than 49 possible ways to encode and deliver a piece of content to viewers. Don't believe me? Consider the following decision tree when building an encoding workflow.
If the decision tree is too big to visually sink in, then think of the simple math:
There are seven "common" options for formatting 360 video:
- Monoscopic
- Top-Bottom Squeezed
- Bottom-Top Squeezed
- Left-Right Squeezed
- Right-Left Squeezed
- Double Height
- Double Streamed (*)
And there are seven "common" options for encoding the audio:
- Stereo (headlocked)
- OZO Audio
- 1st Order Ambisonics
- 2nd Order Ambisonics
- 3rd Order Ambisonics
- Facebook 8.0
- Facebook 8 + 2 (8 spatial + stereo headlocked)
Thus, 7 x 7 = 49 possible outcomes.
To be fair, many of the picture formats are not regularly used, and neither are many of the audio formats. So it is reasonable to argue there are only four "primary" formats for picture and four for audio. Nevertheless, that still means 16 options, not including considerations of frame rate, bit rate, resolution, audio sample rate, audio channel order, metadata, and other requirements for 6DoF, MR, and AR.
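The arithmetic above can be sketched in a few lines of Python. Note that the picture list here includes "Monoscopic" as an assumed seventh entry to round out the count, and the four "primary" picks per category are my own illustrative guesses, not a list from any platform's spec:

```python
from itertools import product

# Picture formats ("Monoscopic" is an assumption to make seven)
picture = ["Monoscopic", "Top-Bottom Squeezed", "Bottom-Top Squeezed",
           "Left-Right Squeezed", "Right-Left Squeezed",
           "Double Height", "Double Streamed"]

# Audio formats as listed in the post
audio = ["Stereo (headlocked)", "OZO Audio", "1st Order Ambisonics",
         "2nd Order Ambisonics", "3rd Order Ambisonics",
         "Facebook 8.0", "Facebook 8 + 2"]

# Every picture/audio pairing is a distinct deliverable
combos = list(product(picture, audio))
print(len(combos))  # 49

# Restricting each category to four "primary" choices
primary_picture = picture[:4]   # illustrative subset
primary_audio = audio[:4]       # illustrative subset
print(len(primary_picture) * len(primary_audio))  # 16
```

And that 16 is before multiplying in frame rate, bit rate, resolution, and the other variables mentioned above, each of which fans the tree out further.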
Hard for clients who don't have the time to weigh the sheer number of variables and simply want to reach the largest audience possible.
Hard for viewers who, except for early adopters and enthusiasts, barely understand the difference between a flat-screen display and a 360 experience.
And hard for producers and creatives who must navigate these murky waters.
And to cap it off, the only format compatible across the three largest platforms (Facebook, YouTube, and Oculus, including Gear VR) is a monoscopic picture with stereo audio.
Ouch. Think of what that means.
It means there is an inherent friction between the desire to drive the industry forward with optimum-quality experiences, even at the expense of quantity, and the practical business reality of needing to reach the highest quantity of viewers, even at the expense of quality. Caught in the middle is the production community, which is effectively left to "create the best but accept the worst".
If that sounds pessimistic, take heart.
There is hope for higher standards to evolve across all platforms. Until then, producers and creatives need to remain forward-thinking and take steps to maintain quality while also reaching the highest quantity of viewers. This means designing flexible pipelines, understanding the various delivery standards (and the experiential tradeoffs among them), and being able to communicate these factors to your clients.
Not an easy task, but with best practices from existing workflows, and solid teams capable of adapting to new paradigms, it is possible to deliver both quality and quantity.
* I've completely made up the term "double-streamed" here to denote the "2LR" format, which delivers a full-resolution image for each eye across two separately encoded streams.