Tuesday, June 4, 2013

3D Mastering in FFmpeg

Input

The default image file format for a high-end VFX workflow is Lucasfilm's OpenEXR (.exr). In addition to the standard .exr file, a stereo (3D) extension to the EXR standard called SXR exists, which is basically a container for both the left and the right eyes within one file. (Saves the data managers having to worry about 2 sequences of files per each stereo stream). Unfortunately, at the time of this writing, OpenEXR is not very well supported in ffmpeg (despite there being claims of OpenEXR support) and SXR is definitely not supported. This means that the first part of the workflow must be a file conversion from EXR to either DPX or TIFF (or lossless JPEG).

Describing this conversion process is outside the scope of this guide, but there are many ways to skin this cat. The easiest one is to use Nuke to create a reader node for the SXRs, attach a writer node to it and have it write out a sequence of converted frames. The one thing to keep in mind is that many VFX workflows work with a color LUT to manipulate the look of the resulting images. FFmpeg does not have a way of applying a text-based LUT to its inputs so the LUT must be applied during the conversion process. Once we have a sequence of DPX/TIFF/JPEG/whatever images, we can proceed with encoding them into a movie clip.

One thing to note is that the high dynamic range of EXR (and 16bits per channel) will be "flattened" once the frames are converted to DPX or some other format, but this is OK because the best most video codecs can do is a 10bit depth per channel anyway.

Output

FFmpeg will take an image sequence on the input and will then output a movie clip file (usually in a .mov or .avi container). The VFX/film industry would mostly use the .mov container, especially for client deliveries, because it is the most commonly used format in the industry. Making these clips is pretty straightforward, but we'll look at the options that are of interest to this specific industry the most.

Mono vs. Stereo

The current trend in filmmaking is geared towards producing 3D (stereo) movies. In terms of a file container, a stereo movie is simply a single movie file (like .mov) that contains 2 video tracks - one for each eye. When choosing an output format one must make sure that it supports multiple video tracks (quicktime does, not sure about .avi). Creating a stereo movie requires two extra steps. The first one is that the correct input source must be specified for each eye. This basically means replicating the input path and any parameters for the 2nd eye. The second and most important step is to map both of the input streams into a single video stream. This is done using the "map" filter that comes with ffmpeg and by passing it the following parameters:

-map 0:0 -map 1:0 -metadata stereo_mode=left_right

This tells ffmpeg to take input stream #0 and input stream #1 and map both of them into output stream #0. It is possible to control which eye gets assigned to which track by changing the order of the -map arguments. Once a movie like this is opened in Quicktime, it will show as having 2 video tracks. It is then entirely up to the player that is used for playback to determine how to display this movie in 3D. RV for instance does it by default - all that needs to be done is to turn on stereo mode, but other players may require more tweaking. The metadata tag is potentially optional, I have not tested what happens if it is omitted.

From http://ffmpeg.org/trac/ffmpeg/wiki/vfxEncodingGuide