Finding the right words for subtitling workflows

[Image: Dalet subtitling diagram]
By Ben Davenport, Dalet
April 21st 2016 at 11:23AM

Ben Davenport, director of marketing at Dalet, takes a look at the challenges in subtitling for a multi-screen world

Even the language around captions and subtitles is confusing, and perhaps that's the best place to start. On the other side of the Atlantic, in America, the timed text that accompanies video/audio is referred to as captions: specifically, open captions or closed captions. Open captions are “burnt” into the encoded/distributed video picture, whereas closed captions are carried alongside it, and the display device controls the rendering of the text over the image when selected by the user. There, ‘subtitles’ refers only to foreign language text.

In Europe (and elsewhere) we tend to use ‘subtitles’ for all of these cases. For the purpose of this article, we will discuss only subtitles, and specifically the European equivalent of closed captions.

Why we care

Aside from being ethically sound, providing subtitles is a regulatory obligation in many territories. As such, it is a business cost that must be addressed with maximum efficiency. At a time when all or most content was delivered to consumers by linear broadcast, the most effective point at which to add subtitles was playout, before which all the subtitle data would exist in a discrete workflow and data path. This posed some challenges in associating text with video/audio, but avoided the much larger issues of media interoperability and file exchange. However, when we start delivering content through multiple platforms, it is not efficient to have separate subtitle data paths for each platform, and we need to look at how to bring text, video and audio together much earlier in the production and content preparation process.

Authoring formats

STL, SCC, PAC, RAC, CHK, SRT, SUB, 890, XIF, CAP, TXT: these are a few of the most popular file formats for authoring, editing and storing subtitles separately from the video and audio. Standards and specifications exist for some of these, but there are many proprietary implementations too. As a result, converting archives of these files to something interoperable is often a hurdle when designing or implementing file-based video/audio workflows that include subtitles. Well-established tools exist to handle these conversions, but it is important to ensure that the conversions can be easily orchestrated and married to other operations to maintain efficiency.
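To make the conversion step concrete, here is a minimal sketch of reading one of the simpler formats above, SRT, into a neutral list of timed cues. It assumes the common SRT layout of an optional index line, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, and one or more text lines, with blocks separated by blank lines. The names Cue and parse_srt are illustrative only and do not come from any particular product; real archives will contain variations that a production tool would need to handle.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    """One timed subtitle, with times held in milliseconds (illustrative model)."""
    start_ms: int
    end_ms: int
    text: str


# Timing line, e.g. "00:00:01,000 --> 00:00:03,500" (a "." separator is tolerated).
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert the four captured timing fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(data: str) -> List[Cue]:
    """Parse SRT text into cues, skipping any block without a timing line."""
    cues: List[Cue] = []
    normalised = data.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        # The timing line is normally second, but the index line may be missing.
        for position, line in enumerate(lines[:2]):
            match = _TIMING.search(line)
            if match:
                fields = match.groups()
                text = "\n".join(lines[position + 1:])
                cues.append(Cue(_to_ms(*fields[:4]), _to_ms(*fields[4:]), text))
                break
    return cues


sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nHello\nworld\n\n"
    "2\n00:01:00,250 --> 00:01:02,000\nSecond cue\n"
)
cues = parse_srt(sample)
```

Once every source format is read into the same cue structure, the remaining operations (checking, retiming, writing out) only need to be built and tested once.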

[Image: Dalet subtitles inputs outputs]

Carriage formats

Although there are alternatives, by far the most common carriage formats for subtitles are OP42 (an Operational Practice defined by Free TV Australia) for SD and OP47/SMPTE RDD08 for HD. These specifications describe how the text is actually written into the ancillary data space (VBI or VANC) in file or stream form. While there is opportunity for error in encoding to and decoding from these formats, this is a reasonably well understood area and errors are probably rare.


Carriage mechanism and media file formats

Although they are by no means necessarily the same thing, the carriage mechanism and media file format are very much related. By carriage mechanism, we mean the container for the carriage format. For MXF files this will commonly be an ST436 track, that is, a track that contains ancillary data according to the SMPTE ST436 specification. However, there are many other frequently used containers, such as MPEG-2 user data space, ‘in-vision’ space in IMX (tall MPEG) or VAUX in DV. Some vendors have created proprietary side-car files to contain ancillary data, while some proprietary codecs (such as ProRes) have specific containers for captions. Regional application specifications, such as AS-11 UK DPP in the UK and the ARD-ZDF MXF profiles in Germany, should go some way towards constraining the options and aiding interoperability, but limited vendor support, along with subtleties and variations in the implementation of those specifications, can still cause issues.

Understanding the carriage mechanisms and the different media file formats is especially important when dealing with legacy files with embedded subtitle data and/or files arriving from third-party sources. Whether you are looking to extract the subtitles from these files for reference or manipulation, or to convert them to a mezzanine format, knowing what subtitles are in the source files, and where, is the difference between success and a potentially untestable workflow.

Pivot formats

As with any ‘many in, many out’ scenario, the test matrix, and the chance of error, can be greatly reduced by using a mezzanine or ‘pivot’ format at the centre of your workflow (see image top), converting only to that format on the way in and only from that format on the way out. The best format to choose will depend on your workflows. If you’re significantly manipulating or editing the subtitles, then an STL or TTML file might be preferable; in repurposing and delivery workflows, a standalone (reference) ST436 MXF file could speed up transformations for delivery. Generally, though, something tightly defined, specified and/or standardised will be advantageous.
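The saving in the test matrix is simple arithmetic: with direct conversions, every input format needs a converter to every output format, whereas with a pivot each format needs only one reader or one writer. The sketch below illustrates this under stated assumptions. The PivotHub class, the cue tuple and the two toy formats ("toy-tab" and "toy-pipe") are invented for illustration and are not real subtitle formats or any vendor's API; the figure of four delivery formats is likewise an assumption, set against the eleven authoring formats listed earlier.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pivot model: (start_ms, end_ms, text) for each subtitle.
Cue = Tuple[int, int, str]


def direct_converter_count(n_inputs: int, n_outputs: int) -> int:
    """Converters to build and test when every input maps straight to every output."""
    return n_inputs * n_outputs


def pivot_converter_count(n_inputs: int, n_outputs: int) -> int:
    """Converters to build and test when everything passes through one pivot."""
    return n_inputs + n_outputs


class PivotHub:
    """Routes every conversion through a single in-memory pivot representation."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[str], List[Cue]]] = {}
        self._writers: Dict[str, Callable[[List[Cue]], str]] = {}

    def register_reader(self, fmt: str, reader: Callable[[str], List[Cue]]) -> None:
        self._readers[fmt] = reader

    def register_writer(self, fmt: str, writer: Callable[[List[Cue]], str]) -> None:
        self._writers[fmt] = writer

    def convert(self, data: str, src: str, dst: str) -> str:
        # Only ever source -> pivot, then pivot -> destination.
        cues = self._readers[src](data)
        return self._writers[dst](cues)


def read_toy_tab(data: str) -> List[Cue]:
    """Toy input format: start, end and text separated by tabs, one cue per line."""
    cues: List[Cue] = []
    for line in data.splitlines():
        start, end, text = line.split("\t", 2)
        cues.append((int(start), int(end), text))
    return cues


def write_toy_pipe(cues: List[Cue]) -> str:
    """Toy output format: start|end|text, one cue per line."""
    return "\n".join(f"{start}|{end}|{text}" for start, end, text in cues)


hub = PivotHub()
hub.register_reader("toy-tab", read_toy_tab)
hub.register_writer("toy-pipe", write_toy_pipe)
result = hub.convert("0\t1500\tHello", "toy-tab", "toy-pipe")

print(direct_converter_count(11, 4))  # 44 direct converters
print(pivot_converter_count(11, 4))   # 15 converters via a pivot
```

Adding a new delivery platform in this arrangement means registering one more writer, rather than adding a converter from every existing source format.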

As stated at the beginning of this article, even the language around captions and subtitles is confusing, and there are possibly terms and acronyms used here that are unfamiliar, or different to those you may have used. That makes defining a vocabulary from the start, and so minimising confusion, the most important factor in any conversation or project around subtitles.