Getting started with live TV subtitling

By John Birch, Screen Systems
Post Production
May 9th 2017 at 10:30AM

Some Latin American countries are making it law for broadcasters to provide TV subtitles. John Birch, strategic partnerships manager for subtitling vendor Screen Systems, offers some transition strategies

The provision of television access services, known as captions or subtitles for the hard-of-hearing (HoH), is increasingly being mandated around the world, with the most recent wave of legislation being rolled out significantly across Latin America. Brazil, Argentina, Peru and Mexico are all currently in the process of implementing systems to meet new legally required levels of captioned content on their broadcasts.

Offered as an observation rather than a criticism, a common theme has emerged throughout Screen's four decades of supplying subtitling and captioning systems: captioning is often a final consideration in channel setup, and its implementation is frequently assumed to be far more straightforward than it is in reality.

In a similar vein, countries that already provide some degree of captioning, but are subsequently obliged to meet tougher legislation and standards, typically encounter challenges over and above the initial technical implementation.

The challenge of live

In the UK and the rest of Europe, captioned content is broadly available, with a high percentage of broadcast output across all genres carrying captions. In contrast, as is typical in the initial stages of captioning in a region, Latin America is focussing principally on captioning live and current affairs programming.

This news and live content is perhaps the most understandable place to start, providing access to content of general public interest at an acceptable level. Unfortunately, it is also the most challenging workflow, because the time available for caption creation between the filming of the content and broadcast is very short or, in the case of live broadcasts, zero. Particularly for live captioning, careful consideration must be paid to the initial specification and implementation of a solution in order to future-proof the investment, in preparation for the roll-out of captioning for non-live (pre-prepared/offline) content further down the line.

Screen has designed specifically with this two-phase roll-out in mind, developing its caption delivery systems to be identical for all forms of captioning. This allows the same infrastructure to be used for both live and non-live captioning, regardless of which is the initial requirement.

Okay, so what are the operational challenges with regard to live captioning?
Firstly, it may be helpful to briefly explain the processes involved.

Cutting the ums and aahs

In brief, live captioning is the near-instant creation of text that accurately conveys the meaning of speech in a live broadcast. There is often a focus on preserving exactly what is spoken (a verbatim approach), but practical limitations typically result in some précis of the speech, for example the removal of unnecessary repetition and of meaning-free speech noises (the ums and aahs). What is beyond dispute is the desire to accurately convey the meaning and tone of the live speech without distorting the original intent. Censorship of swearing used in the original speech is a highly contentious topic in captioning, possibly because profanities in text seem more offensive than in speech.
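As a rough illustration of the précis step described above, the removal of meaning-free noises and immediate repetition can be sketched as simple token filtering. This is a hypothetical example only; the filler list and the `precis` function are illustrative, not Screen's actual processing:

```python
import re

# Hypothetical filler tokens; real deployments tune these per language.
FILLERS = {"um", "uh", "er", "ah", "erm", "este"}

def precis(text: str) -> str:
    """Drop meaning-free speech noises and collapse immediate word repeats."""
    # Keep only word tokens that are not registered fillers.
    words = [w for w in re.findall(r"[\w']+", text) if w.lower() not in FILLERS]
    out = []
    for w in words:
        if out and w.lower() == out[-1].lower():
            continue  # unnecessary repetition
        out.append(w)
    return " ".join(out)

print(precis("Um the the minister er said uh that the vote passed"))
# "the minister said that the vote passed"
```

In practice this judgement is made by a trained human, not a word list; the sketch only shows why précis shortens text without changing its meaning.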

Clearly, producing such text is an intensive task that traditionally involved fast typists, or operators of the special stenographic keyboards used in courtrooms. With ever-increasing demand for live captioning, improving the speed, efficiency and accuracy of live and news caption creation has been the ultimate aspiration for subtitle and caption technology developers. Although still falling a little short of perfect, the creation tools designed for this task are getting ever closer to accomplishing this goal.


Stenographer vs. respeaker

A significant technological breakthrough came with the introduction of high-accuracy speech-recognition engines into caption-creation software. The fast typists and stenographers traditionally used are highly skilled people who have undergone extensive training, and employing them is justifiably expensive. The volume of live television captioning has now increased to the point where relying on stenographers is prohibitive, and they are also an increasingly rare resource.

Introducing speech recognition into the creation tools brought with it the new role of the re-speaker, or voice-writer. A re-speaker can ordinarily achieve 98% accuracy after just three months' training in live captioning.

As with the stenographer, re-speakers are trained in the nuances of writing and editing captions but instead of physically typing them, they listen to the narrative, mentally edit it ‘on-the-fly’ and literally speak their intended caption text into a microphone. The speech-to-text engine in the captioning software is pre-trained to the re-speaker’s voice and ‘macros’ or shortcut keys can be used for the insertion of uncommon words, names and punctuation. The proposed caption text is displayed for any last-minute correction or adjustment before release to the transmission system.
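The 'macro' mechanism described above can be sketched as a simple shortcut-to-text substitution applied to the proposed caption before release. The macro names and the `expand_macros` function here are hypothetical illustrations, not a description of Screen's software:

```python
# Hypothetical shortcut phrases a re-speaker might register before a bulletin,
# mapping a spoken trigger to a hard-to-recognise proper noun.
MACROS = {
    "macro amlo": "Andrés Manuel López Obrador",
    "macro cdmx": "Ciudad de México",
}

def expand_macros(caption: str, macros: dict[str, str]) -> str:
    """Replace registered shortcut phrases with their full spellings."""
    for shortcut, full in macros.items():
        caption = caption.replace(shortcut, full)
    return caption

print(expand_macros("Today macro amlo spoke in macro cdmx", MACROS))
# "Today Andrés Manuel López Obrador spoke in Ciudad de México"
```

The value of the approach is that the speech-to-text engine never has to recognise the unusual name itself, only the reliable trigger phrase.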

The first challenge is that there is unlikely to be an abundant supply of re-speakers in Latin American countries at present; although the learning process is far quicker and cheaper than a stenographer's, re-speaking nevertheless requires specialist training.


Fitting the workflow

The second challenge is that of newsroom integration. Adding captioning into an existing newsroom environment is potentially very disruptive to established workflows. It is vitally important that the newsroom captioning department has as much advance information on running orders and stories as possible before broadcast. For example, it may need to set up macros in the speech-recognition system in order to create captions with the correct spellings of the names of featured individuals or places.

Having been involved in many digital TV transitions across the globe, Screen has considerable experience in predicting, and successfully responding to, the obstacles and complexities that arise in making captions (and subtitles) work correctly across the raft of new and legacy set-top boxes and consumer equipment.

Another common scenario supported by Screen's systems is the conversion of existing caption services within contribution content. From a downlink site's perspective, captions may already exist on the contribution content, but a common issue is the requirement to convert those captions into a different format to suit the outgoing re-broadcast. This usually also requires retiming the captions to compensate for any delays introduced by the re-encoding of the contribution content. A more involved issue arises when the incoming caption content is not fully standards-compliant and needs to be 'legalised' for correct operation.
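The retiming step described above amounts to shifting every caption's timecodes by the latency the re-encode introduces. A minimal sketch, assuming a simple millisecond-based caption representation and a fixed, known transcoder delay (both assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Caption:
    start_ms: int   # display start, in milliseconds from programme start
    end_ms: int     # display end
    text: str

def retime(captions: list[Caption], delay_ms: int) -> list[Caption]:
    """Shift every caption by the delay the re-encode introduces,
    so captions stay in sync with the re-broadcast video."""
    return [Caption(c.start_ms + delay_ms, c.end_ms + delay_ms, c.text)
            for c in captions]

incoming = [Caption(1000, 3000, "Buenas noches")]
shifted = retime(incoming, 1800)  # 1.8 s latency, an assumed figure
print(shifted[0].start_ms, shifted[0].end_ms)  # 2800 4800
```

Real systems must also handle format conversion (different caption standards carry timing differently) and variable rather than fixed delays, but the principle is the same constant offset shown here.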