Diving into the world of immersive audio

Object audio model
By Jon Schorah, Nugen
July 1st 2016

Jon Schorah, founder and creative director of Nugen Audio, takes a close look at immersive audio and what it means for future sound design and production engineers

New technologies are constantly emerging and disappearing (Google Glass, anybody?). But once in a while a new idea takes hold and offers a genuine move forward in terms of improved quality, new opportunities and widespread interest beyond niche enthusiasm. By those criteria, immersive audio might just be the next new thing.

Broadly speaking, immersive audio is the attempt to generate an audio soundscape that appears to surround and “immerse” the listener in a complete sonic experience. Since the earliest days, the audio industry has been continually striving to produce immersive audio – beginning with the introduction of stereo and now with 5.1 and 7.1 surround technologies. But the current trend is perhaps best defined by the introduction of height perception to generate a “3D” soundscape. It is this three-dimensional audio space that is the source of the latest excitement.

Immersive audio is the natural corollary of parallel developments in visual technologies and resolutions from 1080p through 4K, 3D, High Dynamic Range (HDR) and now VR. In other words, immersive audio is the missing ingredient that (if you will forgive the pun) completes the picture.

Object audio

The immersive audio experience is presently pushing ahead in two arenas – cinema sound and first-person VR experiences. The cinematic experience is all about height and resolution, employing more addressable speakers for specific localisation and realism. In a traditional sound mixing environment, this means more channels – lots more. As such, it would be almost impossible to achieve a measure of backwards compatibility between existing 7.1 installations and new 22.2 or higher configurations if it were not for object audio. An all-new paradigm in sound mixing, object audio bypasses the concept of audio tracks entirely. Instead, the audio engineer positions sound at a theoretical point in a three-dimensional space, building a virtual audio model. There are no prescribed audio channels and translation to the available speaker configuration is handled automatically in software, meaning that the “perfect” model is maintained as the reference mix at all times.
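To make the concept concrete, here is a minimal Python sketch of the idea, not any real renderer's algorithm: an object is just a position in 3D space, and per-speaker gains are derived at playback time for whatever layout is present. The inverse-distance weighting used here is a deliberately naive stand-in for the sophisticated panning laws commercial renderers employ.

```python
import math

# Hypothetical, highly simplified illustration of the object audio idea:
# each object is described by a position in 3D space, and translation to
# the available speaker layout is deferred until playback time.

def render_object(position, speakers):
    """Derive per-speaker gains for one object using naive
    inverse-distance weighting (real renderers use far more
    sophisticated panning laws)."""
    weights = [1.0 / max(math.dist(position, s), 1e-6) for s in speakers]
    total = sum(weights)
    return [w / total for w in weights]

# The same object description translates to any layout: plain stereo...
stereo = [(-1, 1, 0), (1, 1, 0)]
# ...or a layout that adds height channels
with_height = [(-1, 1, 0), (1, 1, 0), (-1, 1, 2), (1, 1, 2)]

obj = (0.5, 1, 1)  # a sound placed right of centre and elevated
print(render_object(obj, stereo))       # more gain on the right speaker
print(render_object(obj, with_height))  # height speakers share the load
```

The point is that the object description itself never changes; only the playback-time translation does, which is what makes a single "perfect" reference mix possible.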

This is a dramatic conceptual step forward, but it will require new tools in NLE/DAW systems for placing and controlling objects and generating the complex associated metadata. Potentially, object audio technologies such as Dolby Atmos and its newly announced rival, DTS:X, mean that a single mix can be universally translated from the cinema right down to the most basic domestic environment, preserving as much of the original artistic intent as possible without the need for individual cinema, 5.1, LCR soundbar, or stereo mixes.


A secondary benefit facilitated by object audio is personalisation. Naturally, a personal auditory experience is more satisfying and immersive than a one-size-fits-all approach. Mixes using objects rather than fixed channels can allow individual elements to be exposed to user control.

Dialogue, for instance, can be made available as an individual object, allowing the user to raise or lower the level for personal levels of intelligibility and comfort. Different objects can be activated and deactivated to allow, for instance, multiple languages or viewpoints to be delivered within the same mix.
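As a toy illustration of this personalisation model (the object names and structure below are invented for the example), a playback-time mixer can simply sum whichever objects the listener has activated, with a per-object gain:

```python
# Toy sketch of personalisation: the mix is assembled from named objects
# at playback time, so per-object user gains (and language selection)
# fall out naturally. All names and signal values here are illustrative.

objects = {
    "dialogue_en": [0.5, 0.4, 0.3],
    "dialogue_fr": [0.5, 0.4, 0.3],
    "music":       [0.2, 0.2, 0.2],
    "effects":     [0.1, 0.3, 0.1],
}

def personal_mix(objects, gains):
    """Sum active objects sample by sample with per-object gains;
    objects absent from `gains` (e.g. an unselected language) drop out."""
    n = max(len(sig) for sig in objects.values())
    mix = [0.0] * n
    for name, sig in objects.items():
        g = gains.get(name, 0.0)
        for i, s in enumerate(sig):
            mix[i] += g * s
    return mix

# A listener picks English and boosts dialogue for intelligibility
user = {"dialogue_en": 1.5, "music": 1.0, "effects": 1.0}
print(personal_mix(objects, user))
```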

New listening patterns

The rise of personal mobile entertainment has also driven a resurgence in another form of immersive audio: binaural simulation and recording. As the majority of personal mobile audio is consumed through headphones, this technique is enjoying a new-found relevance.

One drawback, however, has always been the difficulty of finding a suitable (read: personal) Head-Related Transfer Function (HRTF). (For the uninitiated, an HRTF describes how sound arriving from a given direction is diffracted and reflected by the head, ears, and torso; these binaural cues are what the brain uses to localise sources. Clearly, the more personal the HRTF, the more persuasive the effect.) Ongoing research is applying modern technologies to this problem, with the aim of bringing a truly bespoke service to the individual.
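The underlying mechanism can be sketched in a few lines of Python: a mono source is convolved with a left/right pair of head-related impulse responses (HRIRs, the time-domain counterpart of the HRTF). The HRIR values below are invented purely for illustration; real responses are measured or modelled per listener and per direction.

```python
# Toy binaural rendering: convolve a mono source with a left/right pair
# of head-related impulse responses (HRIRs). Values are invented for
# illustration only.

def convolve(signal, impulse):
    """Direct-form FIR convolution (a real implementation would use an
    FFT-based method for speed)."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

# A source to the listener's right: the left ear hears it slightly
# later (it is shadowed by the head) and quieter than the right ear.
hrir_left  = [0.0, 0.0, 0.4, 0.2]   # delayed and attenuated
hrir_right = [0.9, 0.3, 0.1, 0.0]   # direct and louder

mono = [1.0, 0.5, 0.25]
left_ear  = convolve(mono, hrir_left)
right_ear = convolve(mono, hrir_right)
```

Played over headphones, the interaural delay and level difference encoded in the two impulse responses are what place the source to the listener's right.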

VR convergence

Perhaps the ultimate expression of all these converging developments can be found in the world of virtual reality (VR), which seeks to introduce a dynamic, first-person perspective into the mix. Object audio technology is much better-suited to the VR environment than channel-based audio because the audio model is constantly available for manipulation by the decoding software at the consumer level. For instance, the software can make real-time modifications with regard to the relative location of the listener if required.
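As a simplified sketch of that real-time manipulation (assuming, for illustration, that +x is the listener's right and +y is straight ahead), a renderer can transform each object's world position into head-relative coordinates on every frame. Here is the simplest case, a yaw (head-turn) rotation:

```python
import math

# Because the object model survives to the consumer device, the renderer
# can re-position every sound relative to the listener's head orientation
# on each frame. Coordinate convention (an assumption for this sketch):
# +x is the listener's right, +y is straight ahead, +z is up.

def rotate_to_head(position, yaw_radians):
    """Transform a world-space object position into head-relative
    coordinates for a listener who has turned by `yaw_radians`
    (positive = turning left)."""
    x, y, z = position
    c, s = math.cos(-yaw_radians), math.sin(-yaw_radians)
    return (x * c - y * s, x * s + y * c, z)

# A sound directly ahead of the listener...
ahead = (0.0, 1.0, 0.0)
# ...lands to the listener's right after a 90-degree head turn left.
print(rotate_to_head(ahead, math.pi / 2))
```

A channel-based mix cannot do this: once the sound is baked into fixed speaker feeds, the positional information needed for such per-frame re-rendering is gone.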

New creative and technical questions

However, this burgeoning technology raises as many questions as it resolves. Object audio is nothing new; it has been employed in computer gaming for many years with varying degrees of success. The enhanced realism provided by relatively placed and tracked objects certainly increases the sense of immersion when coupled with an element of user control (VR head tracking, for instance), but the vast array of possible outcomes can lead to consumer confusion, e.g. looking the wrong way and missing audio cues. Take, for example, a solution that gives the user control over the crowd volume level, position in the stadium, and commentator/dialogue levels for a football game. How is this to be controlled? Will the user get lost in the interface and ultimately miss key moments in the game while fiddling with the controls in an attempt to get something that "just works"?

Clearly these technologies allow for an immense level of customisation, but there is much work to be done so that the consumer can enjoy the benefits with no more effort than turning on the TV and sitting back to enjoy the show.

Technical complexity is also increasing. We have only just settled on international standards for loudness measurement and control of channel-based audio. How do these standards relate to object audio? If the user has control over the elements included in the soundtrack, how is loudness taken into account? How will we stream all these new audio objects within the bandwidth limits of existing distribution models? Much work is already underway to examine these issues and many others, but perhaps that is the purview of a more technical discussion.


Consumer realities

For computer/console gaming, the VR headset seems poised for adoption. The gaming world's years of experience with object audio are slotting into place, driving consumer enthusiasm and demand for a readily understood concept. Several VR gaming packages are already on the market, and the catalogue of VR titles is growing rapidly. However, it remains to be seen whether the technology will find wider adoption beyond gaming and specialist niches. Do we really want to sit in our lounges watching TV "together" while wearing a full VR headset?

Similarly, in the movie theatre, the consumer appetite for immersive audio is strong. Here, the listening environment is controlled, facilitating the delivery of high-quality production values for truly impressive results.

But do these developments readily translate into a gain for the home consumer? One significant benefit of the two leading object audio systems is that a single theatre mix can, in theory, be translated down to even a stereo TV; however, there is little to be gained in terms of immersion at that point.

New speaker developments, including upward-firing technologies that remove the requirement for ceiling speakers, and specially developed sound bars, may indicate a way forward. These breakthroughs have the potential to bring a true immersive enhancement to the everyday consumer.

One thing is sure: immersive, object audio is one of those rare conceptual game-changers with the potential to redefine how we create and enjoy sound. With their scalability and potential for new levels of creativity and realism, these concepts are likely here to stay. How this will pan out for audio engineers, broadcasters, and consumers remains to be seen. But for those of us involved at the creative end, now is the time to skill up and immerse ourselves in this exciting and rapidly developing technology.