A future in space: Interview with iZotope's Mark Ethier

March 24th 2016 | By Neal Romanek

iZotope has become a key player in audio software. TV Tech Europe talks to the company’s CEO and co-founder, Mark Ethier, about the changing world of sound recording and editing and the potential for the future.

In the past year, iZotope has won several awards, including a TV Technology Best of Show award at NAB for RX Loudness Control software. What has been the source of the company’s success?

I would say that we have tried very hard throughout the years to provide the world with new technology. And we’re constantly trying to raise the bar, not only with the quality of the audio signal processing but with the actual control and visualisation of it, so you have the ability to really understand the sound you’re looking at in a very intuitive way. And we try to wrap that up with a really strong customer support team.

How has automation helped sound editors and engineers improve their workflows?

One of our philosophies is to focus on how to help people be creative. There are two prongs to that. One is creating tools that allow them to do things they might not have been able to do before in a creative space. When it comes to TV or film post production, Iris, for example, is one of our musical instrument products, but it’s been used a great deal in sound design on the creative side.

The other prong is how to create automated workflows to help with the “commodity stuff”, the work that doesn’t require any real creativity. You have to get it done, you have to get it out of your way – you want the plosives gone, or you want to be able to remove distortion, or remove clicks and pops, or shape the background noise so that the room tone is consistent throughout a take.

We try to enable sound designers and engineers to focus on the creative side rather than get bogged down in tedious mechanical tasks. I’ve seen people spend hours and hours modifying the levels of a dialogue track. And it’s satisfying to be able, with virtually a click of a button, to let them easily modify breath sounds or plosives or sibilance. Some of those technologies get to be pretty complex. There’s a lot of smart user interface and processing that makes that possible.
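To give a flavour of the kind of tedious level-riding Ethier describes, here is a deliberately simple sketch of automatic dialogue levelling: measure the loudness of each window and nudge it toward a target. The function name, window size, and gain law are illustrative assumptions, not iZotope’s actual algorithm, which involves far more sophisticated detection and smoothing.

```python
import math

def ride_levels(samples, target_rms=0.1, window=1024):
    """Toy automatic level rider: for each window of samples, measure the
    RMS level and apply a gain that moves the window to a target level.
    Illustrative only -- real dialogue levelers smooth gain changes over
    time and distinguish speech from breaths, plosives and noise."""
    out = []
    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        gain = target_rms / rms if rms > 0 else 1.0
        out.extend(x * gain for x in block)
    return out
```

Even this toy version shows why automation pays off: the per-window measurement and gain maths are mechanical, exactly the sort of work a person shouldn’t spend hours on.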

With schedules getting squeezed, a lot of these things become more of a challenge. As the time goes down, we don’t want the quality to go down. We hope the quality goes up as people have less time to work on things.

There are a lot of lessons we can learn from video game development

What are some of the big problems people are looking for solutions to right now?

Sometimes we will observe people working. And there are times when we are able to solve problems that people weren’t originally trying to find solutions for. Part of it was that they didn’t even know the problem was possible to solve, so they weren’t asking if they could do it. I think some of the automatic correction in RX is a good example of that.

There are still a lot of unexplored issues that people are looking for solutions for. One big one is placing things in space. It’s still a big issue for engineers. How do you take dialogue or effects and put them into a space so they sound realistic? That’s an area we’re exploring a lot now. We see it as the next frontier. You’ve put it together; now how do you present it in a way that’s going to sound as realistic as possible?


What are some of the potential challenges with spatial audio and object-oriented sound?

One of the biggest challenges in that space is going to be user interface and control. We’re so used to working in two-dimensional formats, from an audio and positioning perspective, that changing everyone’s mindset – from creating a static mix to placing things dynamically in space, and to the question of how you control those objects at any moment – is a challenge. How do you actually visualise that? How do you control it? What is the user interface?

The video game industry has been dealing with this problem in many ways, and there are a lot of lessons we can learn from video game development. iZotope has done a lot of work licensing technology to video game companies for game audio, and we’ve learned a lot about how they think about sound. It has to be a lot more object-oriented. There has to be a more dynamic way of thinking about it. There are a lot of lessons there that we can pull over.

Thinking about sound in terms of stereo is partly a limitation of what the technology has been

Do you think object-oriented sound will become the standard way of thinking about audio?

I like the idea of starting with a clean slate – don’t think about how things have been done before. If you were an alien that came to this planet, how would you solve this problem? This goes back to the first product we released for sale, Ozone. Back then, a lot of the software being created was deliberately trying to model the hardware that was being used – all the way down to it looking like hardware, with users interacting with it like hardware. So we came in with a blank slate and brought in things like undo and rich visualisations. We decided to think about how it should be and not get too hung up on the history.

If you clean-slated everything and said there’s no stereo, there’s no 5.1, I think spatial sound is what you would come up with. When people talk about sound, they talk about placing it over here, or back in the background, or at some point in space, or something up above moving closer to them. They think about it in those terms intuitively. As opposed to saying: “I want ten per cent of the sound with this profile in the front right channel.” Thinking about sound in terms of stereo is partly a limitation of what the technology has been.

I think the idea of object-oriented sound will stick, because it’s actually more intuitive. And distribution becomes much easier across different formats. From a broadcast perspective, I get excited by the idea that instead of using all the bandwidth in a broadcast to support multiple mixes, you would have just one positional stream, with more dynamic interpretation at the point where it’s rendered for the consumer.
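As a rough illustration of the object-based idea Ethier describes – transmit a sound with its position once, then render it per playback system – here is a minimal constant-power stereo panning sketch. The function and the sine/cosine gain law are illustrative assumptions for a two-speaker layout, not iZotope’s renderer or any broadcast standard.

```python
import math

def render_object(sample: float, azimuth_deg: float) -> tuple[float, float]:
    """Render one positional audio object to a stereo pair using
    constant-power panning. azimuth_deg runs from -90 (hard left)
    to +90 (hard right). Illustrative sketch only."""
    # Map azimuth to a pan angle of 0..90 degrees, then use sin/cos
    # gains so total power (l**2 + r**2) is constant across the pan.
    theta = math.radians((azimuth_deg + 90.0) / 2.0)
    return sample * math.cos(theta), sample * math.sin(theta)
```

The point of the object-based approach is that the same sample-plus-azimuth metadata could be re-rendered for a 5.1 bed, headphones, or height channels by swapping the gain law at the consumer’s end – the transmitted stream doesn’t change.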