Hiding a mic in Scarlett Johansson's t-shirt - and other recording tips
Eddy B. Brixen, audio specialist at DPA Microphones, offers tips on recording the most important part of any show – the words
The most important sound source on TV is the voice. However, obtaining sufficient speech intelligibility can be a challenge. This article draws attention to some of the issues involved in capturing and reproducing the spoken word.
The spoken word is still necessary for communication – even on television. Getting the message out is what counts. However, just transmitting the sound of the voice is not always enough to provide good speech intelligibility at the receiving end.
The human brain wants to create meaning out of most sensations. If you listen to a person speaking but miss a word or two, you try to link up the words that you perceived correctly. Luckily enough, language often provides redundancy. Missing a word or two in a sentence doesn’t necessarily destroy the meaning. However, we should not take that for granted.
Another important issue: as people get older, their hearing and their ability to distinguish one sound from another diminish. People who have been exposed to high sound levels over time (for instance from personal audio devices!) may also have problems perceiving speech. Many target audiences include such listeners.
What in speech makes it intelligible?
Speech consists of vowels and consonants. Vowel sounds are generated by the vocal cords and filtered by the vocal cavities. Consonants are created by blocking the airflow and by the noise formed as air passes through the throat and mouth, particularly at the tongue and lips. It is the consonants that are responsible for speech intelligibility.
The speech rate (number of words per unit of time) of journalists, anchorpersons and presenters is much higher nowadays than in the early days of television. As listeners, we do not have time to repeat to ourselves what was just said, and pauses in speech are almost nonexistent. So every word needs to be intelligible!
If the consonants are too weak or masked by noise or background sound, intelligibility drops. Keeping background sound at a distance and capturing the consonants at a sufficient level is therefore a necessity. Compression may help emphasise consonants, but if the noise is compressed along with them, it does not help.
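The compression point can be illustrated with a minimal sketch of the static gain curve of a hard-knee downward compressor (no attack or release behaviour). The input levels (vowel -10 dB, consonant -30 dB, noise -36 dB) and the settings (threshold -40 dB, ratio 3:1, 15 dB make-up gain) are hypothetical numbers chosen only for illustration, not measurements.

```python
def compressed_level_db(level_db, threshold_db, ratio, makeup_db=0.0):
    """Steady-state output level of a hard-knee downward compressor (no attack/release)."""
    if level_db > threshold_db:
        # Above the threshold, every `ratio` dB of input yields 1 dB of output.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    # Make-up gain shifts every level equally, so it cannot change level differences.
    return level_db + makeup_db


# Hypothetical input levels in dB, chosen only to illustrate the effect.
VOWEL_DB, CONSONANT_DB, NOISE_DB = -10.0, -30.0, -36.0
THRESHOLD_DB, RATIO, MAKEUP_DB = -40.0, 3.0, 15.0


def out(level_db):
    return compressed_level_db(level_db, THRESHOLD_DB, RATIO, MAKEUP_DB)


vowel_to_consonant_before = VOWEL_DB - CONSONANT_DB           # 20 dB
vowel_to_consonant_after = out(VOWEL_DB) - out(CONSONANT_DB)  # about 6.7 dB
consonant_to_noise_before = CONSONANT_DB - NOISE_DB           # 6 dB
consonant_to_noise_after = out(CONSONANT_DB) - out(NOISE_DB)  # about 2 dB

print(f"vowel-to-consonant gap: {vowel_to_consonant_before:.1f} -> {vowel_to_consonant_after:.1f} dB")
print(f"consonant-to-noise gap: {consonant_to_noise_before:.1f} -> {consonant_to_noise_after:.1f} dB")
```

Because this curve never rises more steeply than 1:1, compression can bring the consonants closer to the vowels, but it can only keep or shrink the gap between the consonants and any noise that is compressed with them. It never widens that gap.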
In noisy environments, there are basically two ways to improve the signal-to-noise ratio: get the microphone closer to the sound source (the mouth), or use a microphone with higher directivity.
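The two options can be estimated roughly, as a sketch under simplifying assumptions. It assumes the talker behaves like a point source in a free field (inverse-square law, about 6 dB per halving of distance), that the background noise is diffuse and unaffected by where the microphone is, and it ignores proximity effect and reflections. The distances are hypothetical, and the 4.8 dB figure is the theoretical directivity index of an ideal cardioid relative to an omni.

```python
import math


def proximity_gain_db(old_distance_m, new_distance_m):
    """Change in direct-sound level from moving the mic (free-field inverse-square law)."""
    return 20.0 * math.log10(old_distance_m / new_distance_m)


def snr_improvement_db(old_distance_m, new_distance_m, directivity_index_db=0.0):
    """Rough SNR gain. Assumes the background noise is diffuse and does not change
    with mic position, so only the speech level rises as the mic moves closer.
    A directional mic additionally rejects diffuse noise by its directivity index."""
    return proximity_gain_db(old_distance_m, new_distance_m) + directivity_index_db


CARDIOID_DI_DB = 10.0 * math.log10(3.0)  # ideal cardioid, about 4.8 dB relative to omni

print(f"{proximity_gain_db(0.6, 0.3):.1f} dB")                  # 6.0 dB
print(f"{snr_improvement_db(0.6, 0.3, CARDIOID_DI_DB):.1f} dB")  # 10.8 dB
```

In this idealised case, halving the distance and switching from an omni to a cardioid together gain roughly 11 dB of signal-to-noise ratio; in a real room the figures will be smaller.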
In speech recordings, the 1-4 kHz frequency range should always be “kept clear”. When, for instance, adding music, clean sound or sound effects as a background for narration, cutting the music by 5-10 dB in this frequency range with a parametric equaliser improves the perceived intelligibility. In multichannel presentations, perceived intelligibility increases if the speech and the noise come from different directions. However, if the mix is collapsed to mono, precautions must be taken to retain intelligibility.
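To set up such a cut on a parametric equaliser, the band edges have to be converted to a centre frequency and a Q. The sketch below uses the geometric centre and the common octave-bandwidth-to-Q conversion. Equalisers define their bandwidth edges differently (some at the -3 dB points, some at half the boost or cut in dB), so treat the resulting Q as a starting point and adjust by ear.

```python
import math


def band_to_peaking_params(f_low_hz, f_high_hz):
    """Centre frequency, bandwidth in octaves, and Q for a bell spanning f_low..f_high."""
    f0_hz = math.sqrt(f_low_hz * f_high_hz)    # geometric centre (midpoint on a log axis)
    octaves = math.log2(f_high_hz / f_low_hz)  # width of the band in octaves
    ratio = 2.0 ** octaves                     # equals f_high / f_low
    q = math.sqrt(ratio) / (ratio - 1.0)       # common octave-bandwidth-to-Q conversion
    return f0_hz, octaves, q


f0, octaves, q = band_to_peaking_params(1000.0, 4000.0)
print(f"centre {f0:.0f} Hz, {octaves:.1f} octaves, Q {q:.2f}")  # centre 2000 Hz, 2.0 octaves, Q 0.67
```

For the 1-4 kHz band this gives a centre of 2 kHz and a Q of about 0.67, to be combined with a cut somewhere in the 5-10 dB range, for example -6 dB (a hypothetical choice).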
The right tool for the right job
Choosing the right microphone and using it in the right way is always the best starting point. If done correctly, there is nothing to repair afterward.
Lavalier microphones - In many applications, the preferred microphone is a lavalier type (positioned on the chest), which gives the user greater freedom. However, if a microphone with a flat frequency response is mounted on a person’s chest, the 3-4 kHz range should be boosted by around 5-10 dB just to compensate for the loss at the chest position. Use a microphone that is pre-equalised to compensate, or remember to apply the right equalisation in the editing process. Note that ENG mixers and cameras do not compensate for this automatically, and they provide no controls to do so. In many cases the loss is never compensated for, and as a result intelligibility is often low.
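For a flat lavalier, the chest-position correction can be sketched as a single peaking (bell) filter. The coefficients below follow the peaking EQ in Robert Bristow-Johnson's Audio EQ Cookbook. The centre (the geometric middle of 3-4 kHz), the Q of 1.4 and the +8 dB boost are hypothetical settings within the range given above, not values for any particular microphone. The sketch filters test sines and compares RMS levels to confirm that the boost lands where intended.

```python
import math


def peaking_eq_coeffs(fs_hz, f0_hz, q, gain_db):
    """Biquad coefficients for a peaking (bell) EQ, after the RBJ Audio EQ Cookbook."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * a_lin, -2.0 * cos_w0, 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * cos_w0, 1.0 - alpha / a_lin]
    # Normalise so that a[0] == 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def biquad(samples, b, a):
    """Filter a list of samples with a Direct Form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x0
        y2, y1 = y1, y0
        out.append(y0)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


FS = 48000
F0 = math.sqrt(3000.0 * 4000.0)  # centre of the 3-4 kHz band, about 3464 Hz
b, a = peaking_eq_coeffs(FS, F0, q=1.4, gain_db=8.0)  # hypothetical +8 dB lav correction


def measured_gain_db(freq_hz):
    """Feed one second of sine through the filter; compare RMS once the transient has settled."""
    tone = [math.sin(2.0 * math.pi * freq_hz * n / FS) for n in range(FS)]
    filtered = biquad(tone, b, a)
    half = FS // 2
    return 20.0 * math.log10(rms(filtered[half:]) / rms(tone[half:]))


print(f"{measured_gain_db(F0):.1f} dB at {F0:.0f} Hz")  # close to +8 dB
print(f"{measured_gain_db(500.0):.1f} dB at 500 Hz")    # close to 0 dB
```

A single bell keeps the correction confined to the presence region, leaving the lower part of the voice untouched. A pre-equalised microphone makes this step unnecessary.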
For drama, the microphones can be hidden on the body, in the hair or in other body-worn positions. The image below shows a lavalier mic placed inside Scarlett Johansson's t-shirt for Luc Besson's film Lucy. Besides careful placement, every new position may need its own equalisation.
Headset microphones - The level at a headset microphone is approximately 10 dB higher than at the chest position, and the spectrum is less affected. However, some high-frequency roll-off still has to be compensated for.
Handheld interview microphones - Handheld microphones should be held in front of the mouth, within an angle of ±30°. A directional microphone (cardioid type or shotgun) should be addressed on-axis (not held like an ice cream cone). Overly dense windshields may attenuate higher frequencies and make consonants less clear, so remember to compensate for this.
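The ±30° guideline can be checked against the ideal first-order polar pattern r(θ) = a + b·cos θ. This sketch assumes an ideal cardioid (a = b = 0.5). Real microphones deviate from the ideal, often most at high frequencies, so these figures are best-case values.

```python
import math


def first_order_response_db(angle_deg, a=0.5, b=0.5):
    """Sensitivity in dB relative to on-axis for an ideal first-order pattern
    r(theta) = a + b*cos(theta) with a + b = 1. The defaults give a cardioid."""
    r = a + b * math.cos(math.radians(angle_deg))
    if abs(r) < 1e-12:
        return float("-inf")  # exactly in the null of the pattern
    return 20.0 * math.log10(abs(r))


# Ideal cardioid: about -0.6 dB at 30 deg, -6 dB at 90 deg, a null at 180 deg.
for angle in (0, 30, 60, 90, 180):
    print(f"{angle:>3} deg: {first_order_response_db(angle):6.1f} dB")
```

Within ±30° an ideal cardioid loses only about 0.6 dB, while speaking across the microphone at 90° costs about 6 dB of direct sound, before any off-axis colouration is taken into account.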
Boom - When booming, the most neutral spectrum is obtained when positioning the microphone in front of and above the head. If the surroundings allow it, microphones other than shotguns may be the best solution.
Podium / News-desk microphones - Permanently installed microphones end up at varying distances from the person speaking. Hence, they should be directional, especially in the frequency range above 1 kHz, and they must point at the speaker's mouth. Microphones mounted on podiums or desks should be insensitive to vibration and handling noise.
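The benefit of directivity at a distance can be quantified with the directivity factor Q of an ideal first-order pattern. Its square root, often called the distance factor, indicates how much farther than an omni a microphone can be placed for the same ratio of direct sound to diffuse room sound. This sketch assumes ideal, frequency-independent patterns and a fully diffuse background, and the supercardioid coefficients are approximate.

```python
import math


def directivity_factor(a, b):
    """Directivity factor Q of an ideal first-order pattern r(theta) = a + b*cos(theta),
    a + b = 1: on-axis power divided by the power averaged over all directions."""
    return 3.0 / (3.0 * a * a + b * b)


def distance_factor(a, b):
    """How many times farther than an omni the mic can be placed for the same
    direct-to-diffuse ratio, assuming it points straight at the talker."""
    return math.sqrt(directivity_factor(a, b))


PATTERNS = {
    "omni": (1.0, 0.0),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.366, 0.634),  # approximate coefficients
    "hypercardioid": (0.25, 0.75),
}

for name, (a, b) in PATTERNS.items():
    q = directivity_factor(a, b)
    print(f"{name:>13}: Q = {q:.2f}, DI = {10 * math.log10(q):.1f} dB, "
          f"distance factor = {distance_factor(a, b):.2f}")
```

For example, an ideal hypercardioid (distance factor 2) at 60 cm picks up roughly the same direct-to-diffuse ratio as an omni at 30 cm, provided it points at the mouth. Aimed off-axis, that advantage shrinks.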
Beware your own "audio white balance"
Optimising speech intelligibility is, of course, about more than handling microphones; many routines in TV production should be taken into consideration. Example: a video journalist prepares an interview. Questions are put to a person, and the answers are recorded by the camera. The journalist moves to the editing suite (or his van) to check the recording. Enough footage? Images OK? Sound OK? Then the journalist does his edits. The problem is that after having formulated the questions and heard the recorded answers a couple of times, he can no longer objectively assess whether the recorded voice is intelligible. Moreover, while browsing through his footage, he unconsciously accepts the timbre of the recorded sound, creating a kind of “audio white balance”. By the time he assesses the sound quality, it sounds absolutely fine, and every word is intelligible – to him. But not necessarily to others.
Best advice: Choose good microphones, use them correctly and make sure to treat the speech signals in the right way.