By: Michael Whiteside (Founder of Studio Connections)
Michael gained a degree in electronics and has since worked in professional audio for over 25 years as a designer, consultant, recording engineer, builder and corporate managing director.
When we listen to sound, our spatial awareness and our sense of exactly where a sound source is come from a dedicated neural network in the brain. This network measures the very small differences between the times at which a sound reaches the left ear and the right ear. It is the result of many millions of years of evolution.
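How the brain actually does this is a question for neuroscience. A standard signal-processing analogue is cross-correlation: slide one ear's signal against the other and pick the lag at which the two match best. The sketch below is illustrative only. It assumes a 192kHz sample rate, a synthetic noise source and a hypothetical 3-sample delay at the right ear, and the function name is made up for this example. At 192kHz one sample lasts about 5.2us, so whole-sample resolution is only just at the scale discussed in this article. Finer estimates would need interpolation around the correlation peak.

```python
import random

def estimate_delay_samples(left, right, max_lag):
    """Return the lag (in samples) that best aligns `right` with `left`.

    Brute-force cross-correlation: for each candidate lag, sum
    left[i] * right[i + lag] over the overlapping samples and keep the lag
    with the largest sum. A positive result means the sound reached the
    left channel first.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        start = max(0, -lag)               # first index of `left` with a partner in `right`
        stop = min(n, len(right) - lag)    # one past the last such index
        score = sum(left[i] * right[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

SAMPLE_RATE = 192_000   # assumed sample rate; one sample lasts about 5.2us
TRUE_DELAY = 3          # hypothetical extra delay at the right ear, in samples

rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(4096)]   # broadband noise source
left = source
right = [0.0] * TRUE_DELAY + source[:-TRUE_DELAY]     # same sound, arriving later

lag = estimate_delay_samples(left, right, max_lag=20)
print(f"estimated delay: {lag} samples = {lag / SAMPLE_RATE * 1e6:.2f} us")
```

Brute force keeps the idea visible. A practical implementation would use an FFT-based correlation.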
For accurate perception, this neural network measures the time difference with very fine precision. If a sound source about 20m in front of a listener moves by about half a degree to one side, the difference in arrival time between the left and right ears changes by only about 5us (5 millionths of a second). The brain works out where sounds are coming from by detecting these minute differences in the time a sound takes to reach each ear.
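The size of these interaural time differences can be estimated with Woodworth's classic rigid-sphere approximation, ITD ≈ (a/c)(θ + sin θ). Here a is the head radius, c is the speed of sound and θ is the azimuth in radians measured from straight ahead. The sketch below is a simplification and does not model any particular listener. It assumes a head radius of 8.75cm and c = 343 m/s. It also treats a source 20m away as effectively infinitely distant, because at that range the distance barely affects the result. Under these assumptions, a shift of half a degree from straight ahead gives about 4.5us, and a shift of five degrees gives about 45us.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed)
HEAD_RADIUS = 0.0875    # m, a typical adult value (assumed)

def itd_woodworth(azimuth_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds for a distant source.

    Woodworth's rigid-sphere approximation, intended for azimuths between
    0 (straight ahead) and 90 degrees. The incoming wavefront is assumed
    to be plane, i.e. the source is far away compared with the head.
    """
    theta = math.radians(azimuth_deg)
    return (a / c) * (theta + math.sin(theta))

for angle in (0.5, 1.0, 5.0, 90.0):
    print(f"{angle:5.1f} deg -> {itd_woodworth(angle) * 1e6:6.1f} us")
```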
Some animals, such as bats and owls, have this faculty so highly developed that they can detect time differences down to tenths of a millionth of a second. Expressed as a frequency (the reciprocal of 0.1us), that is in the realm of 10MHz. This enables them to catch prey from a distance, and owls can even do so when blinded.
This means that, for high-quality replay systems, even though our hearing spans roughly 20Hz to 20kHz, our brains can only reliably and accurately reconstruct the spatial image (depth and breadth) if the system delivers sound to the left and right ears with all frequencies correctly timed relative to each other, to within about 5us. This is called 'phase coherence'. In other words, reproducing authentic sound requires every component in the system to be true in timing to within a few millionths of a second.
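It helps to see what a fixed timing error means at different frequencies. A time offset Δt shifts a tone of frequency f by 360 · f · Δt degrees. At low frequencies a 5us error is therefore a tiny fraction of a cycle. At 20kHz it is 36 degrees. The helper name below is hypothetical, and the 5us figure is taken from the text.

```python
def phase_error_deg(freq_hz, timing_error_s):
    """Phase shift, in degrees, that a fixed time offset produces at one frequency."""
    return 360.0 * freq_hz * timing_error_s

TIMING_ERROR = 5e-6  # 5us, the figure used in the text

for f in (100, 1_000, 10_000, 20_000):
    print(f"{f:>6} Hz: {phase_error_deg(f, TIMING_ERROR):5.1f} degrees")
```

A delay that is the same at every frequency leaves the tones aligned with each other, because it only shifts the whole sound in time. The problem arises when different frequencies are delayed by different amounts.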
When sound is replayed without this timing precision, our brains stop perceiving the depth of the audio image, and we hear a flatter, simplified left-right stereo image. This happens because, when the time-sensitive neural network does not receive enough accurate and reliable timing information, it stops working. The brain then falls back on less precise cues to approximate the left/right image, such as the difference in the intensity of the sound at each ear. We are not conscious of this approximation taking over, because our mind compensates by automatically filling in an extrapolated image from whatever information it can get.
The way our mind calculates a sense of depth from limited audio cues is similar to the way it derives depth from two-dimensional visual images. Here too it approximates, extrapolating from secondary cues such as perspective. We enjoy ordinary two-dimensional films by imagining the sense of depth. When we view a stereoscopic three-dimensional image, however, the brain derives depth from the tiny differences between the views seen by the left and right eyes.
With both visual and audio recordings, it is only when we experience replay in three dimensions that we appreciate how much depth detail the two-dimensional rendition lacks.
So, to summarise: if the precise timing differences between the left and right audio signals are not maintained, the brain cannot construct the sense of depth that we would naturally hear. When a system presents stereo sound with coherent timing between the left and right channels, the brain can derive the sense of depth. This contributes a very strong sense of realism, authenticity and, ultimately, musicality to what we are listening to.
Musical reproduction and full image depth require every component in the system to have accurate timing integrity. They also require skilled microphone placement and good recording engineering in the first place. Recording engineers place microphones carefully so that the recording captures image information. Companies such as EMI, Decca and Philips have even had their own specifications for placing the main stereo pair of microphones so as to capture the most coherent stereo image.
The precise timing of the sound waves arriving at each microphone is the information that must be captured and kept intact from mastering through to reproduction if the recording is to sound like the original performance. A good recording is often degraded during a poor mastering or transcription process, because not enough care was taken to preserve the image-critical timing.
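The timing information captured by a spaced microphone pair can be roughly estimated with the far-field relation Δt = d · sin θ / c. Here d is the distance between two omnidirectional microphones, θ is the angle of a distant source away from the centre line, and c is the speed of sound. The spacing and angles below are hypothetical examples and are not any company's specification. Coincident pairs capture image information mainly through level differences rather than arrival-time differences.

```python
import math

def spaced_pair_delay(spacing_m, angle_deg, c=343.0):
    """Arrival-time difference (s) between two spaced omni microphones.

    Far-field approximation: the source is distant enough that its wavefront
    is plane at the pair. angle_deg is measured from the line perpendicular
    to the pair (0 = dead centre); c is an assumed speed of sound in m/s.
    """
    return spacing_m * math.sin(math.radians(angle_deg)) / c

SPACING = 0.5  # m, hypothetical spacing chosen for illustration

for angle in (1.0, 10.0, 30.0):
    print(f"{angle:4.0f} deg -> {spaced_pair_delay(SPACING, angle) * 1e6:6.1f} us")
```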
HOW SYSTEMS DEGRADE PERFORMANCE IMAGERY AND MUSICALITY
The factors that cause timing integrity to be lost, and so degrade the sound field, are:
- lower performance cables
- non-ideal ground and mains supply infrastructure
- mechanically resonant components
- system components with undesirable electrical characteristics
- induced currents and an excessive noise floor
All of these factors contribute to the loss of image and realism by affecting, altering and modulating the fine timing details. Ground- and power-related noise saturates and therefore masks subtle image cues. Cables have a surprisingly large effect because they can introduce transmission delays that vary with frequency.
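Engineers call a transmission delay that varies with frequency group delay. The following is a simplified sketch of the idea. It assumes that the source output impedance and the cable capacitance together behave like a first-order RC low-pass filter. This is a lumped approximation, since a real cable is a distributed transmission line. The filter H(f) = 1/(1 + jf/fc) has a group delay of RC/(1 + (f/fc)^2), with RC = 1/(2π fc). Higher frequencies are therefore delayed slightly less than lower ones. The cutoff used below is hypothetical, and the size of the effect depends entirely on it, so the printed numbers are not measurements of any real cable.

```python
import math

def rc_group_delay(freq_hz, cutoff_hz):
    """Group delay in seconds of a first-order RC low-pass at freq_hz.

    The phase of H(f) = 1 / (1 + j f/fc) is -atan(f/fc). Its negative
    derivative with respect to angular frequency gives
    tau = RC / (1 + (f/fc)^2), where RC = 1 / (2*pi*fc).
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    return rc / (1.0 + (freq_hz / cutoff_hz) ** 2)

CUTOFF = 200_000.0  # Hz, hypothetical -3dB point of source impedance plus cable capacitance

for f in (20, 1_000, 10_000, 20_000):
    print(f"{f:>6} Hz: {rc_group_delay(f, CUTOFF) * 1e9:7.1f} ns")
```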
Depth and realism of reproduction are very apparent when a good system has been set up for peak performance. Lowering the system noise floor through good ground topology and mains supply practice contributes to realistic, musical reproduction. However, cables with wide bandwidth, good phase coherence and a fast response, able to deliver large currents without throttling the sound, are absolutely essential if a system is to reproduce the atmosphere of the original performance.
Systems can replay with extraordinary depth and realism from the spatial ‘room cues’ that were captured during the recording process.
If you have never experienced sound as though it were being performed right in front of you, I would not be surprised. Many systems, and most cables, are simply not designed with this level of timing precision in mind. And, as with any chain, one weak link is enough to lose the vital information.
When things are right, the sound doesn't seem to come from a flat left/right plane but from a cloud, as if it were being performed in front of you. The listener instinctively relaxes rather than tensing up and analysing the sound. This is because things sound natural: the brain is clear about where sounds are coming from, and is not confused or panicked by being unable to locate them. It really can sound spookily as though the performer is in the room.