You shouldn’t believe that hi-res is better hi-fi. Our ears aren’t hi-res enough. Warning: this post gets technical. blog.mindrocketnow.com
I blame the school AV club for my enthusiasm for hi-fi. Cassette tape was the currency for music, and mix tapes an extension of personality. Everything was analogue, and so everything could be tweaked. A little bias adjustment here, some fatter cable there, and we could lower the hiss enough to crank the volume a little bit higher.
Fast forward a few years (decades), past a fad or two, and we arrive at today’s digital revolution, where nothing is owned any more, and nothing is physical any more, except for the emotional responses we have to a particularly evocative track.
It’s worth dwelling on that for a moment. Digital, in all of its ones-and-noughts-ness, is pristine, exact, antiseptically clean. In university I was taught the Shannon (or Nyquist, or Whittaker, or Kotelnikov) sampling theorem, which states that any waveform with maximum frequency f can be captured without any loss of information by sampling it every 1/(2f) seconds, i.e. at a rate of 2f samples per second. Therein lies the problem: anything that purports to be an exact likeness surely cannot communicate emotion. It’s like expecting to have a conversation with a photograph. And yet music does.
OK, step away from emotions, I’m an engineer after all. For this blog, I want to examine high resolution audio, and the meaning behind the numbers. The audiophile is being told that CD quality isn’t good enough, that due to the emaciated specs of the CD Red Book standard, we are missing out on detail. The key metrics are bit depth and sample rate. The CD specification is 16 bits and 44,100 samples per second (abbreviated to 16-bit/44.1 kHz or just 16/44.1). Audiophile releases are now 16/96, 24/96 or even 24/192. Obviously, bigger numbers are better? Except they’re not, and could even be worse than CD. Ooh, controversy…
(I should pause now for a disclaimer: I’m an engineer by training, therefore educated as an empiricist rather than a rationalist. However, I’m a consultant by career, which is characterised by the ability to make decisions rationally rather than empirically. I therefore fall between the two camps of: It sounds better therefore it is better, vs It isn’t measurably better therefore it can’t be better. But my wallet is ruled empirically, so I’ll only spend money if I can be sure it is better.)
Let’s start by examining sample rate. Shannon’s theorem, mentioned earlier, is a very useful starting point. By common agreement, the limit of human hearing is taken to be 20 kHz. (In fact, if you can hear 20 kHz, then you’re probably very young or very exceptional – I can only hear up to 14 kHz-ish, thanks to age and Iron Maiden, and you’re probably no different. Except for the Iron Maiden part.) So Shannon tells us that the full range of audible frequencies can be reproduced without any loss of information using a sampling rate of 40 kHz. CD, with a sampling rate of 44.1 kHz, should be of completely satisfactory quality.
(You may be wondering why an extra 4.1 kHz of samples was added in the standard. Like many technical decisions, the exact figure has a pragmatic origin: Sony wanted it, and Sony prevailed, because 44.1 kHz fitted neatly onto the video-tape-based PCM adaptors then used to store digital audio, and it still leaves a 2.05 kHz transition band for the anti-aliasing filter.)
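What that anti-aliasing filter protects against is easy to demonstrate numerically. Here’s a minimal sketch, assuming NumPy (the tone frequencies are arbitrary choices of mine): without filtering, any content above the Nyquist limit of fs/2 folds back into the audible band, masquerading as a tone it never was.

```python
import numpy as np

fs = 44_100.0                  # CD sample rate
n = np.arange(1_000)
t = n / fs

f_audible = 10_000.0           # below the 22.05 kHz Nyquist limit
f_ultra = fs - f_audible       # 34.1 kHz, above the Nyquist limit

a = np.sin(2 * np.pi * f_audible * t)
b = np.sin(2 * np.pi * f_ultra * t)

# At the sample instants, the 34.1 kHz tone produces exactly the same
# values as an inverted 10 kHz tone. Without an anti-aliasing filter it
# would "fold" into the audible band as a spurious 10 kHz component.
print(np.max(np.abs(a + b)))   # ~0, down at floating-point noise level
```

This is why the filter must remove everything above 22.05 kHz before sampling: once aliased, the fake 10 kHz tone is indistinguishable from a real one.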
So why are audiophiles moving to higher sample rates, as much as 192 kHz, in order to reproduce a frequency range four times higher than the limit of human hearing? An argument is set out by Ian Shepherd of Production Advice:
But a fair few musical instruments produce sound well above these frequencies [20 kHz] – muted trumpet and percussion instruments like cymbals or chime bars are clear examples. This leads to two potential objections to a 44.1 kHz sample rate – first, that in order to reproduce a sound accurately we should capture as much of it as possible, including frequencies we probably can’t hear. There are various suggestions that we may be able to somehow perceive these sounds, even if we can’t actually hear them. And secondly that depending on the design, the anti-aliasing filter may have an effect at frequencies well below the 20 kHz cut-off point.
A reasonable statement, that just happens to be wrong.
There’s a great article by Monty on Xiph.org that explains why high sample rates can actually degrade music reproduction:
Neither audio transducers nor power amplifiers are free of distortion, and distortion tends to increase rapidly at the lowest and highest frequencies. If the same transducer reproduces ultrasonics along with audible content, any nonlinearity will shift some of the ultrasonic content down into the audible range as an uncontrolled spray of intermodulation distortion products covering the entire audible spectrum. Nonlinearity in a power amplifier will produce the same effect. The effect is very slight, but listening tests have confirmed that both effects can be audible.
In other words, trying to reproduce an extended frequency range introduces audible imperfections from equipment that wasn’t designed for it. So either buy far more expensive equipment that can reproduce those ultrasonic frequencies cleanly, and then not hear them anyway because the human ear can’t. Or just don’t bother, and stick to good-enough CD quality.
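Monty’s intermodulation argument can be sketched numerically too. Assuming NumPy, with the 26 kHz and 30 kHz tones and the 0.1 second-order term as illustrative choices of mine (not a model of any real amplifier):

```python
import numpy as np

fs = 192_000                          # hi-res rate, so ultrasonics fit
t = np.arange(fs) / fs                # one second of samples
# Two purely ultrasonic tones that no one can hear:
x = np.sin(2 * np.pi * 26_000 * t) + np.sin(2 * np.pi * 30_000 * t)

# A mildly nonlinear amplifier or tweeter, modelled as a small
# second-order term added to the ideal output:
y = x + 0.1 * x**2

spectrum = np.abs(np.fft.rfft(y)) / len(t)
freqs = np.fft.rfftfreq(len(t), 1 / fs)

# The nonlinearity creates a difference tone at 30 kHz - 26 kHz = 4 kHz,
# squarely in the audible band, out of two inaudible inputs:
audible = (freqs > 1_000) & (freqs < 20_000)
peak = freqs[audible][np.argmax(spectrum[audible])]
print(peak)   # 4000.0
```

Feed the same chain only content below 20 kHz and this distortion product never appears; the ultrasonics are the sole cause.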
Bit depth is the second critical number for audiophiles, and the second that inflates in high-resolution recordings. CDs use 16-bit samples, which gives a nominal signal-to-noise ratio of 96 dB (RMS, i.e. with the noise energy spread across the whole frequency range). To put that in context, the dynamic range of CD is enough to span the loudness of a humming light bulb to that of a pneumatic drill.
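That 96 dB figure comes from the standard quantisation-noise formula, roughly 6 dB per bit. A quick sketch (the function name is my own):

```python
import math

def quantisation_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantiser driven by a
    full-scale sine wave: 6.02*N + 1.76 dB. The round '96 dB'
    figure is the rule-of-thumb 6 dB/bit with the 1.76 dropped."""
    return 20 * math.log10(2 ** bits) + 1.76

print(round(quantisation_snr_db(16), 1))   # 98.1 dB for CD
print(round(quantisation_snr_db(24), 1))   # 146.3 dB for 24-bit hi-res
```

So each extra bit buys about 6 dB; the question is whether anyone can use the extra 48 dB that 24 bits offers.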
But psychoacoustic tests tell us that humans can discern sounds around 3.8 kHz that are 18 dB quieter than a light bulb (in the absence of ambient noise). To reproduce everything from that sub-bulb threshold of hearing up to the threshold of pain, we need around 120 dB of dynamic range, so we need more bits, right? As Bob Stuart of Meridian Audio points out, 20 bits should be the target bit depth to give this dynamic range.
Well, no. Firstly, it’s dumb to play music at the threshold of pain. Secondly, the reason you can’t hear light bulbs is because the level of environmental noise, the noise floor, is more than 28 dB louder than the limit of human hearing. So the practical limit-to-limit dynamic range is more like 90 dB.
Thirdly, the 96 dB figure isn’t quite the right measure, since the human ear doesn’t hear in terms of RMS sound pressure across the whole spectrum. The ear contains a large number of pressure-sensitive hair cells, each tuned to a narrow frequency band. Because those bands are so narrow, each hair cell reacts to only a small fraction of the noise-floor energy. In fact, 96 dB RMS works out to around 120 dB within the band of each hair cell. That’s the same span as the sub-bulb-to-pain range of the previous paragraph (funny, that). So 16 bits gives us about 120 dB of perceived dynamic range, which is more than enough.
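The arithmetic behind that claim is simple to check. A sketch, assuming a roughly 100 Hz auditory filter width (a simplification of mine; real critical bands vary with frequency): flat quantisation noise spread across CD’s 22.05 kHz band concentrates far less energy into any one narrow band.

```python
import math

broadband_snr = 96.0        # 16-bit RMS SNR over the full band (dB)
full_band = 22_050.0        # CD's Nyquist bandwidth (Hz)
critical_band = 100.0       # assumed width of one auditory filter (Hz)

# Flat noise spread over 22.05 kHz: a single 100 Hz-wide auditory band
# only "sees" its small share of the total noise energy.
per_band_snr = broadband_snr + 10 * math.log10(full_band / critical_band)
print(round(per_band_snr))  # ~119 dB of perceived dynamic range
```

The 10·log10(22050/100) ≈ 23 dB bonus is what lifts the 96 dB broadband figure to roughly the 120 dB quoted above.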
(What about quantisation error, you cry? To clever-old-you, I say nonsubtractive dither, and leave you in the care of Google Scholar. If you would like a dumbed-down version: quantisation errors arise because the digital sample isn’t precisely the same value as the analogue source. However, using the digital technique of dithering, the quantisation error is decorrelated from the signal: instead of distortion that tracks the music, it becomes a benign, constant noise floor, and noise shaping can push even that towards the frequencies we hear least well. Therefore the impact of quantisation error is greatly reduced.)
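Dither is counter-intuitive enough to deserve a numerical sketch. Assuming NumPy, with TPDF dither of one quantisation step and arbitrary tone and rate choices of mine: a tone smaller than one 16-bit step vanishes entirely when quantised plainly, but survives, buried in a benign noise floor, when dithered first.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, f = 48_000, 440
t = np.arange(fs) / fs
step = 1 / 2**15                                   # one 16-bit step
signal = 0.4 * step * np.sin(2 * np.pi * f * t)    # tone below 1 bit

def quantise(x):
    return np.round(x / step) * step

plain = quantise(signal)       # every sample rounds to zero: silence
# TPDF dither: the sum of two uniforms spans +/- one step, triangularly
dither = (rng.random(fs) - rng.random(fs)) * step
dithered = quantise(signal + dither)

spectrum = np.abs(np.fft.rfft(dithered)) * 2 / fs  # per-tone amplitude
print(np.max(np.abs(plain)))        # 0.0: undithered, the tone is gone
print(spectrum[f] / (0.4 * step))   # ~1.0: dithered, the tone survives
```

The dithered version is noisier sample-by-sample, yet statistically the below-one-bit tone comes through intact, which is exactly the trade dither makes.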
So if the psychoacoustics do not support the supremacy of 24/96, then why do audiophiles claim it sounds better? I think part of the reason is that humans are wired to be rational rather than empirical, to be biased by their opinion rather than to be an empty vessel, and therefore it’s inevitable that belief comes into it – I believe it sounds better, therefore it is better. And also, sometimes re-mastering for 24/96 goes back to the studio recordings and creates an improved source sound, and therefore quality improves – which would be true regardless of bit depth or sample rate.
But I also think that, rather like the megapixel war with digital cameras, bigger numbers should be better, and coincidentally dealing with these bigger numbers (more bits at a higher rate) justifies spending money on expensive kit to do the job properly. And audiophiles like expensive kit.
There’s also a less technical reason, a historical hangover, for why we have these numbers. Thirty years ago, digital technology wasn’t as good as it is today. In particular, anti-aliasing filtering had to be done in the analogue domain, and a gentle analogue filter is far better behaved than a steep brick-wall one. So music was recorded at high sampling rates, with 96 kHz often used, to provide an ample transition band for a gentle filter. This technique is called oversampling. Oversampling is still an important technique in good 16-bit CD players.
On a similar theme, 24-bit depth was often used in recording to provide a much-lower-than-needed digital noise floor. Once digital effects were stacked on top of digital effects, the noise floor inevitably rose, but because of the generous bit depth, it still didn’t impinge on the signal. The bit depth was then reduced to 16-bit at mastering. Nowadays, 32-bit (floating point) recording is becoming popular instead of 24-bit, partly because floating point gives effectively unlimited headroom during processing, and partly because 32-bit words map efficiently onto modern CPU and memory architectures.
It still doesn’t follow that there are actually 24 (or 32) bits, or 96 (or 192) kHz, of audible information in the files, even if the studio recordings were made in 24/96. Empirically, we aren’t able to discern the difference, so it’s better to focus on acquiring lossless files and the best-mastered recordings. The fact that the US federal archives require 24-bit depth because “16-bit audio, the CD standard, may be inadequate for many types of material” is just another high-profile conflation of what is technically feasible with what makes an actual difference. This is the essence of audiophile snake oil.
Oh, and rationally, it’s better to just enjoy the music.
I’m glad you’ve stuck with me so far. There are yet more complexities, and more acronyms, to unravel in this audiophile journey. Coming in a future post…
More in this series: part 1, part 3.