You shouldn’t believe
that hi res is better hi fi. Our ears aren’t hi res enough. Warning: this post
gets technical. blog.mindrocketnow.com
I blame the school AV club for my enthusiasm for Hi Fi.
Cassette tape was the currency for music, and mix tapes an extension of
personality. Everything was analogue, and so everything could be tweaked. A
little bit of bias there, some fatter cable there, and we can lower the hiss so
that we can crank up the volume a little bit higher.
Fast forward a few years (decades), past a fad or two, and
we arrive at today’s digital revolution, where nothing is owned any more, and
nothing is physical any more, except for the emotional responses we have to a
particularly evocative track.
It’s worth dwelling on that for a moment. Digital, in all of
its ones and nought-ness, is pristine, exact, antiseptically clean. In
university I was taught Shannon’s (or Nyquist’s or Whittaker’s or Kotelnikov’s) theorem, which
states that any waveform of maximum frequency f can be captured without any loss of information by sampling it at intervals of 1/(2f), i.e. at a rate of at least 2f samples per second. Therein lies the
problem: anything that purports to be an exact likeness surely cannot
communicate emotion. It’s like expecting to have a conversation with a
photograph. And yet music does.
OK, step away from emotions, I’m an engineer after all. For
this blog, I want to examine high resolution audio, and the meaning behind the
numbers. The audiophile is being told that CD quality isn’t good enough, that
due to the emaciated specs of the CD Red Book standard, we are missing out on detail. The key metrics
are bit depth and sample rate. CD specification is for 16 bit and 44,100
samples per second (abbreviated to 16-bit/44.1 kHz or just 16/44.1). Audiophile
formats are now 16/96, 24/96 or even 24/192. Obviously, bigger numbers are better?
Except they’re not, and could even be worse than CD. Ooh, controversy...
(I should pause now for a disclaimer: I’m an engineer by
training, therefore educated as an empiricist rather than a rationalist.
However, I’m a consultant by career, which is characterised by the ability to
make decisions rationally rather than empirically. I therefore fall between the
two camps of: It sounds better therefore it is
better, vs It isn’t measurably better therefore it can’t be better. But my wallet is ruled empirically, so I’ll only
spend money if I can be sure it is
better.)
Let’s start with examining sample rate. Shannon’s theorem, mentioned earlier, is a very useful
starting point. By common agreement, the limit of human hearing is taken to be
20 kHz. (In fact, if you can hear 20 kHz, then you’re probably very young or very
exceptional – I can only hear up to 14 kHz ish thanks to age and Iron Maiden,
and you’re probably no different. Except for the Iron Maiden part.) So Shannon tells us that the full range of audible
frequencies can be reproduced without any loss of information using a sampling
rate of 40 kHz. CD, with a sampling rate of 44.1 kHz, should be of completely
satisfactory quality.
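To make that arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and it assumes an ideal sampler with no anti-aliasing filter, so anything above half the sample rate simply folds back down into the band below it.

```python
def nyquist_rate(f_max_hz):
    """Minimum sample rate needed to capture content up to f_max_hz."""
    return 2 * f_max_hz


def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a tone shows up after ideal sampling (no filter)."""
    folded = f_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_rate(20_000))             # 40000
print(alias_frequency(20_000, 44_100))  # 20000: captured as-is
print(alias_frequency(25_000, 44_100))  # 19100: folds back into the audible range
```

The last line is why the anti-aliasing filter matters: a 25 kHz tone that reaches a 44.1 kHz sampler unfiltered would come out as a 19.1 kHz tone.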
(You may be wondering why an extra 4.1 kHz of samples was
added in the standard. Like many technical decisions, this appears to be
arbitrary but has a practical history: Sony wanted it, and Sony prevailed, largely because
44.1 kHz was the rate already used by the video-tape-based PCM recorders of the day. It also
gives the anti-aliasing filter a 2.05 kHz transition band above 20 kHz.)
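The 2.05 kHz figure is simply half the sample rate minus the 20 kHz audible limit. A small sketch, with a hypothetical helper name and the 20 kHz limit taken as an assumption, shows how much room each common sample rate leaves for the filter:

```python
def transition_band_hz(sample_rate_hz, audible_limit_hz=20_000):
    """Gap between the audible limit and half the sample rate."""
    return sample_rate_hz / 2 - audible_limit_hz


for rate in (44_100, 96_000, 192_000):
    print(rate, transition_band_hz(rate))
# 44100 2050.0
# 96000 28000.0
# 192000 76000.0
```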
So why are audiophiles moving to higher sample rates, as
much as 192 kHz, in order to reproduce frequencies up to 96 kHz, nearly five times
the limit of human hearing? An argument is set out by Ian
Shepherd of Production Advice:
But a fair few musical instruments produce sound well above these
frequencies [20 kHz] – muted trumpet and percussion instruments like cymbals or
chime bars are clear examples. This leads to two potential objections to a 44.1
kHz sample rate – first, that in order to reproduce a sound accurately we
should capture as much of it as possible, including frequencies we probably
can’t hear. There are various suggestions that we may be able to somehow
perceive these sounds, even if we can’t actually hear them. And secondly that
depending on the design, the anti-aliasing filter may have an effect at
frequencies well below the 20 kHz cut-off point.
A reasonable statement, that just happens to be wrong.
There’s a great article by Monty on Xiph.org
that explains why high sample rates can actually degrade music reproduction:
Neither audio transducers nor power amplifiers are free of distortion,
and distortion tends to increase rapidly at the lowest and highest frequencies.
If the same transducer reproduces ultrasonics along with audible content, any
nonlinearity will shift some of the ultrasonic content down into the audible
range as an uncontrolled spray of intermodulation distortion products covering
the entire audible spectrum. Nonlinearity in a power amplifier will produce the
same effect. The effect is very slight, but listening tests have confirmed that
both effects can be audible.
In other words, trying to reproduce ultrasonic frequencies
introduces audible imperfections in equipment that wasn’t
designed to handle them. So either buy more expensive equipment that can
reproduce these ultra high frequencies perfectly, and then ignore them
because the human ear cannot hear them, or just don’t bother, and stick to
good enough CD quality.
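A rough way to see where that "spray" comes from: when two tones pass through a nonlinearity, new tones appear at sums and differences of their multiples. The sketch below only lists the frequencies, not how loud they are. The function names and the two example tones (30 kHz and 33 kHz) are my own assumptions, not taken from Monty's article.

```python
def intermod_products(f1_hz, f2_hz, max_order=3):
    """Frequencies |m*f1 + n*f2| with 2 <= |m| + |n| <= max_order."""
    products = set()
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if 2 <= order <= max_order:
                f = abs(m * f1_hz + n * f2_hz)
                if f > 0:
                    products.add(f)
    return sorted(products)


def audible(freqs_hz, limit_hz=20_000):
    """Keep only the frequencies at or below the assumed hearing limit."""
    return [f for f in freqs_hz if f <= limit_hz]


# Two tones that are both well above the limit of hearing
print(audible(intermod_products(30_000, 33_000)))  # [3000]
```

Neither input tone is audible, yet their difference lands at 3 kHz, in the most sensitive part of our hearing.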
Bit depth is the
second critical number for audiophiles, and it too is inflated in high-resolution
recordings. CDs use 16-bit depth, which gives a nominal signal-to-noise
ratio of 96 dB (RMS, i.e. the noise energy spread across the whole frequency
range). To put into context, the dynamic range of CD is enough to go from the
loudness of a light bulb to that of a pneumatic drill.
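The 96 dB figure comes from the ratio between full scale and a single quantisation step, roughly 6 dB per bit. Here is a minimal sketch of that rule of thumb. The helper name is hypothetical and it ignores dither and noise shaping.

```python
import math


def dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24):
    print(bits, round(dynamic_range_db(bits), 1))
# 16 96.3
# 20 120.4
# 24 144.5
```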
But psychoacoustic tests tell us that humans can discern
sounds that are 18 dB quieter than a light bulb (in the absence of ambient
noise) at around 3.8 kHz. To reproduce the total range of sub-bulb threshold of
hearing to threshold of pain loudness we need around 120 dB, so we need more
bits, right? As Bob
Stuart of Meridian Audio points out, 20 bits should be the target bit depth
to give this dynamic range.
Well, no. Firstly, it’s dumb to play music at the threshold
of pain. Secondly, the reason you can’t hear light bulbs is that the level of
environmental noise, the noise floor, is more than 28 dB louder than the limit
of human hearing. So the practical limit-to-limit dynamic range is more like 90
dB.
Thirdly, the dynamic range of 96 dB isn’t quite correct,
since the human ear doesn’t hear in terms of RMS sound pressure. The human ear
consists of a large number of pressure sensitive hairs, tuned to small
frequency bands. Because the frequency bands are so small, the hairs react to a
fraction of the noise floor energy. In fact, 96 dB RMS turns out to be around
120 dB in the frequency band of each hair. That’s the difference again between sub-bulb
and pain of the previous paragraph (funny that). So 16 bits gives us 120 dB of
perceived dynamic range, which is more than enough.
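A back-of-envelope version of that argument: if the quantisation noise is spread evenly from 0 Hz to 22.05 kHz, a narrow band only collects a small slice of it. The sketch below assumes flat noise and a 100 Hz band. The 100 Hz is a round number chosen for illustration, not a measured property of the ear, and the function name is made up.

```python
import math


def in_band_gain_db(total_bandwidth_hz, band_hz):
    """How far noise in one narrow band sits below the broadband RMS noise."""
    return 10 * math.log10(total_bandwidth_hz / band_hz)


broadband_range_db = 96  # nominal 16-bit figure
gain_db = in_band_gain_db(22_050, 100)

print(round(gain_db, 1))                    # 23.4
print(round(broadband_range_db + gain_db))  # 119
```

Under those assumptions the per-band figure comes out at roughly 120 dB. A wider band would give a smaller gain.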
(What about quantisation error, you cry? To clever-old-you,
I say nonsubtractive dither, and leave you in the care of Google Scholar. If you would like a
dumbed down version: Quantisation errors arise because the digital sample isn’t
precisely the same value as the analogue source. However, using the digital
technique of dithering, the quantisation error is made incoherent, i.e. it
is no longer correlated with the signal, so it behaves like benign background noise rather than distortion. With noise shaping, it can also be pushed towards frequencies where hearing is least sensitive.
Therefore the impact of quantisation error is greatly reduced.)
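If you'd rather see it than read about it, here is a toy demonstration in Python. It assumes triangular (TPDF) dither spanning ±1 LSB, a common textbook choice rather than anything specified in this post, and a test tone deliberately smaller than one quantisation step. All names are illustrative.

```python
import math
import random


def quantise(x):
    """Round to the nearest whole step, where one step is 1 LSB."""
    return math.floor(x + 0.5)


def tpdf_noise(rng):
    """Triangular-PDF dither spanning +/-1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


def correlation(a, b):
    """Pearson correlation; 0.0 if either sequence is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the run is repeatable

# One second of a 1 kHz tone whose peak is only 0.4 LSB
signal = [0.4 * math.sin(2 * math.pi * 1000 * n / 44_100) for n in range(44_100)]

plain = [quantise(s) for s in signal]
dithered = [quantise(s + tpdf_noise(rng)) for s in signal]

print(set(plain))                     # {0}: the tone vanishes entirely
print(correlation(signal, dithered))  # clearly above zero: the tone survives
```

Without dither the quiet tone rounds to silence. With dither, added before quantising and never subtracted afterwards, the tone is still in there underneath a layer of noise.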
So if the psychoacoustics do not support the supremacy of
24/96 then why do audiophiles claim it sounds better? I think part of the
reason is that humans are wired to be rational rather than empirical, to be
biased by their opinion rather than to be an empty vessel, and therefore it’s
inevitable that belief comes into it – I believe it sounds better therefore it is better. And also sometimes,
re-mastering for 24/96 goes back to the studio recordings and creates an
improved source sound, and therefore quality improves – which would be true regardless
of bit depth or sample rate.
But I also think that, rather like the megapixel war with
digital cameras, there’s an assumption that bigger numbers must be better, and conveniently, dealing with these
bigger numbers (more bits at a higher rate) justifies spending money on
expensive kit to do the job properly. And audiophiles like expensive kit.
There’s also a less technical reason, a historical hangover,
for why we might have these numbers. Thirty years ago, digital technology wasn’t
as good as it is today. For example, filters were much better quality in the
analogue domain than digital. So music was recorded at high sampling rates, 96
kHz often used, to provide an ample transition band. This technique is called
oversampling. Oversampling is still an important technique used in good 16-bit
CD players.
On a similar theme, 24-bit depth was often used in recording
to provide a much lower than needed digital noise floor. Once digital effects
on top of digital effects were added, the noise floor inevitably increased, but
because of the bit depth used, still didn’t impact upon the signal quality. The
bit depth was reduced to 16-bit at mastering. Nowadays, 32-bit recording is becoming
popular instead of 24-bit, because 32 bits are more efficient with physical
computer memory and CPU (most computer architectures are 64-bit nowadays, which
is a nice multiple of 32 bits).
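To put a number on that headroom, here is a rough sketch. It assumes each processing stage adds the same amount of uncorrelated noise at the working bit depth, so the noise power simply adds up. Real effect chains are messier, and the function names are hypothetical.

```python
import math


def noise_floor_db(bits):
    """Nominal quantisation noise floor relative to full scale."""
    return -20 * math.log10(2 ** bits)


def floor_after_stages_db(bits, stages):
    """Noise floor after `stages` steps, each adding equal uncorrelated noise."""
    return noise_floor_db(bits) + 10 * math.log10(stages)


print(round(floor_after_stages_db(24, 100), 1))  # -124.5
print(round(noise_floor_db(16), 1))              # -96.3
```

Even after a hundred such stages, the 24-bit working floor is still well below what 16 bits can represent at mastering.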
It still doesn’t follow that there’s actually 24 (or 32) bits
or 96 (or 192) kHz worth of audible information, even if the studio recordings
were in 24/96. Empirically, we aren’t able to discern the difference, so it’s
better to focus on acquiring lossless files and the best mastered recordings.
The fact that the US
federal archives require 24-bit depth because “16-bit audio, the CD standard, may be inadequate for many types of material”
is just another high-profile conflation of what is technically feasible with what
makes an actual difference. This is the essence of audiophile snake oil.
Oh, and rationally, it’s better to just enjoy the music.
I’m glad you’ve stuck with me so far. There are yet more
complexities, and more acronyms, to unravel in this audiophile journey. Coming
in a future post…
More in this series: part
1, part 3.