including the digital realm

Now that we know how sound from the acoustic domain can be transduced to and from the analog domain, we will explore where digital audio fits in. It sits inside the analog signal path. Instead of an analog recording device (e.g. a tape recorder), we swap in a chain of three components:

  1. ADC – Analog to Digital Converter
  2. A computer with digital storage
  3. DAC – Digital to Analog Converter

The ADC measures the signal thousands of times per second (more on this in the sampling section below) and sends each measurement to the computer as a number so it can be saved onto a hard drive (or other digital storage). This is the sampling process. To play back the digital audio, the computer reads the sample values and sends them to a DAC, which reconstructs the analog signal from those measurements. Audio equipment often combines several stages of this process into one piece of gear. All three domains of audio are interdependent. Even if we create a sound using an analog synthesiser, the same principles apply. This is why it is important to understand both acoustic and analog audio even if you only use digital audio.

analog to digital conversion

To bring audio from the analog domain to the digital domain we use a process called sampling. We take regular snapshots (measurements) of the voltage of the analog signal and record them as numbers. How often a sample is taken is called the sampling rate or sampling frequency, and it is measured in Hz. The sampling frequency determines how detailed a representation of the analog signal is captured in the digital domain. Since higher frequency sounds create finer detail in the shape of a sound wave, we need a higher sampling rate to capture them. Some people compare the sampling rate to the frame rate of a video recording. The analogy is not quite accurate, but it is a helpful simplification. The highest frequency that a given sampling rate can capture is half of that rate, and this limit is called the Nyquist frequency.
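As a rough illustration of sampling, the sketch below takes regular snapshots of a sine wave. The function name `sample_sine` and the chosen tone are assumptions made for the example; a real ADC measures a voltage rather than evaluating a formula.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return 'snapshots' of a sine wave taken at regular intervals."""
    num_samples = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# One millisecond of a 1 kHz tone at the CD sampling rate of 44,100 Hz.
sample_rate = 44100
samples = sample_sine(1000, sample_rate, 0.001)

# The highest frequency this rate can represent is half the sampling rate.
nyquist_hz = sample_rate / 2

print(len(samples), "samples; Nyquist frequency:", nyquist_hz, "Hz")
```

Raising `sample_rate` produces more snapshots for the same millisecond of sound, which is the "more detail" described above.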

  • The first thing an ADC does to analog audio before sampling is filter out all frequencies above the Nyquist limit for the current sampling rate. If they are not filtered out, any frequencies higher than Nyquist will be flipped, or reflected back, into the audible range during sampling. This is called aliasing. It is one of the ways that digital audio can distort that analog audio cannot. The filter that removes the frequencies above Nyquist is called an anti-aliasing filter. If it does its job correctly, no aliasing will occur and all frequencies below Nyquist will be recorded accurately. Nowadays nearly all ADCs have a built-in anti-aliasing filter.
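The "reflection" can be worked out with a little arithmetic. The sketch below is an illustration only: `alias_frequency` and `sampled_tone` are hypothetical helper names, and the 30 kHz tone is an arbitrary example of a frequency above Nyquist at a 44.1 kHz sampling rate.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which an unfiltered tone shows up after sampling."""
    nearest_multiple = round(freq_hz / sample_rate_hz) * sample_rate_hz
    return abs(freq_hz - nearest_multiple)

def sampled_tone(freq_hz, sample_rate_hz, count):
    """First `count` samples of a sine wave at the given sampling rate."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]

sample_rate = 44100
too_high = 30000  # above the 22,050 Hz Nyquist limit
alias = alias_frequency(too_high, sample_rate)

# Once sampled, the 30 kHz tone produces the same numbers as a tone at the
# alias frequency with inverted polarity, so the two cannot be told apart.
original = sampled_tone(too_high, sample_rate, 8)
impostor = sampled_tone(alias, sample_rate, 8)

print("30000 Hz is recorded as", alias, "Hz")
```

A tone below Nyquist, such as 1000 Hz, comes back unchanged from `alias_frequency`, which is why the filter only needs to remove what lies above the limit.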

As the ADC measures each sample, it has to assign that sample a number. That is what makes it digital as opposed to analog. Computers do not deal with vague measurements such as “a little more than 6”. The ADC has to assign a specific number to the measurement. This is called quantizing. It means that the value must often be rounded off to the nearest available number. Computers use binary numbers, so a digital audio sample is recorded as a string of binary digits. Just as regular numbers can be made more precise by adding digits after the decimal point, binary numbers can be made more precise by using more bits (binary digits).
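To make the "string of binary digits" concrete, here is a minimal sketch that prints the bit pattern of a sample value. The helper name `to_binary_string` is hypothetical, and the use of two's complement for negative values is an assumption about the storage format, not something every format shares.

```python
def to_binary_string(code, bit_depth):
    """Bit pattern of an integer sample value (two's complement assumed)."""
    mask = 2 ** bit_depth - 1
    return format(code & mask, f"0{bit_depth}b")

print(to_binary_string(6, 4))    # 0110
print(to_binary_string(6, 8))    # 00000110
print(to_binary_string(-1, 4))   # 1111
```

The same value 6 takes 4 digits or 8 digits depending on how many bits are available; the extra bits are what allow finer steps between values.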

We can use a ruler as an example: if we use two bits to divide one inch on a ruler, we can choose from 4 possibilities, namely any of the 1/4 inch marks. With 3 bits, we divide the inch into twice as many parts and can choose any of the 1/8 inch marks. 4 bits give us 16 options, 5 bits give us 32, and so on, doubling each time. The number of bits used to record a sample measurement is called the bit depth. A greater bit depth divides the range of possible amplitudes into smaller pieces and lets us take more precise measurements of each sample. The most common bit depths are 16 bit and 24 bit. 16-bit audio is the standard for CDs, and it divides the amplitude range into 65,536 possible quantization levels. 24 bit is the standard for recording and for various high definition formats, and it divides the amplitude range into 16,777,216 possible quantization levels. In short, the bit depth determines the precision of the quantization.
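The doubling can be checked directly, since each bit depth gives 2 to the power of that many levels. The sketch below also rounds a measurement to the nearest level. The function names and the convention of scaling amplitudes between -1.0 and 1.0 onto signed integer codes are assumptions for illustration.

```python
def quantization_levels(bit_depth):
    """Number of distinct values available at a given bit depth."""
    return 2 ** bit_depth

def quantize(value, bit_depth):
    """Round an amplitude in the range -1.0..1.0 to the nearest integer code."""
    max_code = 2 ** (bit_depth - 1) - 1
    min_code = -(2 ** (bit_depth - 1))
    code = round(value * max_code)
    return max(min_code, min(max_code, code))

for bits in (2, 3, 4, 5, 16, 24):
    print(bits, "bits:", quantization_levels(bits), "levels")

# The same measurement lands on a coarse or a fine grid depending on bit depth.
print(quantize(0.3, 3), quantize(0.3, 16))
```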

gain staging in the digital realm

In the analog domain a gain stage works using an electrical circuit. In the digital domain a gain stage works using math, more specifically multiplication and division. If we want a piece of software to turn up a sound by 6 dB, the computer goes through each of the thousands and thousands of samples and multiplies its value by roughly 2. If we turn the sound down by 6 dB, the computer divides each sample value by roughly 2.

The specific amounts that the computer uses to multiply or divide are different for different amounts of volume change, but the principle is the same. This is one of the reasons older computers from the 90s could not work that well with digital audio: they were not able to do all that math quickly enough. In the analog domain, if we try to turn up a sound too loud, we run out of available voltage, which distorts the sound. In the digital domain we are not running out of voltage; instead we are running out of available number range.
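The multiplier for any volume change follows from the standard decibel formula for amplitudes, multiplier = 10^(dB / 20). A minimal sketch, with hypothetical function names:

```python
def db_to_multiplier(gain_db):
    """Amplitude multiplier for a gain change given in decibels."""
    return 10 ** (gain_db / 20)

def apply_gain(samples, gain_db):
    """Digital gain stage: multiply every sample by the same factor."""
    multiplier = db_to_multiplier(gain_db)
    return [s * multiplier for s in samples]

for gain_db in (6, -6, 20, -20):
    print(gain_db, "dB -> multiply by", round(db_to_multiplier(gain_db), 4))

louder = apply_gain([0.1, -0.2, 0.05], 6)
```

Note that 6 dB corresponds to a factor very close to 2 rather than exactly 2; an exact doubling is about 6.02 dB.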

As long as we turn a waveform down so that it does not exceed 0 dBFS before it goes to the DAC, we can play back the file without any distortion. If we try to play back anything louder than digital full scale, it will be clipped as it goes through digital to analog conversion. Both the digital and analog domains distort the sound in a broadly similar way if we amplify it too much. If we turn a signal down too much, however, the problems can be quite different between the two domains. In the analog domain the signal to noise ratio drops until the noise overpowers the signal. In the digital domain there can be a very different problem called quantization distortion. This was once a big disadvantage of digital audio, especially back in the days when bit depths lower than 16 bits were common. Modern hardware and software combine greater bit depths with a technique called dither. This lets the digital domain handle low amplitude sounds at least as well as the analog domain, if not better.
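Running out of number range can be sketched in a few lines. The example assumes samples are stored as floating point values where 1.0 represents digital full scale (0 dBFS); the function names are hypothetical.

```python
import math

def clip(samples, full_scale=1.0):
    """Flatten anything beyond full scale, as happens on the way to the DAC."""
    return [max(-full_scale, min(full_scale, s)) for s in samples]

def peak_dbfs(samples, full_scale=1.0):
    """Peak level relative to full scale; 0 dBFS is the maximum."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak / full_scale)

quiet = [0.25, -0.5, 0.4]
too_loud = [s * 4 for s in quiet]   # peaks now exceed full scale
clipped = clip(too_loud)            # the tops of the wave are cut off

print(round(peak_dbfs(quiet), 2), round(peak_dbfs(too_loud), 2), clipped)
```

A positive `peak_dbfs` result is the warning sign: the waveform needs to be turned down by at least that many dB before conversion to avoid clipping.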

what is dither?

If it were not for dither, digital audio at very low amplitudes would run into quantization distortion. This happens to low amplitude signals that are too small compared to the space between available quantization levels. If we measure a pencil with a ruler, where the pencil represents a high amplitude, it does not matter too much whether the pencil’s measurement is quantized to 6 inches or 6 1/8 inches. The quantization distortion, that is, the difference between the original signal and the measured value, is relatively very small. At low amplitudes this can be a problem. If we try to measure a paperclip, which can represent a fairly low amplitude signal, the difference between calling it 3/16 inch or 1/4 inch is relatively large, and since the paperclip is actually in between those two, choosing either value would introduce a fairly large rounding error relative to the object’s size. Things get even worse when the amplitude drops further. If we want to measure the thickness of a guitar pick, the ruler cannot even register it, because it is so thin that the measurement is quantized to 0.

An example like this has to be massively exaggerated, because quantization distortion in digital audio happens at incredibly quiet amplitudes. Quantization distortion is mostly not a problem with 24 bit audio. Still, a lot of audio is distributed at 16 bit, where quantization distortion can become noticeable at around -60 dBFS.

The solution to this is dither. It is a technique used by most digital audio hardware and software that eliminates quantization distortion in exchange for adding a little bit of noise. The noise is similar to tape hiss but much quieter. An EQ is applied to the noise to turn down the frequencies that are easiest for us to hear and turn up the frequencies that are hardest to hear. This is called noise shaping.
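The guitar pick case can be simulated. The sketch below is an illustration under stated assumptions: it uses triangular (TPDF) dither, one common choice that the text above does not specifically name, a fixed random seed so the run is repeatable, and no noise shaping. All function names are hypothetical.

```python
import random

def quantize_to_step(value, step):
    """Round a value to the nearest multiple of the quantization step."""
    return round(value / step) * step

def quantize_with_dither(value, step, rng):
    """Add a little triangular noise before rounding (TPDF dither)."""
    noise = (rng.random() - rng.random()) * step
    return quantize_to_step(value + noise, step)

step = 1.0          # spacing between quantization levels
tiny = 0.3 * step   # a signal smaller than half a step

# Without dither the tiny value rounds to zero every time: it simply vanishes.
plain = quantize_to_step(tiny, step)

# With dither each individual result is noisy, but on average the results
# follow the original value, so the signal survives underneath the noise.
rng = random.Random(0)
dithered = [quantize_with_dither(tiny, step, rng) for _ in range(20000)]
average = sum(dithered) / len(dithered)

print("without dither:", plain, "| average with dither:", round(average, 2))
```

This is the trade described above: the rounding error stops being tied to the signal and turns into a steady, low-level noise.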

digital to analog conversion

To hear digital audio in the acoustic realm, we need to run it through a DAC. A DAC goes through each sample and produces an analog voltage that matches that sample’s measurement at the appropriate time, based on the sample rate. Those voltages pass through a reconstruction filter so that the final output is a smooth analog waveform.

The reason a digitally sampled waveform can sound so smooth (instead of the jagged waveform we see in software) is the Nyquist–Shannon sampling theorem. This is the mathematical principle that all sampled audio is based on. Most audio software draws two adjacent sample points either as two dots joined by a straight line or as a stair step. In reality, the reconstructed audio is more like a curved string connecting the two dots. This happens naturally due to the laws of physics. Each dot is like the separate voltage that the DAC produces for one sample, and the curved string is the reconstruction filter, which turns those voltages into a continuous waveform. The factors that define the curvature of the line between two sample points are the sampling rate and the wave physics of the reconstruction filter. So although audio software may show jagged, blocky lines connecting the sample points, the wave is smooth if we check the DAC output with an analog oscilloscope. Software draws the simpler shape to save the processing power that would be needed to compute the curve between every pair of sample points.
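The "curved string" can be computed with the idealized interpolation formula from the sampling theorem, in which every sample contributes a sinc-shaped curve. This is a mathematical sketch under that assumption, not a model of any real DAC, whose reconstruction filter only approximates this behaviour. The function names and the 1 kHz test tone are hypothetical choices.

```python
import math

def sinc(x):
    """Normalized sinc function: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, sample_rate_hz, t_seconds):
    """Estimate the waveform value at any time from the samples alone."""
    position = t_seconds * sample_rate_hz
    return sum(s * sinc(position - n) for n, s in enumerate(samples))

sample_rate = 8000
tone_hz = 1000
samples = [
    math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(2000)
]

# Look exactly halfway between sample 1000 and sample 1001.
t = 1000.5 / sample_rate
actual = math.sin(2 * math.pi * tone_hz * t)
curved = reconstruct(samples, sample_rate, t)
straight_line = (samples[1000] + samples[1001]) / 2

print("actual:", round(actual, 3))
print("curved reconstruction:", round(curved, 3))
print("straight line drawn by software:", round(straight_line, 3))
```

The curved estimate lands much closer to the true waveform than the straight line does, which is why the drawing in an editor understates how smooth the output really is.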

data compression and digital extensions

Digital audio can be saved in a variety of audio formats, and these formats can be sorted into 3 groups: uncompressed, losslessly compressed and lossy compressed. We are talking about data compression here, not dynamic range compression. The most common uncompressed audio formats are:

  • wav
  • aiff

Here the audio data is just a list of raw numbers that represent samples. Since numbers can be organised in different ways, both formats include a header with instructions on how to interpret the audio data (sample rate, bit depth, which samples belong to the L or R channel).
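Python's standard `wave` module can show the header and the raw numbers side by side. The sketch below writes a tiny stereo WAV into memory and reads it back; the sample values are made up for the example, and an in-memory buffer is used only so that no file on disk is needed.

```python
import io
import struct
import wave

# Write two stereo frames of 16-bit audio into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(2)       # stereo
    wav_out.setsampwidth(2)       # 2 bytes per sample = 16 bit
    wav_out.setframerate(44100)   # sampling rate in Hz
    # Samples are interleaved: L, R, L, R
    wav_out.writeframes(struct.pack("<4h", 1000, -1000, 2000, -2000))

# Read it back: the header says how to interpret the raw numbers.
buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    channels = wav_in.getnchannels()
    bit_depth = wav_in.getsampwidth() * 8
    sample_rate = wav_in.getframerate()
    frame_count = wav_in.getnframes()
    raw = wav_in.readframes(frame_count)

values = struct.unpack("<%dh" % (frame_count * channels), raw)
left, right = values[0::2], values[1::2]

print(channels, "channels,", bit_depth, "bit,", sample_rate, "Hz")
print("L:", left, "R:", right)
```

Without the header, the same bytes could just as well be read as one mono channel at a different bit depth, which is why the formats carry these instructions along with the data.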

losslessly compressed – Lossless means that the files contain exactly the same samples as the original, and compressed means that they take up less disk space. Lossless compression works like a *.zip file, where the data is abbreviated in ways that don’t change it. A losslessly compressed file might use a special code for 500 zeros in a row instead of actually storing each of those zeros. As a result, audio stored in this way usually takes up 20 to 50 percent less space, depending on its complexity. Because the decoded file has the exact same list of sample numbers with no changes, it sounds identical to the uncompressed audio.
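The "special code for 500 zeros" idea is known as run-length encoding. The toy sketch below shows only that one idea; it is not how any particular lossless audio codec works internally, and the function names are hypothetical.

```python
def rle_encode(samples):
    """Collapse runs of identical values into [value, run_length] pairs."""
    encoded = []
    for value in samples:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

def rle_decode(encoded):
    """Expand [value, run_length] pairs back into the full list of samples."""
    decoded = []
    for value, count in encoded:
        decoded.extend([value] * count)
    return decoded

samples = [0] * 500 + [12, -7, 3]   # silence followed by a little audio
packed = rle_encode(samples)
restored = rle_decode(packed)

print(len(samples), "samples stored as", len(packed), "entries")
print("identical after decoding:", restored == samples)
```

The important property is the last line: decoding gives back every sample exactly, which is what makes the compression lossless.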

FLAC is probably the most common lossless compression format. It stands for Free Lossless Audio Codec. Codec is short for COmpressor/DECompressor. ALAC (Apple Lossless Audio Codec) is used in iTunes.

Lossy compressed formats: lossy compression is much more complex, but in exchange for a smaller file size some audio information is lost, so the sound is altered from the original.

  • mp3
  • aac
  • wma
  • Ogg Vorbis

These formats use psychoacoustics to determine how likely the human ear is to hear a given aspect of a sound at a given instant. Lossy codecs use this psychoacoustic model to prioritise what information to keep and what to throw out. If a very quiet sound at one frequency happens at the exact same time as a loud sound at a nearby frequency, our ears will not pick up the quieter sound at all. Lossy compression will then leave out the quieter sound to save storage space.

When compressing, you can set how big you want the resulting file to be by specifying a bitrate. The bitrate sets how many bits per second the codec is allowed to use to store the audio. It is usually measured in kilobits per second (kbps, thousands of bits per second), and it is often a power of two or a multiple of a power of two (128, 192, 256, 320).

Older versions of codecs had poor models of how humans hear, and they would make mistakes about what information to throw out and what to keep. Combined with the low bitrates people chose in the days of dial-up modems, to save both storage space and transmission time, this gave mp3 a very bad reputation. With modern mp3 encoders, about 192 kbps or higher is usually transparent (indistinguishable from the original) for most music, for the average listener on reasonable equipment. The way to test scientifically whether you can tell the difference between a given mp3 bitrate and the original uncompressed audio is to do an ABX test (see hydrogenaudio.org). Converting mp3 files back to an uncompressed audio format does not restore the lost information.
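Because the bitrate is a fixed budget of bits per second, the size of a lossy file follows from simple arithmetic. The sketch below compares a 3 minute track at 192 kbps with the same track as uncompressed CD-quality audio (44,100 Hz, 16 bit, stereo). The function names are hypothetical, a constant bitrate is assumed, and headers and metadata are ignored.

```python
def lossy_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bitrate file (8 bits per byte)."""
    return bitrate_kbps * 1000 * duration_s / 8

def uncompressed_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Size of the raw sample data for uncompressed audio."""
    return sample_rate_hz * bit_depth * channels * duration_s / 8

duration_s = 180  # a 3 minute track
mp3_size = lossy_size_bytes(192, duration_s)
cd_size = uncompressed_size_bytes(44100, 16, 2, duration_s)

print("192 kbps:", mp3_size / 1_000_000, "MB")
print("CD quality:", cd_size / 1_000_000, "MB")
print("ratio:", round(cd_size / mp3_size, 2))
```

The uncompressed version works out to a data rate of about 1,411 kbps, so 192 kbps keeps roughly one bit in seven; the psychoacoustic model decides which ones.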

the way digital connections function

With analog connections and all their different types of plugs and signals, the physical connection does not necessarily have anything to do with the kind of signal that passes along the wire. With digital cables and connections the situation is similar, but with a couple more layers of complexity.

  1. The Physical Connection – the shape of the plug and the type of wire
  2. Electrical/Communication Protocol – How the data is encoded and timed as electrical voltages and how the flow of that data is controlled.
  3. Audio Format – The format of the audio itself. That includes factors like sample rate, bit depth and whether the audio is mono, stereo or multichannel.

There are two categories of digital connections.

  1. General Computer Connections – Used for audio but also for other computer peripherals.

The most common digital connection in the average home studio is USB. USB can carry digital information for all kinds of purposes, including audio interfaces. USB can both send and receive data, so a computer can make use of both the ADC and the DAC in an audio interface at the same time. The signal path is this:

Mic/analog signal → interface analog input → pre-amp to line level (gain) → A/D converter (sampling) → the interface sends the samples using the USB protocol → the USB cable → the computer → the operating system receives the USB messages → the samples are passed along to the audio software.

At the same time, there can be other communication over the same USB cable. For example, the same computer might send audio to the interface to play out of its DAC, or it might send the interface instructions such as setting the sample rate or bit depth. Other digital connections like Thunderbolt or FireWire work in much the same way as USB, though they use different techniques and protocols.

  2. Digital Connections Without a Computer:

  • AES/EBU
  • S/PDIF

Both are very similar, and both are two-channel connections, usually carrying stereo L and R.

Two-channel digital audio can be sent over three kinds of cable:

  • AES cables (similar to XLR cables).
  • Special RCA cables with different electrical properties from regular audio and video cables.
  • Toslink fiber optic cables, which conduct pulses of light instead of electricity.

The same fiber optic cables can also be used for a third type of digital audio connection called ADAT (light pipe), which can carry 8 channels of audio at once.

These types of non-computer digital connections often appear in home entertainment systems. On some occasions they also serve specialised purposes in larger studios and live sound installations.

Although all of these connections have different shapes and different functions, they all serve to carry digital audio data, which is a list of numbers. Because of that, digital connections either work perfectly or fail hard, with silence or loud pops and clicks. Brands of these products are chosen for their reliability, not for claims of subjective quality.