The Essence of Sound
Digital audio is how computers represent sound, but there is a lot beneath the surface. The better we understand digital audio, the more the technology can work for us instead of the other way around. To understand what digital audio is, we need to start at the beginning: how sound is made.

All around us are air molecules. Sound happens when those molecules are disturbed in some way. The molecules themselves do not travel along with the wave after the initial attack of a sound; instead, the wave is a moving pattern of molecules colliding with each other in the air. Sound can travel not just in air but in any medium – solid, liquid or gas. All of these mediums are part of the acoustic domain, where acoustical energy (sound) travels.

A hand clap makes a single quick sound, but consider what happens when we sustain a sound. The physics of a string make it vibrate back and forth at a constant rate, or frequency, which disturbs the air molecules in a regular pattern and creates the sustained sound we hear as a musical pitch. Again, the individual air molecules are not travelling along with the wave; they are simply colliding with each other in a pattern that keeps the wave moving. Ears and microphones pick up sound by sensing the patterns of movement of the small group of air molecules right nearby. Keeping track of and recreating these patterns is the basis of all sound recording. To record and play back sound digitally, we need to answer one question: how can we measure and describe those patterns using numbers?
Building Blocks of sound
Two of the most important aspects of sound are FREQUENCY and AMPLITUDE. Both can be measured and expressed as numbers.
- Frequency – The simplest possible sound, containing only one frequency at a time, is called a sine wave. We perceive that frequency as a pitch. Frequency is the rate at which a sound wave fluctuates; this is different from the speed of sound. An analogy is waves of the sea hitting the shore: we could measure the speed of the waves (how fast they move toward the beach), or we could measure the frequency of the waves, which is how often a wave arrives. If a wave hits the beach every second, the frequency is one wave per second. For sound, we measure the pattern of high and low air pressure fluctuation in cycles per second (hertz – Hz). If we graph the air pressure at a certain spot over a period of time, we get an idea of the shape of the sound. Higher frequency sounds change pressure up and down more often, lower frequencies less often. (Example of different frequencies on an oscilloscope: 100 Hz, 1000 Hz, 5000 Hz. Find a digital oscilloscope – it can be used to find clipping and phase issues.) Every musical note has a fundamental frequency, and different musical instruments and voices have different ranges of frequencies they can produce. A complex real-world sound has a fundamental frequency that determines the pitch, and other harmonic frequencies that determine the timbre – what makes one instrument or voice sound different from another. No matter what the frequency of a wave is, the sound itself moves at the same speed: the speed of sound. The speed of sound depends on what the sound is actually moving through. In air at average temperature and humidity it is just above 340 metres per second; in other materials it can be even faster. Sound travels about 4 times faster in water and about 15 times faster in solid iron.
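The fluctuating pressure pattern of a pure tone can be sketched as a sine function of time. A minimal Python sketch (the 100 Hz value comes from the oscilloscope examples above; the function name is my own):

```python
import math

def sine_pressure(freq_hz, t_seconds, amplitude=1.0):
    """Instantaneous pressure deviation of a pure tone (sine wave)
    with the given frequency, at time t."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t_seconds)

# A 100 Hz tone completes one full cycle every 1/100 of a second, so a
# quarter of the way through a cycle (t = 0.0025 s) it is at its peak.
print(round(sine_pressure(100, 0.0025), 6))  # 1.0
```

A 1000 Hz tone would reach the same peak ten times sooner, which is exactly what "fluctuating more often" means.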
- Amplitude – Amplitude measures how far the sound wave fluctuates: low amplitudes make quiet sounds and high amplitudes make louder sounds. Another way of looking at amplitude is how big a difference in air pressure a sound makes. A barometer measures air pressure – how crowded the air molecules are within a given space – in pascals (Pa). Air pressure depends on physical elevation (above or below sea level), the weather and other factors, but the specific number does not really matter for sound. What matters is how much it changes. If the air pressure near your ear stays constant at whatever the barometer measures, you hear silence. If the air pressure fluctuates above and below the average by about twenty millionths of a pascal (20 µPa), you might just barely hear a sound. A jet engine fluctuates the sound pressure by about 200 Pa – a fluctuation roughly ten million times bigger than the quietest audible sound. To better match how our ears hear this range, and to make the numbers more manageable, we use a logarithmic scale: as the changes in air pressure multiply, the numbers on the loudness scale only add. To measure loudness we use the decibel (dB). The decibel is not a unit of measurement at all; it is a ratio – a comparison. Because it is logarithmic, it perfectly serves the purpose of covering the huge range of amplitudes we can hear. We can use the dB as if it were a unit of measurement by defining a reference point. For acoustic sound pressure we mark this with SPL – sound pressure level. If we set 0 dB to the quietest volume most people can hear (the threshold of hearing), then, because the logarithmic scale turns the addition of decibels into the multiplication of pascals, we arrive at the painfully loud 200 Pa at a much more manageable number. Using dB has the advantage of more useful numbers across the board.
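The pascal-to-decibel relationship can be checked with a short calculation. This sketch assumes the standard 20 µPa reference for 0 dB SPL; the function name is mine:

```python
import math

P_REF = 20e-6  # reference pressure: 20 micropascals, the nominal threshold of hearing

def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for a pressure fluctuation in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

print(round(spl_db(20e-6)))  # 0   -> the threshold of hearing
print(round(spl_db(200.0)))  # 140 -> painfully loud, around a jet engine
```

A ten-million-fold range of pressures collapses into the much more manageable range 0 to 140.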
Most of the time, for recording and mixing, boosting an audio signal by 6 dB doubles the amplitude of that signal, and reducing it by 6 dB halves the amplitude. There are areas of audio where this is measured differently, but for recording and mixing it is usually 6 dB. You will notice this pattern whenever you look at a representation of sound waves.
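The "6 dB is roughly double the amplitude" rule of thumb follows directly from the decibel formula. A quick sketch (the function name is my own):

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)

print(round(db_to_amplitude_ratio(6), 3))   # ≈ 1.995, i.e. roughly double
print(round(db_to_amplitude_ratio(-6), 3))  # ≈ 0.501, i.e. roughly half
```

The exact doubling point is about 6.02 dB, which is why 6 dB works as an everyday approximation.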
Frequency and amplitude are the most basic measurable aspects of a sound. Frequency is how often the sound fluctuates measured in Hz and amplitude is how intensely the sound wave fluctuates, measured using decibels relative to a reference point like SPL.
How sound waves interrelate – time and phase
The higher the frequency of the wave, the more times it cycles per second and therefore, the shorter each cycle is. Wavelength is the measure of the length of one cycle.
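Wavelength is simply the speed of sound divided by the frequency. A small sketch, assuming roughly 343 m/s for sound in air:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at around 20 °C (approximate)

def wavelength_m(freq_hz):
    """Length of one cycle in metres: speed of sound divided by frequency."""
    return SPEED_OF_SOUND / freq_hz

# Higher frequency -> more cycles per second -> each cycle is shorter
for freq in (100, 1000, 5000):
    print(freq, "Hz ->", round(wavelength_m(freq), 3), "m")
```

A 100 Hz wave is over three metres long, while a 5000 Hz wave fits in about seven centimetres.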
Sine wave – observe the wavelength from the initial rise to its peak, the fall into the trough, and back to the original starting point. On the oscilloscope, the line in the middle (the X axis) shows time, from left to right, and the Y axis shows the air pressure. A sine wave oscillates between positive and negative amplitude in a smooth curve as time passes. This oscillation is measured in degrees: from 0 to 90, to 180, to 270, up to 360 (called 0 again), and then the cycle repeats, just like the degrees of a circle. This shape also shows the phase of the wave over time. Here, phase refers to where within the cycle the wave currently is.
Relative phase of two or more sounds: combining two sound waves creates one wave with a new shape. If we combine two copies of an identical wave, the way the copies combine depends on their timing relative to each other, i.e. their relative phase. If the positive and negative swings line up, the waves are perfectly in phase with each other and we get a combined sound twice as loud. If they are timed so they happen opposite to each other, the waves are out of phase, and mixing two identical sounds perfectly out of phase results in silence. If the relative phase is somewhere in between, the waves will partly reinforce or partly cancel. Combining waves of different frequencies creates a more complex shape: if we mix a sine wave at 250 Hz with another at 1600 Hz, we get a new shape with elements of both. If we then combine two copies of this new waveform, we see that changing the timing affects the phase of the two frequency components differently. This is called comb filtering – mixing a complex sound with a (time-)delayed copy of itself causes some frequency components to cancel and others to reinforce.
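Comb filtering can be demonstrated numerically: delay a copy of a sine wave by a fixed time and see which frequencies cancel and which reinforce. A sketch using the 250 Hz and 1600 Hz tones from the text and an assumed 10 ms delay (10 ms is 2.5 cycles of 250 Hz but exactly 16 cycles of 1600 Hz):

```python
import math

def peak_of_mix(freq_hz, delay_s, duration=0.01, steps=16000):
    """Peak amplitude of a sine mixed with a copy of itself delayed by delay_s."""
    peak = 0.0
    for i in range(steps):
        t = delay_s + i * duration / steps  # start after the delay, where both copies overlap
        s = (math.sin(2 * math.pi * freq_hz * t)
             + math.sin(2 * math.pi * freq_hz * (t - delay_s)))
        peak = max(peak, abs(s))
    return peak

DELAY = 0.01  # a 10 ms delay (an assumed value for illustration)
# 250 Hz: 10 ms is 2.5 cycles, so the copies land half a cycle apart -> cancellation
print(round(peak_of_mix(250, DELAY), 3))   # ≈ 0.0
# 1600 Hz: 10 ms is exactly 16 cycles, so the copies line up -> reinforcement
print(round(peak_of_mix(1600, DELAY), 3))  # ≈ 2.0
```

One and the same delay silences one component and doubles the other, which is exactly the "comb" of cancelled and reinforced frequencies.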
If one of the copies of the waveform is flipped upside-down, the result is also total cancellation. A lot of people call this "out of phase", but that is not technically correct: phase is a timing difference, not a flip. Flipping the sound (so + becomes – and vice versa) is sometimes called inverting the phase, but the technically correct term is inverting the polarity. This is a very important distinction when it comes to complex waves: a polarity flip cancels an identical copy completely, while a time delay produces comb filtering, which adds a tonal alteration as well.
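The polarity flip is easy to see with numbers: negating every sample cancels an identical copy at every point, with no timing involved. A tiny sketch with made-up sample values:

```python
# Eight made-up samples of a waveform
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]

# Inverting the polarity: every + becomes - and vice versa
inverted = [-s for s in signal]

# Mixing the original with its polarity-inverted copy cancels every sample
mixed = [a + b for a, b in zip(signal, inverted)]
print(mixed)  # all zeros
```

No frequency analysis is needed here: the cancellation is sample-by-sample, which is what distinguishes a polarity flip from a phase (timing) difference.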
The way we perceive sound
It is a very complex process, and scientific studies are still discovering new information about it. We can divide the process into three parts.
- Physical – when sound in the air reaches our ears. Fluctuations in air pressure (acoustic sound) are reflected and shaped by the PINNA and directed through the EAR CANAL. The eardrum resonates with the air pressure fluctuations, setting the OSSICLES into motion. The ossicles mechanically amplify those movements, causing fluid motion in the COCHLEA. Inside the cochlea are thousands upon thousands of tiny hair cells, each of which is tuned to a very narrow frequency range by its size and stiffness. These tiny hairs can sense movement, but instead of being processed as a feeling by the brain (like the wind), the movement of these hairs is processed as sound via the AUDITORY NERVE. The average person's hearing ranges from as low as 20 Hz to as high as 20,000 Hz. As people age, we tend to lose some of our high-frequency hearing. This is because, in order for a hair cell in the cochlea to resonate with a very high frequency, it has to be very small and delicate. The hairs move back and forth when resonating with sound – the louder the sound, the more they move – and with long-term exposure to loud sound these hairs tend to weaken and die over time. That is why it is harder to lose low-frequency hearing: the hair cells for those sounds are less delicate. After the sound is physically heard, the nerve impulses travel to the brain for processing. The study of this mental part of hearing is called psychoacoustics. Some of it is
- Subconscious, happening outside of our awareness, and some of it is
- Conscious. The brain subconsciously interprets the nerve impulses coming from the ears in many ways. For example, subconscious processing is what lets you hear the words of a person speaking and understand their meaning without constantly having to figure out each sound that makes up each word. There are also many ways our brain processes sound without our awareness. One example is when someone changes a setting on a piece of equipment and we really seem to hear a difference, only to find out that the equipment was not hooked up. Another is the McGurk effect, where the sense of sight subconsciously influences our perception of sound even before the sound arrives. There is not always a clear line where subconscious ends and conscious begins; it is more of a gradual spectrum between the two. Human memory of sound is not literal. Instead of storing the raw, unprocessed sound, our memory seems to store what we felt about a sound. So when we try to recall something we heard – say, to compare it to something we hear now – often what we are actually recalling is not the sound itself, but thoughts and feelings about the sound.
To summarise: science has not explained the whole process yet, and your experience of all of the above may differ.
Where does audio live?
- Acoustic
- Analog
- Digital – working with audio today typically involves all three domains
As seen before, sound starts as vibrating waves in a medium such as air. This is where all sound comes from, unless it is electronically generated, and it is where all sound has to go to be physically heard. The acoustic domain does not require any kind of electronic technology, but it has the disadvantage that nothing can be recorded: sounds are heard once and then gone forever. The tools that make sound in the acoustic domain are acoustic musical instruments, and the sounds they make depend on their physical properties.

To get sound from the acoustic domain to the analogue domain, the sound wave needs to be converted into an electrical current using a device such as a microphone. The electrical current traces the back-and-forth motion of the acoustic sound wave, so it is literally an analogy of the sound wave – hence the name analogue domain. The analogue domain has several advantages over the acoustic domain. The biggest one is that in the analogue domain you can record and edit sounds using a physical medium such as a vinyl record or a magnetic medium like tape. It also becomes possible to manipulate sounds with amplification, editing and effects, and to synthesise electronic sounds without the physicality of small instruments to make high sounds or large instruments to make low sounds. Sounds can become distorted in the analogue domain in ways that do not naturally occur in the acoustic domain, for better or for worse: accidental and bad-sounding, as when a microphone is overloaded and the words become unintelligible, or purposeful and good-sounding, like a rocking electric guitar solo. With analogue recording, manipulation of time becomes possible, but it is limited to things that can be done physically, like editing analogue tape with a razor blade or flipping the tape around to play it backwards.
Analogue recordings can also be sped up and slowed down, but pitch, speed and tone are unavoidably locked together in the analogue domain. Even though we gain power and flexibility in the analogue domain, we are still limited to what we can physically do with tape, vinyl and electricity. With digital audio we can do even more.
To get a sound into the digital domain, we take an analogue signal and convert it into numbers that stand for the sound wave. This process is called sampling, and it works by taking thousands of measurements of the analogue signal every second and storing that stream of numbers on a computer drive. In the digital domain we can use software to manipulate those numbers – and therefore the sound the numbers stand for – in ways that were not possible in the acoustic or analogue domains. Sounds can become distorted in the digital domain in ways that do not naturally occur in the acoustic or analogue domains, and like analogue distortion, digital distortion can be accidental or purposeful. With digital recording it becomes possible to alter the pitch, speed and tone of a recording independently, and to synthesise and manipulate sound in other ways that are not possible in the analogue domain. The incredible power and flexibility of digital audio is why it has become the dominant way to record sound. Software is constantly evolving to make this power more accessible to beginners, and nowadays it is extremely rare to hear recorded audio that has not made a journey through the digital domain. The way to gain mastery over the power of digital audio, and to deal with its occasional drawbacks, is to understand it better.
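Sampling, as described above, can be sketched in a few lines: measure an analogue-style signal thousands of times per second and keep the resulting stream of numbers. The 8 kHz rate and the function name are assumptions chosen for illustration:

```python
import math

SAMPLE_RATE = 8000  # measurements per second (a deliberately low, illustrative rate)

def sample_tone(freq_hz, duration_s):
    """Return a list of numbers standing for a sine tone, one per sample instant."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

samples = sample_tone(440, 0.01)  # 10 ms of a 440 Hz tone
print(len(samples))  # 80 numbers: 8000 measurements per second for 0.01 s
```

That list of numbers is the digital representation of the sound; everything the digital domain does – editing, effects, independent pitch and time manipulation – is arithmetic on streams like this one.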