Digital Signal Processing for Audio Applications: Volume 1 - Formulae
Ebook, 537 pages, 4 hours

About this ebook

In the summer of 2003 we began designing multi-track recording and mixing software – Orinj at RecordingBlogs.com – a software application that will take digitally recorded audio tracks and will mix them into a complete song with all the needed audio production effects.  Manipulating digital sound, as it turned out, was not easy.

Language: English
Publisher: Anton Kamenov
Release date: Aug 1, 2017
ISBN: 9780692912201

    Book preview

    Digital Signal Processing for Audio Applications - Anton R Kamenov

    Chapter 1. Introduction

    In practice, sound is a complex multitude of waves with various properties. When this sound is recorded digitally, however, it becomes simply a collection of numbers. The properties of the many waves remain hidden within the digitized data. In the world of audio then, the task of digital signal processing – the processing of digitized audio signals – is to manipulate these digital data with little knowledge of their precise properties.

    To be sure, contemporary music is likely digital. Since a digital audio signal is simply a set of numbers, there may be operations that are mathematical in nature and that allow us to modify these numbers to the needed end. Contemporary music mixing and mastering relies heavily on such mathematical manipulations, whether for equalizing, reverberating, introducing distortion, or otherwise producing the recorded sound.

    The purpose of this book is to present a simple, structured approach to understanding how digitally recorded sound can be manipulated. While the theory of digital signal processing may seem complex, the corresponding mathematical computations are not. Digital signal processing can be presented in a simple and transparent manner, one that is easy for hobbyists to understand and implement, and one that requires little mathematical or engineering background. After all, when properly explained, many of the practical applications of this mathematics reduce to simple algebra and some trigonometric identities.

    Digital signal processing, or DSP, is the science of manipulating digital signals. It is not specific to music, but is used wherever digital signals are present, from cell phones and television transmissions to digital images and digital blood pressure monitors. The fundamental task of DSP is to manipulate a digitally recorded signal using nothing more than the information contained in its digital representation.

    DSP for audio is of specific interest for two reasons. First, manipulating digital audio data is relatively simple. Mathematically, these data are a one-dimensional array of information – a function of time. Second, audio production uses a wide variety of manipulations, including simple ones such as combining two audio signals during mixing, more complex ones such as changing the frequency content of audio data during equalizing, or esoteric ones such as purposefully introducing errors in the data during dithering. Audio data is the ideal candidate for an intuitive description of the full power of DSP.

    Chapter 2. Simple waves in continuous time

    To manipulate signals mathematically, we must first give them a useful mathematical representation. The purpose of this and the following several chapters is to show that complex practical signals, including sound, can be represented as collections of simple sine and cosine waves with various properties.

    We begin by examining the properties of simple sine and cosine waves. Consider the following function of time t.

    Equation 2.1

    $$x(t) = \cos(t)$$

    The cosine wave x(t) is a continuous function that oscillates up and down as the time t increases in the realm of real numbers as in figure 1.

    As simple as this function is, it has several important properties. It peaks at 1. Its value at its trough is -1. It completes one of its cycles in a period of time equal to 2π. It starts with cos(0) = 1 at t = 0, is positive for 0 < t < π/2, zero at t = π/2, negative for π/2 < t < 3π/2, and 1 again at t = 2π, since cos(2π) = 1.

    Figure 1. A simple cosine wave

    The function cos(t) starts with 1 at t = 0, has peak value of 1, and completes a cycle in a period of 2π.

    2.1. Initial phase

    Not all waves peak at t = 0. Suppose that instead of starting a cycle at t = 0, the wave begins with a peak at t = τ. Such a wave would be described by the formula

    Equation 2.2

    $$x(t) = \cos(t - \tau)$$

    When t = τ, we have x(τ) = cos(τ − τ) = cos(0) = 1 and so this wave begins its cycle with a peak at t = τ.

    The quantity τ is the wave's initial phase. The waves cos(t) and cos(t − τ) have the same peak values and cycles of the same length, but have different initial phase. The wave cos(t) has initial phase of 0.

    Figure 2 compares three waves – ones with τ = 0, τ = -1.2, and τ = 3. Larger values of τ shift the wave to the right and smaller (or larger negative) values of τ shift the wave to the left by the amount of τ.

    Figure 2. Simple waves with different initial phase

    These three simple waves have the same cycle lengths and peak values, but have different initial phase: 0, -1.2, and 3.

    Note that a wave with initial phase of τ = 2π would be indistinguishable from the original wave cos(t). While the initial phase of those two waves is different – 0 and 2π – the change in the initial phase just happens to be the same as the length of the wave's cycle and the two waves coincide. Note also that a wave with the phase τ = π is the same as the wave cos(t), but inverted as in figure 3. This happens because the phase τ = π is exactly one-half of the length of the wave's cycle. These two waves are said to have an inverted phase or inverted polarity, although it would be more precise to say simply that the initial phases of the two waves differ by half of their cycle.

    Figure 3. Two waves with inverted phase

    When the initial phase of two waves with the same cycle lengths differs by ½ of their cycle, these waves are said to have inverted phase.
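
    Both observations about the phases π and 2π can be verified with the cosine difference identity. The short derivation below is a worked check, assuming the shifted wave has the form cos(t − τ) used in this section.

```latex
\cos(t - \pi) = \cos t \cos \pi + \sin t \sin \pi = -\cos t, \qquad
\cos(t - 2\pi) = \cos t \cos 2\pi + \sin t \sin 2\pi = \cos t .
```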

    2.2. Peak amplitude

    If, in addition to changing the initial phase, we wanted to change the value of the wave at its peak, we could use the formula that follows.

    Equation 2.3

    $$x(t) = A \cos(t - \tau)$$

    The quantity A is the value of the peak of the wave and is called the wave's peak amplitude. Larger values for A result in waves with larger peaks and troughs and smaller values for A result in smaller peaks and troughs. Figure 4 compares the waves cos(t), cos(t + 1.2), and 0.6 cos(t + 1.2).

    The waves cos(t) and cos(t + 1.2) have the same peak amplitude, but different initial phase. The waves cos(t + 1.2) and 0.6 cos(t + 1.2) have the same initial phase, but different peak amplitude. The peak amplitude of 0.6 cos(t + 1.2) is exactly 0.6.

    Figure 4. Waves with different phase and amplitude

    cos(t) and cos(t + 1.2) have the same peak amplitude, but different initial phase. cos(t + 1.2) and 0.6 cos(t + 1.2) have the same initial phase, but different peak amplitude.

    2.3. Frequency

    Since all waves discussed above complete a cycle in a period of time equal to 2π, these same waves will complete 1/(2π) cycles in a period of time equal to 1. This means that these waves have the frequency 1/(2π). If we want a wave with a cycle of 1, then we should traverse time 2π times faster. If we want a wave with a cycle of ½, then we should traverse time 2π / (½) = 4π times faster. In general, if we want a cycle of 1/f, then we should traverse time 2π f times faster. The formula

    Equation 2.4

    $$x(t) = A \cos(2\pi f (t - \tau))$$

    describes a wave that has initial phase of τ, peak amplitude of A, and a cycle of 1/f or, equivalently, a frequency of f. For example, when f = 3 cycles per unit of time, the wave above will complete a cycle in a period of time equal to 1/3. At t = 1/3 the wave will become A cos(2π − 2π f τ) = A cos(−2π f τ) and will have the same value as at t = 0.

    Figure 5 shows three waves with different frequencies – cos(t), cos(0.5 t), and 0.4 cos(2π t). The frequency of cos(t) is 1/(2π) cycles per unit of time. The wave cos(0.5 t) has a frequency of 1/(4π) cycles per unit of time. The wave 0.4 cos(2π t) has a frequency of 1 cycle per unit of time. A larger f implies cycles of smaller length and a smaller f implies cycles of larger length.

    Figure 5. Waves with different amplitude and frequency

    cos(t) and cos(0.5 t) have the same amplitude, but different frequencies. cos(t) and 0.4 cos(2π t) have different amplitude and different frequency.
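
    The frequencies quoted for the waves in figure 5 follow from matching each wave to the form cos(2π f t). The worked check below uses ω only as a label for the coefficient of t; it is not notation taken from the book.

```latex
\cos(\omega t) = \cos(2\pi f t) \;\Rightarrow\; f = \frac{\omega}{2\pi}, \qquad
\cos(t):\ f = \frac{1}{2\pi} \approx 0.159, \qquad
\cos(0.5\,t):\ f = \frac{0.5}{2\pi} = \frac{1}{4\pi} \approx 0.080, \qquad
\cos(2\pi t):\ f = 1 .
```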

    The wave

    Equation 2.5

    $$x(t) = A \cos(2\pi f (t - \tau))$$

    has initial phase equal to τ units of time, peak amplitude equal to A, frequency equal to f cycles per unit of time, and length of cycle equal to 1/f units of time. The three properties – amplitude, phase, and frequency (or length of cycle) – fully define a simple cosine wave as a function of time.

    It would occasionally be beneficial to consider the initial phase of the wave not in terms of units of time, but as a portion of the length of the wave's cycle. We would then use the following formula

    Equation 2.6

    $$x(t) = A \cos(2\pi f t - \theta)$$

    This formula normalizes the initial phase with respect to the frequency. Since this wave has a cycle of 1/f units of time, when θ = 2π, the wave is offset by exactly 1/f, or one cycle, and that is independent of the frequency. Similarly, when θ = π, the wave is offset by 1/(2 f), or one-half of its cycle, independent of the frequency.

    The wave

    Equation 2.7

    $$x(t) = A \cos(2\pi f t - \theta)$$

    has initial phase of θ / (2π f) units of time or θ / (2π) portions of its cycle, peak amplitude equal to A, frequency equal to f cycles per unit of time, and length of the cycle equal to 1/f units of time.
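
    The properties listed above can be checked numerically. The following is a minimal Python sketch, assuming the simple wave has the form A cos(2π f (t − τ)) and that the phase angle is θ = 2π f τ. The function names simple_wave and phase_angle and the example values are illustrative only and do not come from the book.

```python
import math


def simple_wave(t, amplitude=1.0, frequency=1.0 / (2.0 * math.pi), tau=0.0):
    """Value at time t of the assumed wave A cos(2*pi*f*(t - tau))."""
    return amplitude * math.cos(2.0 * math.pi * frequency * (t - tau))


def phase_angle(frequency, tau):
    """Initial phase as an angle: theta = 2*pi*f*tau (tau in units of time)."""
    return 2.0 * math.pi * frequency * tau


# Hypothetical example values: A = 0.6, f = 3 cycles per unit of time, tau = 0.05.
A, f, tau = 0.6, 3.0, 0.05

peak = simple_wave(tau, A, f, tau)                        # value at t = tau
one_cycle_later = simple_wave(tau + 1.0 / f, A, f, tau)   # one cycle after the peak
half_cycle_later = simple_wave(tau + 0.5 / f, A, f, tau)  # half a cycle after the peak
theta = phase_angle(f, tau)

print(peak, one_cycle_later, half_cycle_later, theta)
```

    With these values the wave returns to its peak one cycle (1/f) after t = τ and reaches its trough half a cycle after t = τ, which matches the description of amplitude, phase, and frequency given above.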

    Frequencies above are measured in cycles per unit of time. Typically, time is measured in seconds. Frequencies then are measured in cycles per second or Hertz (Hz). 1 Hz is 1 cycle per second.

    Chapter 3. Simple waves in discrete time

    Simple waves, when working in discrete time, are not defined for the whole realm of real numbers, but only at specific points in time.

    3.1. Sampling

    The process of taking the values of a continuous function at specific points in time is called sampling. If we sample the wave cos(t) at t = 0, 0.1, 0.2, and so on, we would record the values cos(0) = 1, cos(0.1) = 0.995, cos(0.2) = 0.980, etc. These values are shown in figure 6 for a short period of time.

    Rather than counting time t, we can count samples, which we will denote with k, k = 0, 1, 2, and so on. If we sample every 0.1 seconds, the value of cos(t) at sample k and time t_k = 0.1 k would be cos(0.1 k).

    Figure 6. A sampled simple wave

    Sampling of the wave cos(t) once every tenth of a second.

    We note two important properties of this specific type of sampling. First, the sampling above is done at uniform intervals. That is, the time distance between any two adjacent samples is the same. In the example above, this distance is 0.1 seconds. Second, sampling is recorded with comparable amplitudes. In this example, we record the value of the function as it is at every sample and do not scale or otherwise modify the amplitude at each sample by some different scale. This specific type of sampling is called pulse code modulation or simply PCM.

    Suppose that we sample at periods of time equal to T seconds. The quantity T is called the sample time. We would then sample 1/T times during one second. The number of times that we sample during one second (or, in general, one unit of time) is called the sampling rate or the sampling frequency. We will denote the sampling frequency by fs, fs = 1/T. In the example above, the sampling frequency is 10 Hz, as we take ten samples in every second. Finally, the set of values that the amplitude can take is called the sampling resolution. The sampling resolution in the example above is the realm of real numbers.
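
    The sampling example above can be sketched in a few lines of Python. This is only an illustration, assuming uniform sampling that starts at t = 0; the helper name sample is hypothetical.

```python
import math


def sample(func, sample_time, count):
    """Return func evaluated at t = 0, T, 2T, ... for `count` uniform samples."""
    return [func(k * sample_time) for k in range(count)]


T = 0.1                      # sample time in seconds, as in the example above
fs = 1.0 / T                 # sampling frequency in Hz
samples = sample(math.cos, T, 5)

print(fs)
print([round(v, 3) for v in samples])
```

    The values are kept as ordinary floating-point numbers here, which stands in for the "realm of real numbers" resolution of the example.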

    Fact: The sampling frequency in CD audio is 44,100 Hz or 44.1 kHz. This means that music in CD audio is sampled and recorded digitally with 44,100 samples per second and the sample time is 1/44,100 seconds. Sampled values are recorded using 16-bit signed integers. One bit effectively distinguishes negative from non-negative values and the remaining 15 bits represent the amplitude. The amplitude can then take a finite number of values, namely 2¹⁶ = 65,536 values: 2¹⁵ = 32,768 negative values and as many non-negative values (zero and 32,767 positive values). CD audio is thus said to have 16-bit sampling resolution. The term bit resolution is also used to mean sampling resolution. Thus, CD audio is said to have a sampling rate of 44.1 kHz and a bit resolution of 16.
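
    To make the idea of a finite sampling resolution concrete, the sketch below quantizes an amplitude to a 16-bit signed integer. It is a simplified illustration under stated assumptions: amplitudes are normalized to the range from -1 to 1, full scale maps to 32767, and values are rounded and clipped. Real converters and audio software may scale, round, or dither differently, and the function name to_int16 is hypothetical.

```python
def to_int16(value):
    """Quantize a value in [-1.0, 1.0] to a 16-bit signed integer.

    Assumption: full scale 1.0 maps to 32767, with rounding and clipping
    to the 16-bit signed range -32768 .. 32767.
    """
    scaled = int(round(value * 32767))
    return max(-32768, min(32767, scaled))


levels = 2 ** 16                 # number of distinct 16-bit values
print(levels, to_int16(1.0), to_int16(-1.0), to_int16(0.25), to_int16(2.0))
```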

    3.2. Discrete simple waves

    Suppose that we sample the simple wave A cos(2π f (t − τ)) with the sampling frequency fs. The sample time is T = 1/fs. The times at which a sample is taken are t = 0, 1/fs, 2/fs, 3/fs, and so on. If these samples are numbered as above with k = 0, 1, 2, …, then the value of the wave at each sample would be

    Equation 3.1

    $$x(k) = A \cos\left(2\pi f \left(\frac{k}{f_s} - \tau\right)\right)$$

    In a period of time equal to τ seconds, there would be m = τ / T = τ fs samples taken and so we can, with some approximation, rewrite the above formula as follows.

    Given the sampling frequency fs, the wave

    Equation 3.2

    $$x(k) = A \cos\left(2\pi \frac{f}{f_s} (k - m)\right)$$

    has initial phase of m samples or m / fs units of time, peak amplitude of A, and frequency of f. Its cycle is 1/f units of time or fs / f samples.

    The quantity f / fs is known as the normalized frequency of the sampled wave. If, for example, the sampling frequency is 2000 Hz (samples per second) and the frequency of the wave is 500 Hz (cycles per second), then the normalized frequency is ¼. We can say that the frequency of the wave is one-quarter of the sampling frequency. Most data manipulation in DSP uses not the actual frequency of the wave, but the normalized frequency of the wave. This said, for clarity and to the extent possible, we will continue to write f / fs and will not rely on the normalized frequency for computations.
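
    The example with fs = 2000 Hz and f = 500 Hz can be reproduced with a short Python sketch. It assumes the discrete wave has the form A cos(2π (f / fs) (k − m)), with the initial phase m measured in samples; the function name discrete_wave is illustrative.

```python
import math


def discrete_wave(k, amplitude, frequency, fs, m=0.0):
    """Sample k of the assumed discrete wave A cos(2*pi*(f/fs)*(k - m))."""
    normalized = frequency / fs          # normalized frequency f / fs
    return amplitude * math.cos(2.0 * math.pi * normalized * (k - m))


fs = 2000.0        # sampling frequency in Hz (from the example above)
f = 500.0          # wave frequency in Hz (from the example above)
normalized = f / fs
samples_per_cycle = fs / f

# Samples k = 0 .. 4 span exactly one cycle of the wave at these settings.
one_cycle = [round(discrete_wave(k, 1.0, f, fs), 6) for k in range(5)]
print(normalized, samples_per_cycle, one_cycle)
```

    At a normalized frequency of ¼ the wave takes four samples to complete a cycle, so the fifth sample repeats the first.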

    3.3. The Nyquist-Shannon sampling theorem

    The Nyquist-Shannon sampling theorem states that the frequency content of a signal is fully represented by sampling at a certain frequency if the signal does not contain frequencies higher than one-half of the sampling rate.

    The Nyquist-Shannon sampling theorem, sometimes described as the Shannon-Kotelnikov theorem or the Nyquist-Kotelnikov theorem, is fundamental to any digital sampling of a signal.

    The theorem has many variations and the names Nyquist, Shannon, and Kotelnikov are used in various combinations. The following, for example, is the Nyquist-Kotelnikov theorem: "Every signal, which can be integrated in time and has a finite frequency spectrum, can be sampled at intervals of time that are smaller or equal to 1 / (2 fs), where fs is the maximum of the frequency spectrum."

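    All versions of the theorem say one and the same thing, namely that if we are to properly sample some frequency, we must sample at a frequency which is at least twice that frequency. To represent the frequency 1000 Hz properly, we must use at least 2000 samples per second. If we sample with 2000 samples per second, we can capture only frequencies up to 1000 Hz. Strictly, a wave at exactly one-half of the sampling frequency is a boundary case: depending on its phase, its samples may not reveal its amplitude, so in practice the sampling frequency should exceed twice the highest frequency of interest.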

    Fact: It is generally accepted that the human ear can perceive frequencies between 20 Hz and roughly 20 kHz. Thus, a sampling frequency of at least 2 * 20 kHz = 40 kHz is needed to properly represent the frequency spectrum that is humanly audible. The 44.1 kHz sampling frequency of CD audio satisfies this requirement with some margin.

    A good example of how the Nyquist-Shannon sampling theorem works is to consider a simple wave with zero phase and frequency fs, sampled at the sampling frequency fs. At every sample k, the value of this simple wave would be

    Equation 3.3

    $$x(k) = A \cos\left(2\pi \frac{f_s}{f_s} k\right) = A \cos(2\pi k) = A$$

    Every sample of this simple wave has the same value, equal to its peak amplitude, so the samples are indistinguishable from those of a constant signal and the wave cannot be sampled properly with the given sampling frequency. Consider also the example in figure 7. Two simple waves with frequencies 1500 Hz and 500 Hz and with different phases are sampled with the frequency fs = 2000 Hz. These two simple waves produce the same value at every sample. Even if these samples do represent the frequency 1500 Hz, any equipment or software would interpret them as the frequency 500 Hz. Such confusion of higher frequencies with lower frequencies due to low sampling rates is called aliasing.

    Figure 7. An example of aliasing

    Even though this is the sampling of the wave at 1500 Hz (dashed) with the sampling frequency of 2000 Hz, the sampled values will be interpreted as the wave at 500 Hz (solid). The confusion of high frequencies for low ones due to low sampling rates is called aliasing.

    Since at any frequency f we have

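    Equation 3.4

    $$\cos\left(2\pi \frac{f_s - f}{f_s} k\right) = \cos\left(2\pi k - 2\pi \frac{f}{f_s} k\right) = \cos\left(2\pi \frac{f}{f_s} k\right)$$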

    aliasing occurs when the frequency fs − f is confused for the frequency f or vice versa.
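
    The aliasing example with 1500 Hz and 500 Hz can be checked numerically. The sketch below is an illustration only: it assumes unit-amplitude cosines written as cos(2π f k / fs − θ), and the helper name sampled_cosine and the phase angle 0.7 are arbitrary choices, not values from the book.

```python
import math


def sampled_cosine(frequency, fs, count, theta=0.0):
    """Samples of cos(2*pi*f*k/fs - theta) for k = 0 .. count - 1."""
    return [math.cos(2.0 * math.pi * frequency * k / fs - theta)
            for k in range(count)]


fs = 2000.0
high = sampled_cosine(1500.0, fs, 8)     # 1500 Hz = fs - 500 Hz
low = sampled_cosine(500.0, fs, 8)       # 500 Hz

# Largest difference between the two sets of samples (rounding error only).
largest_gap = max(abs(a - b) for a, b in zip(high, low))

# With a non-zero phase angle the alias appears with the opposite phase sign.
theta = 0.7                              # arbitrary illustrative phase angle
high_shifted = sampled_cosine(1500.0, fs, 8, theta)
low_shifted = sampled_cosine(500.0, fs, 8, -theta)
largest_gap_shifted = max(abs(a - b)
                          for a, b in zip(high_shifted, low_shifted))

# A wave at the sampling frequency itself yields the same value at every sample.
at_fs = sampled_cosine(fs, fs, 8)

print(largest_gap, largest_gap_shifted, at_fs)
```

    The second comparison shows why the two waves in the figure can have different phases and still coincide at every sample: the wave at fs − f matches the wave at f once the sign of the phase angle is reversed.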

    Chapter 4. Complex signals and simple DSP operations

    Practical signals likely consist not of one, but of many simple waves. In fact, a signal that consists of a single simple wave or even resembles a single simple wave is probably artificially created. Figure 8 shows the amplitude of the many simple waves that are contained within a one-second recording of an acoustic guitar. Note just how many frequencies are contained in this actual example signal.

    Figure 8. Frequencies in an acoustic guitar recording

    This graph shows the amplitude of frequencies contained in a one-second recording of an acoustic guitar solo (some information is lost when editing into black and white). The vertical axis shows magnitudes of frequencies with respect to the maximum magnitude allowed in the recording – from 0 dB (top) to -20 dB (bottom). The horizontal axis spans the frequency spectrum between 1 Hz and 1000 Hz. The picture was produced with Orinj, multitrack recording software (www.recordingblogs.com).

    When a complex signal consists of many simple waves, it is difficult, if not impossible, to decipher what waves are present in the signal by simply looking at a plot of it over time. Even when the signal consists of three or four simple waves, which is still rare, we still may not be able to know the precise parameters of such waves.

    Figure 9. A complex signal of three simple waves

    This complex signal consists of three different frequencies. We can tell that the signal contains at least a higher frequency with smaller amplitude and some lower frequencies. It is impossible to tell, however, that there are exactly three frequencies and that they are exactly the simple waves cos(t), cos(0.5 t), and 0.4 cos(2π t).

    Fact: Sound is usually very complex. Even when an instrument plays the frequency of a single note – the fundamental frequency – that frequency will be surrounded with multiple other frequencies, usually at lower amplitude, with frequencies higher than the fundamental frequency (overtones or partial waves) or lower than the fundamental frequency (undertones or also overtones). Some of those fit nicely with the fundamental frequency, as they are its integer multiples (harmonics). Others do not (inharmonic overtones or partial harmonics). Some instruments produce the same frequency in addition to the main frequency, independently of what note is played (formants). Flutes, for example, have formants around 8 kHz. These additional frequencies define the timbre of a sound. When instruments have different timbre, they sound different, even when they play the same note.

    4.1. Mathematical representation of the complex signal

    In theory, we represent a complex signal as the combination of multiple simple waves. A continuous complex signal x(t) can be written as follows.

    Equation 4.1

    $$x(t) = \sum_{n=1}^{N} A_n \cos(2\pi f_n (t - \tau_n))$$

    An, fn, and τn are the peak amplitude, frequency, and initial phase of each of the N simple waves in the signal, with n = 1, 2, …, N. Representing complex signals as collections of simple waves is an assumption – a heavy one. Whether complex signals in fact resemble collections of simple waves is discussed in chapter 5.

    A complex signal x(t), sampled with the sampling frequency fs, would be

    Equation 4.2

    $$x(k) = \sum_{n=1}^{N} A_n \cos\left(2\pi \frac{f_n}{f_s} (k - m_n)\right)$$
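
    A sampled complex signal of this kind can be built directly as a sum of sampled simple waves. The sketch below assumes each wave has the form A_n cos(2π (f_n / fs) (k − m_n)), with the initial phase m_n measured in samples. The function name complex_signal and the sampling frequency of 10 Hz are illustrative choices; the three waves are the ones named in the caption of figure 9.

```python
import math


def complex_signal(k, fs, waves):
    """Sample k of a sum of simple waves.

    `waves` is a list of (amplitude, frequency, m) tuples, with the initial
    phase m measured in samples, following the assumed form
    sum of A_n * cos(2*pi*(f_n/fs)*(k - m_n)).
    """
    return sum(a * math.cos(2.0 * math.pi * (f / fs) * (k - m))
               for a, f, m in waves)


# The three waves from figure 9: cos(t), cos(0.5 t), and 0.4 cos(2*pi*t).
waves = [
    (1.0, 1.0 / (2.0 * math.pi), 0.0),
    (1.0, 0.5 / (2.0 * math.pi), 0.0),
    (0.4, 1.0, 0.0),
]
fs = 10.0                                   # illustrative sampling frequency
signal = [complex_signal(k, fs, waves) for k in range(50)]
print(signal[:3])
```

    The resulting list holds only the summed values, so nothing in it records how many waves were combined or what their amplitudes, frequencies, and phases were.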

    When processing digital signals, we will work with signals for which we do not know the precise values for N, An, fn, and mn
