An Introduction to Audio Signal Processing

Last Edited: August 2, 2025

While researching colleges for my applications, I learned that some schools (e.g. the University of Rochester or CMU) offer audio engineering as an actual major. It turns out that modern music technology is quite technical, so I thought it would be worth looking into how it works.

Overview

Audio signal processing is the science of working with sound in its analog or digital forms. Common tasks include filtering, compression, time-based effects, spectral processing, and synthesis. In this post, I’ll be covering filtering, time-based effects, and synthesis.


Filtering

Filtering is about changing the frequency content of a sound. Audio engineers can create a more balanced sound by adjusting the amplitude of specific frequencies. This is the key idea behind equalization (EQ), which is essentially a group of different audio filters. A low-pass filter passes low frequencies while blocking high frequencies, and vice versa for a high-pass filter. A band-pass filter passes a group of frequencies (also referred to as a band), while a band-stop filter does the opposite, passing only the frequencies outside the band. You get the idea, but how are these built?

At a technical level, audio filters are implemented as mathematical algorithms that operate on the waveform data sample by sample. These algorithms are derived from concepts in signal processing and can be broadly categorized as IIR (Infinite Impulse Response) and FIR (Finite Impulse Response) filters. IIR filters use feedback: each output sample depends on current and previous inputs as well as previous outputs. They’re efficient and commonly used in real-time audio applications like parametric EQs. FIR filters use only current and past input samples, with no feedback. They offer precise control over frequency response and phase, but at the cost of higher computational demand.

In both cases, a filter’s behavior is set by its filter coefficients: a fixed set of numbers that the incoming (and, for IIR, outgoing) samples are multiplied by before the products are summed. A low-pass filter requires one set of coefficients, and a band-pass filter requires a different set. They are calculated from the cutoff frequency, Q (resonance), and gain. The cutoff frequency determines where in the spectrum the drop happens, Q controls how narrow or sharp the drop is, and gain controls how much to boost or cut.
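To make this concrete, here’s a minimal sketch of a second-order IIR low-pass filter (a “biquad”) in Python. It uses the widely circulated Audio EQ Cookbook formulas to turn a cutoff frequency and Q into coefficients; the sample rate and parameter values at the bottom are just placeholders I picked for illustration.

```python
import math
import numpy as np

def lowpass_coefficients(cutoff_hz, q, sample_rate):
    """Biquad low-pass coefficients (Audio EQ Cookbook formulas)."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]  # feedforward
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]               # feedback
    # Normalize so the leading feedback coefficient is 1.
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a[1:]]

def biquad(x, b, a):
    """Run the filter sample by sample (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn   # shift the input history
        y2, y1 = y1, yn   # shift the output history (the feedback part)
        y[n] = yn
    return y

# Filter white noise with a 1 kHz cutoff at a 44.1 kHz sample rate:
b, a = lowpass_coefficients(cutoff_hz=1000.0, q=0.707, sample_rate=44100)
quieter_highs = biquad(np.random.randn(44100), b, a)
```

Swapping in a different coefficient formula (same loop, different math) turns this into a high-pass, band-pass, or band-stop filter, which is exactly why EQs can be built from banks of these.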


Time-Based Effects

Time-based effects modify the perception of space and time in sound. Delay repeats the audio after a set time, like an echo. In a digital delay, the original sound goes through an analog-to-digital converter, sits in a memory buffer for the delay time, and is then read back out and mixed with the live audio. Reverb simulates the reflections of sound in a room. It is quite similar to delay, but uses many overlapping delay lines whose echoes fade out over time.
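Here’s a minimal sketch of that memory-buffer idea in Python: a circular buffer holds the signal for a fixed number of samples, and a feedback term makes the echoes repeat and fade. The delay time, feedback, and mix values are arbitrary choices for illustration.

```python
import numpy as np

def delay(x, delay_samples, feedback=0.4, mix=0.5):
    """Feedback delay: store the signal in a circular memory buffer,
    read it back delay_samples later, and mix it with the dry input."""
    buf = np.zeros(delay_samples)  # the memory buffer
    out = np.zeros(len(x))
    idx = 0
    for n, dry in enumerate(x):
        wet = buf[idx]                   # the sound from delay_samples ago
        buf[idx] = dry + feedback * wet  # feed part of the echo back in
        idx = (idx + 1) % delay_samples  # wrap around: circular buffer
        out[n] = (1 - mix) * dry + mix * wet
    return out

# A 300 ms echo at a 44.1 kHz sample rate:
# echoed = delay(signal, delay_samples=int(0.3 * 44100))
```

A basic reverb can be assembled from several of these running in parallel with different delay times, plus some filtering so the tail darkens as it decays.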

On top of these two, there are other time-based effects that rely on changing the timing of a sound in subtle ways to create texture. Chorus creates the illusion of multiple instruments playing at once. It works by duplicating the original audio signal, slightly delaying it, and then modulating that delay over time with a low-frequency oscillator (LFO). The result is a lush, shimmering sound: like the difference between a single voice and a choir. Flanger is similar to chorus but uses a shorter delay time, usually under 15 milliseconds, and the two signals interfere with each other to create a “jet plane” sound. Phaser doesn’t delay the sound after duplicating it; instead, it shifts the phase of certain frequencies and creates a whooshing sound.
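Here’s a rough sketch of the chorus structure described above: the copy is read from a delay line that an LFO sweeps back and forth. All of the parameter defaults (base delay, depth, LFO rate) are illustrative guesses, not canonical values, and the input is assumed to be a numpy array.

```python
import numpy as np

def chorus(x, sample_rate, base_delay_ms=20.0, depth_ms=5.0,
           lfo_hz=0.8, mix=0.5):
    """Chorus: mix the dry signal with a copy whose delay time is
    swept by a low-frequency oscillator (LFO)."""
    n = np.arange(len(x))
    lfo = np.sin(2 * np.pi * lfo_hz * n / sample_rate)
    # Delay in samples, wobbling around the base delay.
    delay = (base_delay_ms + depth_ms * lfo) * sample_rate / 1000.0
    read_pos = np.clip(n - delay, 0, len(x) - 1)
    # The delay is fractional, so interpolate between neighboring samples.
    i = np.floor(read_pos).astype(int)
    j = np.minimum(i + 1, len(x) - 1)
    frac = read_pos - i
    wet = (1 - frac) * x[i] + frac * x[j]
    return (1 - mix) * x + mix * wet
```

Shrinking base_delay_ms to a few milliseconds and feeding some of the output back in turns essentially the same code into a flanger.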


Audio Synthesis and Physical Modeling

Audio synthesis is the electronic generation of sound. One powerful approach is physical modeling, which simulates the behavior of physical sound sources. It effectively replaces the process of recording samples, instead generating sound in real time from mathematical representations of the instrument. Consider a virtual instrument, for example. A traditional approach would require recording every possible variation of a piano’s sound at different dynamic levels, then playing back the right sample depending on how hard you press the key. But sampling has its limits, because recording every nuance is storage-heavy.

Physical modeling instead simulates the mechanics of sound production. The model maintains an internal state, just like a real instrument does: virtual strings keep vibrating after a note is struck, and subtle articulations like vibrato emerge naturally from how that state is driven over time.
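The classic toy example of this idea is the Karplus-Strong plucked-string algorithm: a short buffer holds one period of the string’s vibration, and a simple averaging feedback loop damps the high frequencies first, just as a real string does. This is a minimal sketch; the damping value and the frequency in the usage example are placeholders.

```python
from collections import deque
import numpy as np

def pluck(frequency_hz, duration_s, sample_rate=44100, damping=0.996):
    """Karplus-Strong plucked string: the buffer *is* the string's state.
    It starts as a burst of noise (the pluck) and is repeatedly smoothed,
    so high frequencies decay first, as they do on a real string."""
    period = int(sample_rate / frequency_hz)      # buffer = one period
    string = deque(np.random.uniform(-1.0, 1.0, period))
    out = np.zeros(int(duration_s * sample_rate))
    for n in range(len(out)):
        out[n] = string.popleft()
        # Feedback: average the two oldest samples, damp, and recirculate.
        string.append(damping * 0.5 * (out[n] + string[0]))
    return out

# An A3 string (220 Hz) ringing for two seconds:
# note = pluck(220.0, 2.0)
```

Because the buffer persists between samples, the note keeps ringing on its own after the initial pluck, which is exactly the state retention described above.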


Some links:
https://www.wamplerpedals.com/blog/talking-about-gear/2019/03/what-is-the-difference-between-chorus-flanger-and-phaser/
https://en.wikipedia.org/wiki/Audio_equalizer
https://ccrma.stanford.edu/~jos/pasp/pasp.html