

Supplement for chapter 5

A simple Sonogram


A Whistle followed by the Voice

Whistlevoice from figure 5.2

See (hear) "Pitch Salience and Tone Duration" at Stanford "Correlogram Museum"

For more files also see

Similarity with a musical score

A sonogram is quite analogous to a musical score, though much more detailed. Both convey how the frequency content of the sound changes over time. A musical score showing that a trumpet should play a C3 for one beat will show just that note, whereas a sonogram will show that frequency and perhaps many harmonics of it, as appropriate to the overtone structure of the instrument. It will also show how long the note really lasts, whereas musical notation uses a shorthand. The musical score uses a logarithmic vertical scale and a more or less linear time scale, although the shorthand notation for long vs. short notes makes the time scale choppy.

We are familiar with a waveform, i.e. a record of pressure vs. time as recorded at a microphone. This record is time-like, i.e. it reports data at precise times. We are also familiar with power spectra, giving frequency content of the sound: how much power at precise frequencies, without reference to time.

Is there something in between time-like and frequency-like? Indeed there is: the sonogram. Time and frequency cannot simultaneously be well defined. To establish that a sharp peak exists in the power spectrum of a signal, we must frequency analyze a long sample of the sound. On the other hand, if we know the amplitude of the sound only over a very short time, we do not have the data needed to analyze its frequency content with any precision.

We can, however, compromise and accept some uncertainty in time and some uncertainty in frequency too. If the time uncertainty is small but not zero, the frequency uncertainty has to be large, and vice versa. This is embodied in the Uncertainty Principle.

The figure shows three pulses of different durations; each is frequency analyzed over a time much longer than the pulse lasts. The first thing we realize is that the pulses die away, and where they are zero it doesn't matter whether we include those times or not. Thus the signal itself is limiting the time uncertainty. The Uncertainty Principle, given in Why You Hear What You Hear and essential to understanding the sonogram, comes jumping out of the power spectrum analysis of each of the three pulses. A short pulse gives a broader power spectrum, and vice versa. The product of the two uncertainties (time and frequency) is a number that cannot be smaller than 1.
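The inverse relation between pulse duration and spectral width can be checked numerically. The following is a minimal sketch, not the procedure used to make the figure: it assumes Gaussian envelopes, a 220 Hz carrier, an 8 kHz sample rate, and three illustrative durations, and the function name `rms_widths` is hypothetical. It measures both widths as RMS spreads. The numerical value of the product depends on how the widths are defined; with RMS widths a Gaussian pulse gives about 1/(4π), a different convention from the bound of 1 quoted above. What matters here is that the product is the same for all three pulses.

```python
import numpy as np

def rms_widths(sigma, f0=220.0, fs=8000, total=4.0):
    """Return (time width in s, frequency width in Hz) of a Gaussian pulse.

    sigma, f0, fs and total are illustrative assumptions, not values
    taken from the figure.
    """
    n = int(fs * total)
    t = (np.arange(n) - n // 2) / fs           # time axis centred on the pulse
    envelope = np.exp(-t**2 / (2 * sigma**2))  # smooth cut-off
    pulse = envelope * np.cos(2 * np.pi * f0 * t)

    # Time width: RMS spread of the power envelope.
    w_t = envelope**2
    t_mean = np.sum(t * w_t) / np.sum(w_t)
    dt = np.sqrt(np.sum((t - t_mean)**2 * w_t) / np.sum(w_t))

    # Frequency width: RMS spread of the power spectrum, analysed over
    # a record much longer than the pulse itself.
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    w_f = np.abs(np.fft.rfft(pulse))**2
    f_mean = np.sum(freqs * w_f) / np.sum(w_f)
    df = np.sqrt(np.sum((freqs - f_mean)**2 * w_f) / np.sum(w_f))
    return dt, df

results = {name: rms_widths(sigma)
           for name, sigma in [("short", 0.005), ("medium", 0.02), ("long", 0.08)]}

for name, (dt, df) in results.items():
    print(f"{name:6s} dt = {dt * 1000:6.2f} ms   df = {df:6.2f} Hz   product = {dt * df:.4f}")
```

Lengthening the pulse by a factor of four narrows the spectrum by the same factor, so the product column stays constant.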

Now let's suppose we decompose the long signal at the bottom of the figure to the left into small pieces, i.e. the little chirps (short pulses) seen floating above the signal. All these pulses, when added together, equal the long signal. Note that each pulse exists over some small range of time, and also suffers some frequency uncertainty; thus each pulse plots as a "blob" in a plot where time is along the bottom axis and frequency is the vertical axis. Yet we can see that each blob is where it should be, sitting at the right time and at a higher frequency if it is oscillating faster. Each of these blobs is the sonogram of the pulse or chirp, and the point is that the sonogram of the full signal is the sum of all the blobs. What you can take home from this is that any signal can be decomposed into chirps, and each of these chirps has a "blob sonogram"; the sum of all the blobs is the sonogram of the full signal.

Now the question becomes: how long or short should the pulses be? (We control this through the window length that is chosen.) If they are very long, their blobs spread out along the time axis but are very thin in frequency; if they are short, the blobs are thin in time but spread out in frequency. This is the question of window length taken up in Why You Hear What You Hear. Whatever window length (pulse duration) you choose, the sonograms of the individual pulses are the bricks from which the whole sonogram will be built. If your bricks are very thin horizontally but tall vertically, you cannot stack them together to create the sonogram of a long sound and expect to represent thin horizontal lines, that is, features that are narrow in frequency. If you want that, you need a longer window length.
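The effect of window length on the size of the "bricks" can be sketched with a bare-bones sonogram calculation. This is an illustrative implementation under stated assumptions, not the algorithm of any particular sonogram program: the function name `sonogram`, the Hann window, the half-window hop, and the 440 Hz test tone are all choices made for this example.

```python
import numpy as np

def sonogram(x, fs, win_len, hop=None):
    """Minimal short-time power spectrum (hypothetical helper).

    Each row of `power` is the power spectrum of one windowed slice of x.
    """
    hop = hop or win_len // 2
    window = np.hanning(win_len)                    # smooth cut-off of each slice
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1))**2
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)      # frequency of each column
    times = (starts + win_len / 2) / fs             # centre time of each row
    return times, freqs, power

fs = 8000                                  # assumed sample rate in Hz
t = np.arange(fs) / fs                     # one second of signal
tone = np.sin(2 * np.pi * 440 * t)         # steady test tone (assumed)

for win_len in (64, 2048):
    times, freqs, power = sonogram(tone, fs, win_len)
    peak = freqs[np.argmax(power.mean(axis=0))]
    print(f"window {win_len:4d}: brick is {win_len / fs * 1000:6.1f} ms wide, "
          f"{fs / win_len:6.2f} Hz tall, tone located at {peak:.1f} Hz")
```

With the short window each brick is narrow in time but more than a hundred hertz tall, so the tone can only be located coarsely; with the long window the tone is pinned down to a few hertz at the cost of time resolution.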

Short chirps

First, two short chirps used in the problem/projects book (see section "Problems and Projects").


Three short chirps (sine waves cut off smoothly, as seen above) are given below. They all have the same center frequency, but the first is very short. See if you can discern the pitch; it will become more apparent in the next two pulses. This demonstration shows that the time-frequency uncertainty principle is physiological too: as frequency detectors, we are subject to the same rules limiting the frequency resolution given a limited time duration of a signal.

Short pulse
Medium pulse
Long pulse

All three pulses center on 220 Hz. In the next figure, we see the three pulses processed with a window 2048 units long (sample rate 8 kHz), where the "blobs" seen in the sonogram are limited almost entirely by the pulses themselves, and to the right, with a window length of only 64, where the window itself is imparting a large frequency uncertainty. The window length of 64 units at a sample rate of 8 kHz corresponds to a frequency uncertainty of 1/(64/8000) = 125 Hz, which is about what is seen at the right.

In the cross-platform Audacity, for example, there is a setting for window length, measured in individual data samples, going from 32 to 4096. Assuming the sound was sampled 44,100 times per second in the original recording (quite normal for audio recording equipment), a window length of 4096 corresponds to a time slice of just under 1/10 of a second. A window length of 32 is well under a 1000th of a second. It is useful to experiment with this setting (or the analogous one in similar software packages). Watch the changes in the sonogram for different kinds of sampled sound. Looking at the sonogram with several different time window widths in succession often exposes interesting features that any single window width cannot reveal. Sonogram Visible Speech has useful features such as autocorrelation, and nice graphics.
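The window-length arithmetic used in the last two paragraphs is easy to tabulate. This short sketch applies the same estimate as in the text (frequency uncertainty taken as one over the window duration); the helper name `window_stats` is hypothetical, and the intermediate power-of-two window lengths are an assumption about the available settings, included only for illustration.

```python
def window_stats(win_len, fs):
    """Return (duration in s, frequency uncertainty in Hz) for a window
    of win_len samples at sample rate fs, using uncertainty = 1 / duration."""
    duration = win_len / fs
    return duration, 1.0 / duration

# The two windows discussed for the 8 kHz pulses.
for win_len in (64, 2048):
    duration, df = window_stats(win_len, 8000)
    print(f"fs =  8000 Hz, window {win_len:4d}: {duration * 1000:7.2f} ms, about {df:8.2f} Hz")

# Window lengths from 32 to 4096 at 44,100 samples per second
# (powers of two assumed for illustration).
for win_len in [2**k for k in range(5, 13)]:
    duration, df = window_stats(win_len, 44100)
    print(f"fs = 44100 Hz, window {win_len:4d}: {duration * 1000:7.2f} ms, about {df:8.2f} Hz")
```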

See (hear) also "Pitch Salience and Tone Duration" on the Acoustical Society of America's Auditory Demonstrations CD.


Which sonogram is which

Listen to this file and decide which sonogram belongs to which sound.
This is the answer
Latte sound