Journey through the land of waveforms and spectrograms in the latest video in our sounds of language series. In this installment, our Research Director shows us how a few nifty tools can help us visualize the sounds that make up a language.
TRANSCRIPT:
Alright. So now that we know about the phonemes of all the languages in the world, the interesting question is: can we see them? Can we visualize sound? And the answer is a resounding yes! You are actually looking at it. This is a program that lets me see the waveform: on the X-axis is time, and on the Y-axis is the energy in decibels, or how loud I am. So if I speak really loud, I can make these giant patterns. But that is actually not the most interesting view. That would be the spectrogram, where the X-axis is still time, but the Y-axis now shows the frequencies, and it’s color coded, so that yellow indicates the highest energy band, and then it goes to red and to dark.
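As a rough illustration of what such a program computes, here is a minimal sketch in Python with NumPy. It is not the tool shown in the video. The function name `spectrogram`, the frame and hop sizes, and the 440 Hz test tone are all assumptions made for the example. The signal is cut into short overlapping frames, and each frame is turned into a column of energies per frequency, which is what the colors in a spectrogram represent.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_size=512, hop=256):
    """Return (times, freqs, power_db) for a 1-D signal (hypothetical helper)."""
    window = np.hanning(frame_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)  # one FFT per frame
    power_db = 10 * np.log10(np.abs(spectrum) ** 2 + 1e-12)  # energy in decibels
    freqs = np.fft.rfftfreq(frame_size, d=1 / sample_rate)   # Y-axis: frequency in Hz
    times = (np.arange(n_frames) * hop + frame_size / 2) / sample_rate  # X-axis: seconds
    return times, freqs, power_db

# Assumed test input: one second of a pure 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

times, freqs, power_db = spectrogram(tone, sample_rate)
peak_hz = freqs[power_db.mean(axis=0).argmax()]  # frequency bin with the most energy
print(f"strongest frequency bin: {peak_hz:.1f} Hz")
```

Plotting `power_db` with time on one axis and frequency on the other, using a color map, would give a picture of the kind described here.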
So I can make some different sounds, like the vowels, and you can see they’re quite distinct. The highest frequency is different, the second highest frequency is also different. The amount of harmonics, the space between the formants, is also different. If we do some fricatives, you can see that, as I was mentioning, it is turbulence, it is noise. For the fricatives, the energy is pretty much spread out across all the frequency bands. With the plosives, you can see that there is this burst of energy. There is, like, nothing for a few milliseconds, and then there is this explosion of energy.
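One way to put a number on "energy concentrated in a few bands" versus "energy spread out across all bands" is spectral flatness: the geometric mean of the power spectrum divided by its arithmetic mean. The sketch below is only an illustration on synthetic signals, not real speech. The "vowel-like" signal is an assumed stack of harmonics of 120 Hz, and the "fricative-like" signal is plain white noise.

```python
import numpy as np

def spectral_flatness(frame):
    """Near 0 for a few strong peaks, much higher for noise-like spectra."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12  # small floor avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    return geometric_mean / np.mean(power)

sample_rate = 16000
t = np.arange(2048) / sample_rate

# Assumed stand-in for a vowel: ten harmonics of a 120 Hz fundamental.
vowel_like = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 11))

# Assumed stand-in for a fricative: white noise, energy in every band.
rng = np.random.default_rng(0)
fricative_like = rng.standard_normal(2048)

flat_vowel = spectral_flatness(vowel_like)
flat_fricative = spectral_flatness(fricative_like)
print(f"vowel-like: {flat_vowel:.6f}, fricative-like: {flat_fricative:.3f}")
```

The harmonic signal should score close to zero and the noise far higher, which mirrors the difference visible in the spectrogram.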
So this is actually how speech recognition works. Digital speech processing and fast Fourier transforms are techniques that take the audio, break it into these frequency bands, analyze the pattern, and then match it against a known acoustic model that tells you, oh, this sounds like an F, or it sounds like a P, etc. There are actually scientists who can read a spectrogram. They can look at a picture like this one and kind of reconstruct everything that I’ve said so far.
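The pipeline described here (take the audio, split it into frequency bands, match the pattern against known examples) can be sketched as a toy. This is a simplified illustration, not how a production recognizer is built: real systems use trained statistical acoustic models, while this sketch swaps in a nearest-template comparison. The labels `low_tone`, `high_tone` and `noise`, the helper names, and the eight-band split are all hypothetical choices for the example.

```python
import numpy as np

SAMPLE_RATE = 16000
FRAME_SIZE = 1024

def band_energies(frame, n_bands=8):
    """Share of the frame's energy that falls in each of n_bands frequency bands."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2  # fast Fourier transform
    energies = np.array([band.sum() for band in np.array_split(power, n_bands)])
    return energies / energies.sum()

def classify(frame, templates):
    """Return the label whose stored band pattern is closest to this frame."""
    features = band_energies(frame)
    return min(templates, key=lambda label: np.linalg.norm(features - templates[label]))

def tone(freq_hz):
    t = np.arange(FRAME_SIZE) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t)

rng = np.random.default_rng(0)

# Toy "acoustic model": one reference band pattern per (hypothetical) label.
templates = {
    "low_tone": band_energies(tone(300)),
    "high_tone": band_energies(tone(3500)),
    "noise": band_energies(rng.standard_normal(FRAME_SIZE)),
}

# New sounds that are similar to, but not identical to, the references.
guess_low = classify(tone(320), templates)
guess_high = classify(tone(3400), templates)
guess_noise = classify(rng.standard_normal(FRAME_SIZE), templates)
print(guess_low, guess_high, guess_noise)
```

Replacing the tones with short frames of recorded speech and the templates with patterns learned from many speakers would be the step from this toy toward the kind of matching described in the video.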