The fact that computers can turn minuscule changes in air pressure into text is an astounding achievement. Understanding the elements that make up sound is the first step in learning how this achievement, called speech recognition, works. The image above is a spectrogram: a graphical representation of sound based on its frequency, intensity, duration, and resonance. Spectrograms are created by dividing a sound into short segments, called frames, each about 20 milliseconds long. The resulting plot reveals the boundaries between phonemes, with the colors indicating the sound energy at each point in time and frequency. Victor Zue, an MIT professor, is famous for being able to read these graphs, and he even teaches courses on how to read a spectrogram properly. Reading spectrograms is not easy: it requires interpreting the displayed acoustic patterns to determine exactly what was said. Can you guess what the spectrogram above says? The green waveform should give you a hint. If you need help, the answer is pasted at the end of this post.
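The framing process described above is simple enough to sketch in code. The snippet below is a minimal, illustrative spectrogram builder, not the method any particular speech recognizer uses: it slices a signal into 20 ms frames and computes the sound energy in each frequency bin per frame. The Hann window and 50% frame overlap are common choices but are assumptions here.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_ms=20):
    """Split a signal into ~20 ms frames and compute the energy
    at each frequency within each frame (a basic spectrogram)."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop = frame_len // 2                            # 50% overlap (assumed)
    window = np.hanning(frame_len)                  # taper to reduce spectral leakage
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2     # energy per frequency bin
        frames.append(power)
    # Rows are time (frame index); columns are frequency bins up to Nyquist.
    return np.array(frames)

# Example: one second of a 440 Hz tone at a 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(tone, sr)
# Each 20 ms frame holds 320 samples, so bins are sr/320 = 50 Hz apart;
# the brightest bin in any frame should sit near 440 Hz.
peak_hz = spec[0].argmax() * sr / 320
```

Plotting `spec` with time on the x-axis, frequency on the y-axis, and color mapped to energy yields the kind of image shown above.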
Here at Expect Labs, we use spectrograms to analyze the phonetic sounds produced while using MindMeld, our voice-calling app for the iPad. Spectrograms let us examine the components of vocal output, which is transformed into written words once algorithms are applied to make sense of the audio. Speech recognition, however, is only a small part of what our Anticipatory Computing Engine does. Our platform also understands multiple streams of dialogue in real time, identifies key concepts and related topics, and uses language structure and analysis to infer what types of information users will find useful.
We will attempt to reveal more about how our technology works in future blog posts. Please let us know if you have any questions about what we’re working on.
¡plǝɯpuıɯ ɟo sɹoʇɐǝɹɔ ǝɥʇ ‘sqɐl ʇɔǝdxǝ oʇ ǝɯoɔlǝʍ :ɹǝʍsuɐ