Languages Magazine

Command-&-Control Vs. Large Vocabulary Systems in Speech Recognition/NLP

By Expectlabs @ExpectLabs

You speak into a microphone; the audio frequencies are processed by a speech recognition module; you receive the desired output. How deeply did the computer understand the meaning behind your words? It depends what system was being used: command-and-control or large vocabulary.

A paper from Microsoft Research defines command-and-control speech recognition as “interacting with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases.” Command-and-control speech recognition could be used by an IVR solution that asks a caller to state their ZIP code, then forwards the call to a regional representative. Google’s recently introduced custom voice actions, which enable three “OK Google” trigger phrases for vetted third-party apps, are another example of command-and-control. Internet-of-things devices with limited vocabularies, like a voice-controlled smoke alarm, sometimes work this way as well.
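The “fixed grammar” idea can be sketched as a toy matcher in Python. Everything here (the patterns, action names, and routing behavior) is illustrative, not taken from any real IVR product: the point is that only utterances inside the pre-defined grammar are recognized at all.

```python
import re

# Toy command-and-control recognizer: the "grammar" is a fixed set of
# patterns, so anything outside it is simply rejected.
# All patterns and action names are illustrative, not from a real IVR system.
GRAMMAR = {
    r"^\d{5}$": "route_by_zip",          # five-digit ZIP code
    r"^(yes|yeah|correct)$": "confirm",
    r"^(no|nope)$": "deny",
    r"^0+$": "transfer_to_agent",        # repeated "0" escapes to a human
}

def interpret(utterance: str) -> str:
    """Map a transcribed utterance to an action, or re-prompt the caller."""
    text = utterance.strip().lower()
    for pattern, action in GRAMMAR.items():
        if re.fullmatch(pattern, text):
            return action
    return "reprompt"  # unanticipated response: ask the caller again

print(interpret("94110"))                         # route_by_zip
print(interpret("000"))                           # transfer_to_agent
print(interpret("I want to talk about my bill"))  # reprompt
```

The brittleness described below falls out of this design: any accent, disfluency, or off-script phrasing that produces a transcript outside the grammar lands in the re-prompt path.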

Command-and-control speech recognition has been around for a while. In the late ’90s, the technology was considered reliable enough to be implemented broadly in corporate call centers. Accented speech, poor diction, background noise, or an unanticipated response could throw the system off, but at least there was frequently the option to press “0” a bunch of times to get a human on the line.

While late-’90s technology was arguably adequate for triggering actions from pre-determined vocal cues, understanding meaning has been another challenge entirely, one requiring major advances in artificial intelligence. It is really only in the last five years that machines have begun to understand free, natural speech in a meaningful way.

Due to their much wider base of lexical knowledge, these newer technologies are called “large vocabulary systems.” MindMeld is an example. Not only can these systems be familiar with millions of concepts instead of mere hundreds, but they also understand how multiple concepts interrelate in queries they’ve never been trained on. This opens up new possibilities for using voice to navigate large databases of movies, merchandise, recipes, and businesses in an unrehearsed, conversational way.

