The Future of Voice: What’s Next After Siri, Alexa and Ok Google

Voice-driven technology just isn’t what it used to be… and that’s a good thing! Check out this recent article from MindMeld CEO Tim Tuttle:

Ever since the dawn of computing in the 1950s, scientists and consumers alike have dreamed of bridging the gap between humans and machines with natural spoken language. As machines began to outperform humans at complex calculation-based tasks, it became all the more frustrating that they lagged so far behind in understanding language. Language is the most basic building block that separates us from other animals, and human infants pick it up quickly and instinctively.

Although scientists dedicated their careers to the challenge for several decades, progress was very slow until recently. Machines struggled to understand spoken language at all, let alone with human-level proficiency.

The first significant advances came in speech recognition: the ability to convert sound waves into text representing spoken words. These advances long predated any ability to understand meaning. By the 1990s, speech recognition was good enough to power automated corporate call centers around the globe, the first time speech technology stepped out of the research laboratory and into the business world.

Speech recognition was good enough to power menu-driven, command-and-control interactive voice response (IVR) phone systems, but speech technology has traditionally fallen short of the science-fiction dream of speaking conversationally to a machine and having it genuinely understand your intent. Command-and-control systems, with fixed inputs and preprogrammed responses, are like a dog that can "fetch" or "roll over." By contrast, a large-vocabulary system with natural language understanding (NLU) is humanlike: flexible, continually learning, and able to respond to millions of statements and queries it is hearing for the very first time.

Conversational Interfaces: Why Now?

The first generation of virtual personal assistants emerged from a combination of improved speech recognition, faster wireless speeds, the cloud computing boom and a new type of consumer: the hyper-connected smartphone user, navigating a busy life, often on the go and eager to abandon slow, clumsy virtual keyboards. These assistants initially captured the public's imagination amid a roar of media buzz, but the technology soon fell short of users' high expectations.

About five years later, another perfect storm of market conditions is brewing for a second wave of virtual personal assistants and conversational interfaces, one poised to exceed the first in both intelligence and pervasiveness. This new wave of voice-driven assistants rides on advances in artificial intelligence, rich collections of user data and the growth of keyboardless and screenless devices. In addition, high-quality speech recognition is now built into every major operating system. Google, Apple, Baidu, Microsoft and Amazon provide this capability for free, enabling a new generation of apps to drive user adoption.

The new wave of voice-driven assistants finally embodies the dream that scientists and consumers have held for so long: legitimately understanding the meaning of naturally spoken queries, and delivering on their intent. Older assistants hinted at what was to come but relied on a fragile illusion of conversational ability. Sometimes user queries were forced into ill-fitting categories that triggered scripted responses. Other times, the best response was never returned because the query lacked a required keyword. This was a problem because real human conversation isn't rigid; it is expansive, encompassing millions of different concepts and word combinations. The new wave of voice-driven technology meets this challenge head-on, interpreting and responding to queries with genuine intelligence.

Read the rest in Re/Code!