One of the most popular tropes in sci-fi (think James Bond or Knight Rider) is the intelligent car sidekick. It can drive itself, see its surroundings, and carry on conversations. With advances in autonomous vehicle and sensor technologies, engineers are bringing us closer to this science fiction reality every day. But when it comes to certain elements, such as intelligent navigation and conversation, the real world has fallen woefully behind. Until, that is, CMU-SV professor Ian Lane created Capio.
“We’re taking this technology to a level that mimics human capacity,” says Lane, “enabling people to interact with a machine in the same way they would interact with a human.”
“We’re taking this technology to a level that mimics human capacity, enabling people to interact with a machine in the same way they would interact with a human.”
Ian Lane, Assistant Research Professor, Carnegie Mellon University Silicon Valley
Outwardly, Capio doesn’t look like much: a simple black bar mounted on the car’s dash, equipped with a camera and audio sensors. But despite its humble appearance, Lane’s technology tackles the problems that have held back earlier in-car systems. Using computer vision-based approaches, the system can track the facial movements and gestures of everyone in the car. This way, the car can follow the conversation the way a human can, telling the difference between passengers talking to each other and passengers addressing Capio directly.
Typically, to access GPS directions or find nearby restaurants, drivers rely on their smartphones to tell them where to go. But this presents a number of problems. Looking down at or typing on a phone while driving is a serious safety concern, and one of the leading causes of fatal accidents in the U.S. Some newer vehicles have voice-activated GPS systems built directly into the car, but these systems break down if there is any background noise in the car.
“Though these systems have improved dramatically over the last few years—Siri and Alexa, for instance—there are still challenges,” says Lane. “Often, these systems fail when there are many people speaking at the same time. For instance, if you’re driving in your car and the kids are screaming in the back seat, the system doesn’t work. We’ve developed technology that understands individual speakers, so even if there are three or four people speaking at the same time, the system can pick out one person from that speech and recognize them with high accuracy.”
Not only can the system follow a conversation like a human does, it can also learn the way a human does. Using deep learning, Capio becomes better and more accurate at picking out individual voices through its interactions with people over time, the same way that children learn to pick their parents’ voices out of a crowd.
This hands-free, contextually aware, human-computer interaction system not only solves the problem of drivers using their phones while on the
Recently, Lane was invited to the Tokyo Motor Show on October 25,
“These car systems are just the beginning for Capio,” Lane says. “The future of contextually aware, human-computer interaction systems will mean that eventually, every interaction we have with the machines we encounter on a daily basis will feel just as comfortable as speaking to a person.”