Why Do Smart Assistants Have a Tough Time with Questions?


Steve Kelly

6/4/2022 · 3 min read

What we say matters. This is especially true for automated smart voice assistants.

When we ask Siri, Alexa, Cortana, or Google Assistant for information or to complete a task, we sometimes get a result we weren't looking for. We stumble around until we find the right words to say, or a better way to say what we said. These devices are supposed to deliver on what they promise, so why can't they always get it right?

The voice data we feed our smart devices gives these companies material to improve their services over time. It's not always clear at first why Alexa can't answer a question like "How tall is a three-week-old elephant?" or "How far can a squirrel run in a minute?". While you may be rolling your eyes at these trivial questions, they pose a fundamental problem for smart assistant devices: how creative are my voice-assisted smart speakers at producing results?

Less trivial questions pose problems for these voice assistants too. If you've ever used one, you have surely run into trouble getting what you want from these beloved partners. Sometimes the data just isn't there to produce a good, relevant result. Over time, the algorithms on the back end of these devices gather enough data from your voice and others' to understand what is being asked of them.

Amazon likes to put its version of the smart assistant, Alexa, through various tests:

  • Wake Word False Rejection Rate

  • Response Accuracy Rate

  • Wake Word Detection Delay

  • Wake Word False Alarm Rate

As you can see, Amazon and other manufacturers of smart devices like to make sure the wake word is working properly, with at least three tests dedicated to it. These tests help ensure users get a good device; whether the smart speakers come from Amazon or another manufacturer, they run through the same battery. The above information is from Allion Labs, a testing lab for smart speakers that helps vendors with their product manufacturing.
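To make the two rate metrics above concrete, here is a minimal sketch of how they could be computed from raw trial counts. The function name and the numbers are invented for illustration; they are not Allion Labs' actual procedure.

```python
def wake_word_metrics(true_wakes_detected, true_wakes_total,
                      false_wakes, silence_hours):
    """Compute wake-word rate metrics from hypothetical trial counts."""
    # False Rejection Rate: fraction of genuine wake words the device missed.
    frr = 1 - (true_wakes_detected / true_wakes_total)
    # False Alarm Rate: spurious activations per hour of non-wake-word audio.
    far = false_wakes / silence_hours
    return frr, far

# Made-up trial: 1000 spoken wake words, 970 caught; 3 ghost wakes in 24 hours.
frr, far = wake_word_metrics(true_wakes_detected=970, true_wakes_total=1000,
                             false_wakes=3, silence_hours=24)
print(f"False Rejection Rate: {frr:.1%}")       # 3.0%
print(f"False Alarm Rate: {far:.3f} per hour")  # 0.125 per hour
```

A real test lab would also control the acoustic conditions (distance, background noise, accents), which is where the Wake Word Detection Delay test comes in.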

Of course, the data that gets processed after the wake word is what interests us most. The following image may help visualize the process:

The audio from the user is sent to servers that use "neural networks" and "natural language processing" to produce a result. A neural network is loosely modeled on how a human brain is connected. Neural networks are considered a black box, meaning no one can truly explain how the system arrives at its answer, only that it does. It's like magic, seriously. Natural language processing builds on top of this: it takes the words from your voice input (or text) and turns them into something the system can act on. It took A LOT of sample data and classification to build the datasets that form the backbone of these technologies.
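To give a feel for what "turning words into something the system can act on" means, here is a vastly simplified sketch of intent matching. Real assistants use trained neural networks, not keyword counts, and the intents and phrases below are invented for illustration.

```python
# Toy keyword sets per intent; a real system learns these from huge datasets.
INTENTS = {
    "weather": {"weather", "rain", "temperature", "forecast"},
    "timer":   {"timer", "minutes", "remind", "alarm"},
    "music":   {"play", "song", "music", "album"},
}

def guess_intent(utterance):
    """Score each intent by how many of its keywords appear in the utterance."""
    words = set(utterance.lower().split())
    scores = {name: len(words & keywords) for name, keywords in INTENTS.items()}
    best = max(scores, key=scores.get)
    # If nothing matched, the assistant has no idea what you want.
    return best if scores[best] > 0 else None

print(guess_intent("set a timer for ten minutes"))  # timer
print(guess_intent("how far can a squirrel run"))   # None
```

Notice the second question falls through to `None`: that is essentially the squirrel problem from earlier. Without data covering a phrasing, the system has nothing to match against.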

Once your voice has passed through the neural network and natural language processing, the smart speaker delivers the output it believes best matches what was said. That's how it works in a nutshell; the actual inner workings of these systems are locked away by the companies that make them. Wouldn't want to give all the trade secrets away. Where's the fun in that?
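The end-to-end flow described above can be sketched as a chain of stages. Every function here is a toy stand-in I've made up; the real components are proprietary neural systems.

```python
# Hypothetical stand-ins for each stage of the pipeline.
def detect_wake_word(audio: str) -> bool:
    return audio.lower().startswith("alexa")  # runs locally on the speaker

def transcribe(audio: str) -> str:
    return audio  # pretend the audio is already text (speech-to-text on server)

def understand(text: str) -> str:
    return "weather" if "weather" in text.lower() else "unknown"

def respond(intent: str) -> str:
    return {"weather": "It's sunny today.",
            "unknown": "Sorry, I don't know that one."}[intent]

def handle_request(audio: str):
    if not detect_wake_word(audio):   # no wake word: the device stays silent
        return None
    text = transcribe(audio)          # speech-to-text
    return respond(understand(text))  # NLP picks an intent, then an answer

print(handle_request("Alexa, what's the weather?"))  # It's sunny today.
print(handle_request("hello there"))                 # None
```

The key design point survives even in this toy version: wake-word detection happens on the device, and everything after it happens on the company's servers.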

When it comes to task completion errors, the smart device more than likely doesn't have the capability you are asking of it. With Amazon Alexa, there are third-party applications that can complete tasks not built into Alexa. These can be added by voice or through the Amazon Alexa app on your smartphone. This can be done with other smart assistants as well. Over time, more and more applications are being built, much like software is built for our computers and smartphones.

We're still in the infancy of smart speakers, smart assistants, voice-activated smart devices, whatever you want to call them. Alexa, Siri, Cortana, Google Assistant, and Yandex Alice all have a way to go before they grow up into fully functional question-and-answer university professors.