Amazon has steadily improved its Alexa AI assistant with fixes and features ranging from new functionality and gadgets to partnerships and integrations, but the company says it is now aiming to improve things beneath the surface.
Amazon hopes to build on the kind of optimizations that enabled Alexa Guard, a service introduced in 2018 that can actively listen for smoke alarms, carbon monoxide detectors, or the sounds of a possible break-in. The retail giant wants to make the AI itself better at listening and at learning what to listen for in the first place.
The goal is to ensure that as the AI teaches itself about sounds, it keeps getting better and discards learning that would make it perform worse. The team behind Alexa plans to present the improvements at this year's annual International Conference on Acoustics, Speech, and Signal Processing.
Stopping self-trained errors
The company is using two new methods to accomplish that task. In the first, Amazon uses what it refers to as "semi-supervised learning," which relies on the standard large data sets augmented with more controlled examples. Similar but varying data sets are fed to no fewer than three different AI models, and their outputs are then pooled.
The same method is also applied to longer audio streams, rather than only to short clips of annotated audio.
That, of course, makes the final outputs more accurate for the task Alexa is meant to perform, but it also addresses a recurring problem that is common in machine learning. Namely, it appears to prevent the learning model from teaching itself errors, so that errors in its outputs diminish rather than being amplified.
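As a rough illustration of how pooling the outputs of three models can keep self-taught errors in check, here is a minimal sketch. It is not Amazon's implementation: the threshold "models", the single loudness feature, the label names, and the `min_votes` rule are all hypothetical, and majority voting is just one simple way to pool outputs. The idea is that an unlabeled clip only becomes new training data when enough of the models agree on its label.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# A "model" here is any function mapping a feature value to a label.
# Real systems would use trained neural networks; these are stand-ins.
Model = Callable[[float], str]


def pool_predictions(predictions: Sequence[str], min_votes: int) -> Optional[str]:
    """Return the most common label, or None if it has fewer than min_votes."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_votes else None


def pseudo_label(
    models: Sequence[Model], unlabeled: Sequence[float], min_votes: int = 2
) -> List[Tuple[float, str]]:
    """Keep only the unlabeled examples whose pooled label has enough votes."""
    accepted = []
    for example in unlabeled:
        label = pool_predictions([model(example) for model in models], min_votes)
        if label is not None:
            accepted.append((example, label))
    return accepted


# Three hypothetical models that differ slightly, standing in for models
# trained on similar but varying data sets.
models: List[Model] = [
    lambda x: "alarm" if x > 0.5 else "background",
    lambda x: "alarm" if x > 0.6 else "background",
    lambda x: "alarm" if x > 0.4 else "background",
]

clips = [0.2, 0.45, 0.55, 0.9]  # made-up loudness features for unlabeled clips

majority = pseudo_label(models, clips, min_votes=2)
unanimous = pseudo_label(models, clips, min_votes=3)
print(majority)
print(unanimous)
```

With `min_votes=3`, the two borderline clips (0.45 and 0.55) are dropped because the models disagree on them, so a single model's mistake never turns into a training label. Lowering the bar to a two-out-of-three majority keeps more data at the cost of admitting less certain labels.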
Not just for break-in detection or safety
None of that will necessarily put an end to other concerns about Alexa, particularly those associated with the company inadvertently or deliberately listening in, but it should improve far more than listening mechanisms used for less common sounds. Perhaps more importantly, the methods apply to the detection of audio before anything resembling an answer is even required.
For instance, a user might be in a loud or noisy environment, or just watching TV. Amazon has had issues in the past with Alexa accidentally responding to its name in those circumstances, such as providing undesired responses during a commercial for an Echo device.
By increasing the length of audio the AI is learning from and letting it listen for longer, the team is able to help it learn to distinguish between media that is being played and the sound of a user asking for assistance. The improvements that brings to event detection should, as outlined in the examples above, be particularly useful when it comes to spreading Alexa more widely.
Amazon's bid to enter the smart earbud market with its own mobile accessory provides an example of at least one case where better detection is bound to prove beneficial. The earbuds will likely be worn by users under a vast array of environmental conditions. So ensuring the audio playback isn't frequently interrupted by unwanted Alexa interactions will be a vital factor in the gadget's success or failure.