Follow me on Twitter @KanthalaRaghu

How Does Amazon Alexa Work?


Disclaimer: I have several Internet friends who work for the Alexa division at Amazon, and much of how the Alexa/Echo devices work is public knowledge if you are a skills developer, a connected-home partner, or a similar tech partner, so I'm not really revealing any major secrets here.

"Alexa is to Amazon what Siri is to Apple."

The virtual assistant can't be bought on its own, but it can be called upon from Amazon devices such as an Echo, Dot, or Tap. It's now even compatible with some third-party devices. Alexa reportedly mimics conversation and analyzes voice commands. The lifelike assistant can help with a long list of handy tasks, from playing music and controlling your vacuum to reading a Kindle book to you, paying your bills with an app, and ordering pretty much anything you want online.

The Echo units have two main "modes." The first is a small firmware chip wired to the microphone, with only about 50-60 KB of onboard memory. Its only purpose is to listen for the wake word ("Alexa," "Echo," etc.). It doesn't do any actual language processing for this; it only listens for distinct combinations of syllables. This is why the devices can't be programmed to respond to arbitrary words.
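To make the "distinct combinations of syllables" idea concrete, here is a toy sketch in Python. It is not Amazon's firmware: the syllable strings, the WAKE_PATTERNS table, and the WakeWordSpotter class are all made up for illustration, and a real chip would match acoustic features rather than text. It only shows why a matcher with a small, fixed pattern table cannot respond to arbitrary words.

```python
from collections import deque

# Hypothetical, hard-coded patterns. A real device would match acoustic
# features, not text syllables; these strings are purely illustrative.
WAKE_PATTERNS = {
    "alexa": ("a", "lek", "sa"),
    "echo": ("e", "ko"),
}


class WakeWordSpotter:
    """Matches a small, fixed set of syllable sequences in a stream."""

    def __init__(self, patterns=WAKE_PATTERNS):
        self.patterns = {name: tuple(p) for name, p in patterns.items()}
        longest = max(len(p) for p in self.patterns.values())
        # Only the last few syllables are kept, so memory use stays tiny.
        self.window = deque(maxlen=longest)

    def feed(self, syllable):
        """Add one syllable; return a wake word name if its pattern just completed."""
        self.window.append(syllable)
        recent = tuple(self.window)
        for name, pattern in self.patterns.items():
            if recent[-len(pattern):] == pattern:
                return name
        return None


def first_wake_word(syllables):
    """Return (wake_word, index) for the first match, or None if there is none."""
    spotter = WakeWordSpotter()
    for index, syllable in enumerate(syllables):
        hit = spotter.feed(syllable)
        if hit is not None:
            return hit, index
    return None
```

Feeding it a word that is not in the table, such as the syllables "com", "pu", "ter", never produces a match, because nothing here understands language; it only compares against the stored patterns.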

Once the firmware chip hears the wake word, it powers up the main ARM chip, which runs a stripped-down version of Linux. This startup process takes just under a second, during which the firmware chip has barely enough memory to buffer what you're saying if you start talking immediately after the wake word without pausing. Once the ARM chip is on, the blue ring on the top illuminates and recording begins. The firmware chip dumps its buffer to the start of the recording and then serves as a pass-through for the mic. Only the main ARM chip and its OS have access to the networking interface, in or out.
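The buffer-then-pass-through handoff described above can be sketched with a small ring buffer. This is a simulation under assumptions, not the actual firmware: the PreRollBuffer and Recorder names, the idea of counting boot time in audio frames, and the frame values are all invented for the example.

```python
from collections import deque


class PreRollBuffer:
    """Fixed-size buffer that keeps only the most recent audio frames."""

    def __init__(self, max_frames):
        self.frames = deque(maxlen=max_frames)

    def push(self, frame):
        # When full, the oldest frame is silently dropped.
        self.frames.append(frame)

    def drain(self):
        """Return the buffered frames in order and empty the buffer."""
        out = list(self.frames)
        self.frames.clear()
        return out


class Recorder:
    """Simulates the handoff: buffer while the main chip boots, then pass through."""

    def __init__(self, boot_frames, buffer_frames):
        self.boot_frames_left = boot_frames  # hypothetical boot time, in frames
        self.pre_roll = PreRollBuffer(buffer_frames)
        self.recording = []
        self.main_ready = False

    def on_frame(self, frame):
        if not self.main_ready:
            self.pre_roll.push(frame)
            self.boot_frames_left -= 1
            if self.boot_frames_left <= 0:
                # Main chip is up: the buffer becomes the start of the recording.
                self.main_ready = True
                self.recording.extend(self.pre_roll.drain())
        else:
            # From here on the small chip just passes mic frames through.
            self.recording.append(frame)
```

If the buffer holds fewer frames than the boot takes, the earliest speech is lost, which is the "barely enough memory" situation: with a 3-frame boot and a 2-frame buffer, the first frame never reaches the recording.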

The purpose of this next stage is to wait until the device has heard what sounds like a real, natural sentence or question. Amazon is not interested in background noise; uploading it would be a waste of bandwidth and resources. So there is a rudimentary natural language processing step done locally to determine when you've said a real sentence and stopped speaking. This step also handles very simple "local" commands that don't need server processing, like "Alexa stop." Only at that point is the full sentence sent up to the actual AWS servers for processing.
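As a rough illustration of that local step, the sketch below uses a simple energy threshold to decide when speech has ended, then decides whether a command is handled locally or uploaded. The real device's method is not described in this post, so everything here is an assumption: the threshold values, the LOCAL_COMMANDS set, and the function names are hypothetical, and energy-based endpointing is just a common stand-in technique.

```python
# Hypothetical set of commands simple enough to handle without a server.
LOCAL_COMMANDS = {"stop"}


def find_endpoint(energies, speech_threshold=0.1, silence_frames=3):
    """Return the index where the utterance ends, or None if none was found.

    An utterance ends once speech has been heard and is followed by
    `silence_frames` consecutive quiet frames. The returned index is the
    first frame of that trailing silence.
    """
    heard_speech = False
    quiet_run = 0
    for i, energy in enumerate(energies):
        if energy >= speech_threshold:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i - silence_frames + 1
    return None


def route(transcript):
    """Decide where a finished utterance goes: 'local', 'cloud', or 'discard'."""
    words = transcript.lower().strip().split()
    if len(words) == 1 and words[0] in LOCAL_COMMANDS:
        return "local"
    if not words:
        return "discard"
    return "cloud"
```

With per-frame energies such as 0.0, 0.5, 0.6 followed by four zeros, the endpoint lands at index 3, right after the last loud frame. Input that never crosses the threshold yields no endpoint at all, so nothing would be uploaded.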

It is physically impossible for the device to be secretly and constantly listening, as the mic, networking, wake chip, blue LED ring, and main ARM chip just aren't wired that way from a power perspective. If you are curious to confirm any of the above, try disconnecting your home internet and playing around with Alexa a bit. You'll see that it only realizes something is wrong at that very last step, when it goes to upload the processed sentence to the servers.

As for the stories about "eerie" advertising coincidences popping up due to things you've said around Alexa, they just go to show how spookily accurate advertisers' overall profiles of you are these days. They can track everything you have done across every device you own, and then make such educated guesses about what you're probably interested in that they don't even need to listen in your home.
