Apple Machine Learning Journal:
In addition to the speaker vectors, we also store on the phone the “Hey Siri” portion of their corresponding utterance waveforms. When improved transforms are deployed via an over-the-air update, each user profile can then be rebuilt using the stored audio.
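Here's a back-of-the-envelope sketch of what that rebuild step could look like. The helper names, the toy linear transform, and the simple averaging over per-utterance vectors are all my assumptions, not anything Apple has published:

```python
# Back-of-the-envelope sketch, not Apple's actual pipeline: a toy linear
# "transform" maps a stored waveform to a speaker vector, and a user profile
# is rebuilt by averaging the vectors from all stored enrollment utterances.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the "Hey Siri" clips kept on the phone (one array per utterance).
stored_waveforms = [rng.standard_normal(16000) for _ in range(5)]

def speaker_transform(waveform, weights):
    """Toy stand-in for the speaker-vector extractor: a linear projection."""
    return weights @ waveform

def rebuild_profile(waveforms, weights):
    """Re-run every stored utterance through the (new) transform and average."""
    vectors = [speaker_transform(w, weights) for w in waveforms]
    return np.mean(vectors, axis=0)

# When an improved transform ships via an over-the-air update, only the weights
# change; the saved audio lets the profile be rebuilt without re-enrolling.
old_weights = rng.standard_normal((128, 16000))
new_weights = rng.standard_normal((128, 16000))
old_profile = rebuild_profile(stored_waveforms, old_weights)
new_profile = rebuild_profile(stored_waveforms, new_weights)
print(old_profile.shape, new_profile.shape)  # (128,) (128,)
```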
Rebuilding profiles from stored enrollment audio is the most Apple-like way to continuously improve that I can think of. More interesting, though, is this bit later on:
The network is trained using the speech vector as an input and the corresponding 1-hot vector for each speaker as a target.
To date, ‘personalized Hey Siri’ has meant “the system is trained to recognize only one voice.” That quote, though, sounds like they’re working on multiple-user support, which, with the HomePod, they really should be.
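For the curious, here's what training against 1-hot speaker targets looks like in miniature. The softmax classifier, the dimensions, and the toy data below are illustrative assumptions of mine, not the network described in the paper:

```python
# Illustrative sketch only: a single softmax layer trained with cross-entropy,
# where each enrolled speaker is one class, the input is a "speech vector",
# and the target is the 1-hot vector for that speaker. Dimensions and data
# are made up; the paper's actual network isn't reproduced here.
import numpy as np

rng = np.random.default_rng(0)

num_speakers = 4      # assumed number of enrolled users
vector_dim = 128      # assumed speech-vector dimensionality
samples_per_spk = 50

# Toy speech vectors: each speaker's utterances cluster around its own mean.
means = rng.standard_normal((num_speakers, vector_dim))
X = np.vstack([means[s] + 0.1 * rng.standard_normal((samples_per_spk, vector_dim))
               for s in range(num_speakers)])
labels = np.repeat(np.arange(num_speakers), samples_per_spk)
Y = np.eye(num_speakers)[labels]          # 1-hot target, one row per utterance

W = np.zeros((vector_dim, num_speakers))  # single softmax layer, for simplicity

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Plain gradient descent on cross-entropy between the softmax output and the
# 1-hot speaker target.
for _ in range(200):
    probs = softmax(X @ W)
    W -= 0.5 * (X.T @ (probs - Y)) / len(X)

pred = softmax(X @ W).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```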