How Google Found Its Voice

By Deputy Chief Editor, September 18, 2019


A few years back, Google was actively exploring whether it should launch a male counterpart to Amazon’s female Alexa voice assistant. “When we first launched the Google Assistant, we intended to use a male voice, just to be different,” recalled Google Assistant product manager Brant Ward recently.

However, at the time, text-to-speech technology was still struggling to make male voices sound natural. “You’d get these warble effects, oftentimes, with male voices,” said Ward. This forced the company to ultimately go with a female voice for the Assistant’s launch in 2016.

Fast forward three years, and the Assistant lets U.S. consumers choose among 11 male and female voices, with Google announcing this week that it is bringing additional voice choices to nine more languages.

Much of that has been made possible by rapid advancements in artificial intelligence. But the way Google presents itself through voice interfaces has just as much to do with early design decisions related to the Assistant’s name, personality and more. Ward and Google Assistant personality character lead Emma Coats recently talked to Variety to explain how Google found its voice.

Does software have a gender?

Early on, the team working on the Google Assistant had to figure out a key question: What is Google’s persona? Looking at existing Google products like Gmail and Chrome offered few clues. “You don’t see a lot of personality in these products,” recalled Coats.

Without clear guidance from existing products, the Assistant team decided to map out some potential choices. “There were two schools of thinking,” Coats said. One was to turn the Assistant into a kind of audible version of Google.com, a matter-of-fact oracle that spits out knowledge whenever you ask for it. The other was to create more of a character. Make it helpful, but also a bit playful.

Coats’ team developed 20 hypothetical questions consumers might ask the Assistant, along with the answers each of the two personality types might give. All of the questions and answers were hung up in a conference room, and key stakeholders were asked to mark the ones they preferred with little dot stickers.

In this dot-voting process, the character won out over the oracle, which was the right choice, if you ask Coats. After all, if you search for “Hello” on Google.com, the first result is a video of the Adele song. Said Coats: “Is that really what you want when you are speaking to something?”

But while the Assistant was supposed to have personality, it was also clear early on that consumers shouldn’t confuse it with an actual person. “It should speak like a human, but it should never pretend to be one,” said Coats.

In addition to personality traits, the Assistant team also had to settle on a range of other issues, recalled Coats: “What are the implications of giving the Assistant a human name? Does software have a gender?”

To help the Assistant be more approachable in a variety of cultures, the team ultimately decided against giving it a human name, and instead settled on the “Google Assistant” moniker. “We did want to make it feel like a conversation with Google,” she said. This also helped keep consumers from associating the Assistant with a single gender, ultimately paving the way for Google to roll out additional voices.

When the Assistant sounds like a ransom note

When Google initially developed the Assistant, it was still relying on traditional text-to-speech technology. This required recording a lot of source material with a voice actor, which was then chopped up and reassembled by Google’s algorithms to create words and sentences. “It’s kind of like a ransom note,” said Ward.

That system worked reasonably well for common words and phrases, but would trip up a lot on edge cases. “It would sound really choppy,” he recalled. “Aberrations are hard.”

Google achieved a breakthrough when it replaced its traditional text-to-speech model with a deep learning-based approach called WaveNet in 2017. Curious minds will find more on the way WaveNet works on Google’s DeepMind blog, but in essence, the algorithm generates sounds from scratch after having received enough training from voice samples.

This not only resulted in a much more natural-sounding Google Assistant, it also made it significantly easier for Google to develop new voices and deploy them to the Assistant. “We can build more voices in less time,” Ward explained.

With the help of WaveNet, Google has been able to launch 11 voices total in the U.S. market, and many more internationally. The company was even able
