Google's Tacotron Is An Advanced Text-To-Speech AI


Artificial intelligence programs that learn to convert text into speech, rather than relying on more rigid hand-coded rules, are not an entirely new thing, but Google's newest, called Tacotron, is one of the most advanced and forgiving of the bunch. Tacotron can dynamically adjust pitch and tone based on context, including prosody, the rhythm, stress, and intonation that give speech its almost musical quality. The kicker is just how smart Tacotron is about what it's reading. It can get past some seriously bad typos without missing a beat, and it can work out which pronunciation is meant when the same word is said differently in different contexts, or when words look alike but sound different.

Tacotron's list of talents is quite long. It's sensitive to punctuation and capitalization for emphasis, and it can work out how a sentence is supposed to sound from those cues as well as from context clues in the sentence and surrounding phrases. Because Tacotron generates speech at the frame level, and its decoder predicts several output frames at each step instead of one, it is substantially faster than sample-level approaches. Google's paper also reports it doing better than the simpler designs it was compared against, such as a vanilla seq2seq model and a version built on a GRU-based encoder. (GRU and seq2seq are neural network building blocks, not standalone text-to-speech programs.) The headline feature of Tacotron is the ability to handle words it's never seen before by sounding them out, much as children do when they first encounter words in writing. Tacotron can even cope with spelling errors, reading smoothly over typos and even badly mangled text as if nothing were wrong with it.

While Tacotron is quite advanced as far as machine-learning text-to-speech programs go, Google readily admits that it doesn't yet sound quite as natural as text-to-speech engines that piece together recordings of humans talking. That could improve with time, but the main reasons for researching Tacotron are that it's cheaper, less time-consuming, and far more flexible to implement than a synthesizer built from pre-recorded speech. The white paper for Tacotron, along with a few audio samples, is available through the source link on GitHub, though Tacotron is currently not open-source.



Copyright ©2017 Android Headlines. All Rights Reserved.

Senior Staff Writer

Daniel has been writing for Android Headlines since 2015 and is one of the site's Senior Staff Writers. He's been living the Android life since 2010 and has been interested in technology of all sorts since childhood. His personal, educational, and professional backgrounds in computer science, gaming, literature, and music leave him uniquely equipped to handle a wide range of news topics for the site, including machine learning, voice assistants, and AI technology development in the Android world. Contact him at [email protected]
