YouTube Passes 1 Billion Auto-Captioned Videos

YouTube tries to caption most videos automatically using purpose-built speech recognition technology, with varying degrees of success and accuracy, but the fact that the service now boasts over 1 billion videos with automatically generated captions is nothing to sneeze at. The service takes a while to caption each video, but the billion videos it has managed to caption thus far are significant when set against the 300 million or so new videos that crop up each year. Taking into account the large number of videos that aren't in languages YouTube can caption, that contain music without lyrics or with lyrics in an unsupported language, that YouTube Heroes decide to caption by hand, or that have little to no speech, the estimated percentage of coverage gets pretty impressive.

Those captions are not just spreading to more videos, though; they're also getting more accurate each day. YouTube's auto-captioning algorithm is based not only on speech recognition, but also on machine learning. As it is fed more data, and as the parts of it that can be improved manually are tweaked by experienced coders, YouTube's ability to caption videos automatically keeps getting better. This can be seen in the two example photos attached below, one of which features an inaccurate caption that shows the system taking some time to catch up and getting a number of words wrong. For people who are deaf or hard of hearing, as well as people who tend to watch YouTube videos in environments where they either shouldn't have the sound on or can't hear it, this is a pretty big deal.

Automated captions do happen through machine learning magic, but the system gets a bit of help in learning and in making captions as accurate as possible. When a video has been automatically captioned, the video's uploader has a chance to review the captions. If a creator chooses to do so, they can correct the captions, which in turn feeds the captioning bot more data about not only what different speech sounds like and which words it corresponds to, but also where the bot's own flaws lie, what tendencies it needs to correct, and how best to improve.

Copyright ©2019 Android Headlines. All Rights Reserved
About the Author

Daniel Fuller

Senior Staff Writer
Daniel has been writing for Android Headlines since 2015, and is one of the site's Senior Staff Writers. He's been living the Android life since 2010, and has been interested in technology of all sorts since childhood. His personal, educational and professional backgrounds in computer science, gaming, literature, and music leave him uniquely equipped to handle a wide range of news topics for the site. These include the likes of machine learning, voice assistants, AI technology development, and hot gaming news in the Android world.