Behind Rosetta, Facebook's In-Image Text Recognition AI

Facebook has revealed some details about the inner workings of Rosetta, a machine learning system that it uses to extract text from images and understand that text in the context of the image. The system allows Facebook to offer more relevant image search results and to make that sort of content more accessible to the visually impaired, among other use cases. Rosetta is not a single AI program, but rather a number of programs that work together to extract the text from an image, figure out how it relates to the image, and glean relevant insights when applicable.

Rosetta's first step is to use a convolutional neural network to determine whether text is present in an image, be it part of the scene itself or a text overlay of the kind commonly seen in image macros, motivational posters, and memes. This step uses what's called a region proposal network to propose regions of the image that may contain text, then checks those regions for known text patterns. Once text is found, optical character recognition comes into play so that the AI can figure out what the text says. The program uses sequence prediction alongside trained language and context processing in order to recognize words and phrases that may not have been seen in training. Finally, the recognized text is run through a different program that's trained on context for in-image text blurbs.
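To make the detect-then-recognize flow above more concrete, here is a minimal Python sketch. It is not Facebook's code and it makes several assumptions the announcement does not confirm: that candidate regions arrive with a "contains text" score, that overlapping candidates are pruned with non-maximum suppression, and that the sequence prediction step is decoded greedily in a CTC-like fashion (best character per frame, repeats collapsed, blanks dropped). The names `Proposal`, `select_text_regions`, and `greedy_decode` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Proposal:
    """A candidate region, as a region proposal stage might emit it (assumed shape)."""
    box: Box
    score: float  # hypothetical confidence that the region contains text


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_text_regions(
    proposals: Sequence[Proposal],
    min_score: float = 0.5,  # assumed threshold, not from the article
    max_iou: float = 0.5,    # assumed overlap limit, not from the article
) -> List[Proposal]:
    """Keep confident proposals and drop ones that heavily overlap a better one."""
    kept: List[Proposal] = []
    for p in sorted(proposals, key=lambda q: q.score, reverse=True):
        if p.score < min_score:
            break  # everything after this is even less confident
        if all(iou(p.box, k.box) <= max_iou for k in kept):
            kept.append(p)
    return kept


def greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: str,
    blank: int = 0,
) -> str:
    """CTC-style greedy decoding (an assumption about the recognizer).

    Each frame holds one score per label; label 0 is the blank and
    label i (i >= 1) maps to alphabet[i - 1].
    """
    chars: List[str] = []
    prev = blank
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank and best != prev:
            chars.append(alphabet[best - 1])
        prev = best
    return "".join(chars)
```

In a pipeline shaped like the one described, the boxes returned by `select_text_regions` would be cropped from the image and handed to the recognizer, whose per-frame scores `greedy_decode` turns into a string. Because the decoder builds words character by character instead of looking them up in a fixed vocabulary, this kind of design can in principle output words that never appeared in training.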

The whole point of Rosetta is to look at text in images, in all its forms, and figure out the relationship between the text and the image, if there is any. If you have, say, an image macro of an overweight cat sitting on the bumper of a truck with the warning label "Wide Load", the AI will be able to understand the association and make the joke easier to find via image search. Likewise, if you have an image macro that says "When you're feeling down" and depicts somebody playing Katamari Damacy, the AI can recognize the link between the two, along with the image's implication that somebody who's feeling down is using the cheerful and nonsensical game as a way to cope.


Copyright ©2019 Android Headlines. All Rights Reserved
About the Author

Daniel Fuller

Senior Staff Writer
Daniel has been writing for Android Headlines since 2015, and is one of the site's Senior Staff Writers. He's been living the Android life since 2010, and has been interested in technology of all sorts since childhood. His personal, educational and professional backgrounds in computer science, gaming, literature, and music leave him uniquely equipped to handle a wide range of news topics for the site. These include the likes of machine learning, voice assistants, AI technology development, and hot gaming news in the Android world.