New AI From MIT Blends Speech & Object Recognition

MIT researchers have created a new AI program that not only recognizes objects in images and words in speech at the same time, but actively blends the two in order to understand and use both more effectively. The program analyzes images paired with spoken captions, then works out which objects in the image correspond to which parts of the caption. In testing, it demonstrates this by highlighting image regions and objects as the caption describes them. According to the researchers behind the project, this is a more natural and organic approach than conventional speech recognition or image recognition training. Essentially, the AI learns much as a human would, which could make it more flexible and thus more capable in the future.

This AI program expands on a previous model that was able to match words and phrases to themed collections of images, such as colors and archetypes. The model uses two convolutional neural networks, one processing the speech input and the other the image input, and a higher layer then combines their outputs and builds associations between them. Researchers showed the model both correct and incorrect image-caption pairings to help it learn to discern connections, or the lack thereof, on its own.
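To make that idea concrete, here is a minimal sketch in plain Python of how a combining layer might score image-caption pairs and learn from correct and incorrect ones. It is an illustration under stated assumptions, not the researchers' implementation: the article does not say how the two networks' outputs are combined or which training objective is used, so the dot-product similarity, the "best region per audio frame" scoring, the margin loss, and every function name below are hypothetical. The small vectors stand in for the outputs of the two convolutional networks.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as an assumed similarity measure."""
    return sum(x * y for x, y in zip(a, b))


def matchmap(regions: List[Vector], frames: List[Vector]) -> List[List[float]]:
    """Similarity of every image region (rows) to every audio frame (columns)."""
    return [[dot(region, frame) for frame in frames] for region in regions]


def best_region(regions: List[Vector], frame: Vector) -> int:
    """Index of the image region most similar to one audio frame.

    This is the region a demo would highlight while that frame is spoken.
    """
    scores = [dot(region, frame) for region in regions]
    return scores.index(max(scores))


def pair_score(regions: List[Vector], frames: List[Vector]) -> float:
    """Overall image-caption score: best region per frame, averaged over frames."""
    m = matchmap(regions, frames)
    best_per_frame = [
        max(m[i][j] for i in range(len(regions))) for j in range(len(frames))
    ]
    return sum(best_per_frame) / len(best_per_frame)


def margin_loss(
    regions: List[Vector],
    matching_frames: List[Vector],
    mismatched_frames: List[Vector],
    margin: float = 1.0,
) -> float:
    """Assumed objective: zero only when the correct caption outscores
    the incorrect one by at least `margin`."""
    positive = pair_score(regions, matching_frames)
    negative = pair_score(regions, mismatched_frames)
    return max(0.0, margin - positive + negative)


if __name__ == "__main__":
    # Toy embeddings standing in for the two networks' outputs.
    image = [[1.0, 0.0], [0.0, 1.0]]        # two image regions
    caption = [[1.0, 0.0], [0.0, 1.0]]      # two frames of the correct caption
    wrong = [[-1.0, 0.0], [0.0, -1.0]]      # two frames of an unrelated caption

    print(best_region(image, caption[1]))    # region to highlight for frame 1
    print(margin_loss(image, caption, wrong))
```

In a real system the loss would be minimized by adjusting the weights of both networks, so that matching images and captions drift toward high similarity and mismatched ones toward low similarity. The sketch only evaluates the loss for fixed vectors.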

The implications of this project are potentially far-reaching. The approach could allow faster training of speech recognition and image recognition AI in future models, and it could also pave the way for AI based on convolutional neural networks that mimic the human brain not only in structure, but also in learning methods. Theoretically, this opens the path to things like AI with common sense that knows it's bad to drive a car off a cliff, or AI that can recognize and react to human emotions appropriately, such as knowing that a crying child could be comforted by doing or saying something a child of that age would find funny. Improved AI-based translation is also a possibility, since such an AI could potentially learn words and their counterparts in another language at the same time, from the same material, even when too little transcribed speech exists in a language for conventional speech recognition or translation training.

Copyright ©2019 Android Headlines. All Rights Reserved
About the Author

Daniel Fuller

Senior Staff Writer
Daniel has been writing for Android Headlines since 2015, and is one of the site's Senior Staff Writers. He's been living the Android life since 2010, and has been interested in technology of all sorts since childhood. His personal, educational and professional backgrounds in computer science, gaming, literature, and music leave him uniquely equipped to handle a wide range of news topics for the site. These include the likes of machine learning, voice assistants, AI technology development, and hot gaming news in the Android world.