Google Teaches AIs To Describe Key Image Features

It's no secret that Google is heavily investing in the development of artificial intelligence (AI), and the Mountain View-based tech giant has just made more progress on that front. At this week's Computer Vision and Pattern Recognition conference in Las Vegas, Nevada, the company's scientists and researchers revealed what they've been working on recently: computer vision systems they've taught to describe key features of images. Among other things, their latest AIs can determine the most important person in a video, detect and track specific body extremities and, best of all, tell us about what they've "seen"!

Google's researchers showcased the extremity-detecting technology with a system, developed in collaboration with the University of Edinburgh, that detects the legs of tigers. They explained that the same technology could be used to track individual parts not only of people's bodies but also of machines and essentially any other object. The implications are potentially huge, as such a system could theoretically analyze footage to single out, say, people holding weapons. Yes, this type of technology could also enable some pretty effective spying, but from a technological standpoint, it's incredibly impressive.

The feature that identifies "events and key actors," developed by Google in collaboration with Stanford, creates a so-called "attention mask" for each frame of the analyzed video footage. By determining what is in focus across most of the frames, it can fairly reliably rate the relevance of each person and object present in the footage. That's much more impressive than playing Atari games, right?

Last but not least, Google's scientists presented an AI system capable of describing images accurately and in great detail. While that accomplishment could be useful to people with visual impairments, it has even greater implications when you consider it in reverse. For example, an AI that can "see" and explain that there are three hats on the round coffee table is cool, but being able to tell your futuristic robot companion to fetch the white hat from the round coffee table could be revolutionary. For now, the system Google presented in Las Vegas can only do the former, but hopefully it will soon make the latter a realistic possibility.