Sony's Visual Speech Enablement Can Read Your Lips Without Sound


Sony’s Visual Speech Enablement allows for augmented lip reading in any environment, according to PCMag. Well, that isn’t creepy in the slightest. But in all seriousness, this could be a major step forward for accessibility.

Mark Hanson, VP of Product Technology and Innovation at Sony, gave an overview of how the technology works at a CES keynote. By using Sony’s Intelligent Vision Image Sensor and AI, the new technology is able to isolate a user’s lips and then translate the mouth’s movements into words.

The technology can do this regardless of background or foreground noise. Even more impressive, it needs no microphone at all to work. In addition, the user can be many feet away from the sensor. With a higher-resolution sensor, the user does not have to be especially close to it, according to Hanson.


Sony’s initial plans are to market the technology for a few different use cases, such as factory automation, kiosks, and voice-enabled ATMs. For now, the technology is optimized only for computers. However, consumer-facing versions of the feature could be available on mobile hardware in the future.

Although the technology has plenty of usage scenarios, Hanson says that it is not yet optimized for certain cases. Examples include improving auto-generated options and reducing the need for a relay operator or automated speech.

Sony’s Visual Speech Enablement is cool but could be a privacy nightmare

Of course, any time new technology is introduced, it is cool. However, something along these lines could be a privacy nightmare. Facial recognition has already had its own issues with privacy concerns and misuse. Imagine a scenario where a facial recognition camera is paired with Sony’s Visual Speech Enablement technology.


This could be an issue. Capturing people's facial profiles along with what they say could help with security or destroy privacy. According to Hanson, the technology captures only lips, not faces, and does not retain user-identifiable data.

However, the outcome of combining this tech with other technologies such as facial recognition went unaddressed, so the aforementioned scenario is possible. In addition, many technologies like facial recognition use cameras and may be able to incorporate Sony’s AI-enhanced sensors.

Facial recognition technology is already used in some cities despite some objections. Of course, in this day and age, few things are truly private anymore. Websites track users via cookies, and some internet providers and mobile carriers sell our data. Only time will tell when and how Sony’s Visual Speech Enablement is integrated into our society.