Google, Stanford Working On Medical Speech Recognition


Google staff and Stanford University researchers are working together on speech recognition technology aimed at the medical community, with the goal of easing the paperwork, note-taking, and other administrative tasks that doctors and medical scribes handle each day. According to Google’s data, the average doctor spends about 6 hours of the work day on electronic health records, a workload that can contribute to burnout. That chunk of time, Google wagers, would be better spent on other tasks, such as interacting directly with patients.

The speech recognition tool is driven by Google’s own AI technology, with Google developing the backend in close collaboration with Stanford. The system is meant to listen in on conversations between a doctor and a patient and not only transcribe them but also take relevant notes automatically, helping doctors and their teams coordinate patient care. It parses medical terminology by looking for key words, which lets it figure out which parts of a conversation are worth special note. Conceivably, instances trained by or in partnership with individual doctors’ offices could get to know regular patients well enough to recognize when special variations apply to their cases. With proper integration, the system could even enter information into health systems and populate patient charts.
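To make the key-word idea concrete, here is a minimal Python sketch of flagging the parts of a transcript that mention medical terms. It is an illustration only, not Google's or Stanford's actual method: the term list, the function name flag_notable_sentences, and the simple sentence splitting are all assumptions made for this example.

```python
import re

# Hypothetical vocabulary for illustration; a real system would rely on a
# far larger, curated medical term list (an assumption, not from the article).
MEDICAL_TERMS = {"hypertension", "ibuprofen", "migraine", "dosage"}


def flag_notable_sentences(transcript, terms=MEDICAL_TERMS):
    """Return (sentence, matched_terms) pairs for sentences containing a term."""
    notes = []
    # Naive sentence split: break after '.', '?' or '!' followed by whitespace.
    for sentence in re.split(r"(?<=[.?!])\s+", transcript.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        hits = sorted(words & set(terms))
        if hits:
            notes.append((sentence, hits))
    return notes


if __name__ == "__main__":
    visit = (
        "Good morning. I have had a migraine since Monday. "
        "Take ibuprofen at a low dosage."
    )
    for sentence, hits in flag_notable_sentences(visit):
        print(hits, "->", sentence)
```

A production system would presumably go well beyond exact word matching, but the sketch shows the basic step of separating small talk from sentences that belong in the notes.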

The original incarnation of the AI was trained jointly by Google and Stanford researchers during development, using a wide range of models. These fell largely into two categories: context-dependent models, which teach the AI to interpret speech in light of its surrounding context, and sequence-to-sequence models, which map an input audio sequence directly to an output text sequence. Across all types of training and different voice archetypes, the AI’s error rate hovered around 20% on average, with some types of recognition and processing dipping well below that figure. With more training, including use in the field, the AI could plausibly reach the accuracy needed to be considered reliable for mission-critical use cases.
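For a sense of what a 20% error rate means in practice, the sketch below computes word error rate (WER), the metric most commonly used for speech recognition. The article does not say which metric was used, so treating the figure as WER is an assumption, and the function name word_error_rate is chosen for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "patient reports mild chest pain"
    transcribed = "patient reports wild chest pain"
    print(word_error_rate(spoken, transcribed))
```

Under this metric, one wrong word in a five-word sentence is a 20% error rate, which illustrates why further training would be needed before clinical use.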