Google and Broad Institute Collaborate on Genomic Data

June 25, 2015 - Written By David Steele

Google currently operates thirteen data centres around the world, with a fourteenth in the planning stage. These data centres provide the infrastructure and computing muscle behind the Google cloud computing platform, which can be used to store, process and manage massive quantities of data. Google’s systems already store large amounts of genomic data for the scientific community, and Google Genomics has just announced a partnership with the Broad Institute, a biomedical and genomic research centre based in Massachusetts. The partnership aims to improve DNA analysis in order to help cure disease through personalized healthcare tailored to each individual’s DNA makeup. The project will require processing massive quantities of data to establish how various treatments affect a specific DNA profile, and this is where Google’s cloud computing system comes in: Google’s processing power will be paired with the Broad Institute’s scientific expertise. To put “large quantities of data” into perspective, DNA sequencing data, the strings of As, Cs, Gs and Ts read from genomes, already runs to tens of petabytes and is on track to reach exabytes in the not too distant future.

Eric Lander, President and Director of the Broad Institute, said of the collaboration: “Large-scale genomic information is accelerating scientific progress in cancer, diabetes, psychiatric disorders and many other diseases. Storing, analyzing and managing these data is becoming a critical challenge for biomedical researchers.” Google and the Broad Institute will work together on new tools “to propel biomedical research, using deep bioinformatics expertise, powerful analytics, and massive computing infrastructure.” The first joint project will bring the Broad Institute’s Genome Analysis Toolkit (GATK) to the Google Cloud Platform, where it will be offered as a service. GATK has been available for some time and is free to academic and other non-profit users; more than 20,000 users have used it to process genomic data. The Google Cloud version of the service will initially have limited availability, but the longer-term objective is to open the platform to any genomic researcher.

Google is not the only cloud platform provider, and other providers have also been courting universities and academic institutions to work with them. Microsoft recently announced Project Premonition, which uses smart traps and drones to capture and transport mosquitoes in an effort to help prevent future disease outbreaks. Part of Project Premonition involves identifying diseases and storing the resulting research in cloud-based databases. Ultimately, the resources being poured into genomics should help improve human health, and Google’s cloud platform will provide the scientific community with the tools to work with the data.