Maintaining Google and all 15 of its data centers, which are scattered around the globe from Tennessee to Taiwan, and keeping them running 24 hours a day, all year long, is no mean feat. A single misstep, such as a couple of server failures at the wrong time, could take a good portion of the company's software and services offline. Because the stakes leave so little room for major mistakes, Google has relied on data center automation, and other kinds of automation, for the better part of its existence. That automation is a carefully designed mix of hardware and software built to handle and resolve menial, repetitive tasks on its own: tasks that would otherwise eat up the valuable time of Google's highly trained and well-paid employees.
Data center automation is now covered in the company's latest book, "Site Reliability Engineering: How Google Runs Production Systems." Site reliability engineering (SRE), the discipline named in the title, is Google's approach to running production operations, including handling software problems, by treating that work as a software engineering problem. The book goes into great depth on how Google's employees protect, provide for, and progress the hardware and software that run the company's public services, such as Android, Search, Gmail, and YouTube. It is written for people who want to learn how to maintain the availability, performance, latency, and capacity of huge public services while continuing to improve them.
As the book explains, the ultimate goal of data center automation and other automation processes is to let any Google employee in the world perform the most complex operations on any given data center without needing to know every detail of the system they are modifying. Where it makes sense, Google does its best to take humans out of its processes entirely, though every system is designed with careful thought about where "a bit of humanity" may still be needed. If you are interested in learning more about how one of the largest and most innovative tech giants on the planet runs its operations, the SRE book, which you can read for free by following the source link below, is definitely worth your time.