Lessons at scale—hyperscale

No, we don’t all operate hyperscale data centers. But we can learn some lessons from the way hyperscales operate their facilities.

Carrie Goetz

April 20, 2023

The term “hyperscale” describes systems and technology architectures that can grow, or scale, on demand to a massive size or volume. Obviously, not all data centers are hyperscale in size, power consumption, equipment use, configuration, or any number of other factors. So, as operators of smaller-scale data center facilities, should we completely ignore the hyperscalers because their operations are so different from ours? No. Many lessons can be learned from hyperscale operators, and while not all of them apply at a smaller scale, some certainly are worth following. Efficiency, security, innovation, sustainability, elasticity, automation, and survivability are a few areas worth examining. If a concept can scale up, then we can certainly apply it at smaller scales.

Elasticity is the ability to add resources in near real-time, and it is the cornerstone of hyperscale architectures. Hyperscalers operate at the speed of business, and deploying assets on the data center floor is no exception. While not all companies can simply install equipment on the chance it will be needed sometime in the future, they certainly can take advantage of a hyperscaler’s asset expenditures. One of the benefits of the public cloud is the bill's ability to return to zero. Companies can spin up instances as needed in the public cloud and either let them expire at the end of the need, keep them in the cloud, or repatriate them when they get their internal commissioning accomplished. Companies can shop for the most cost-effective instances in today’s multicloud environment, where users avail themselves of services from multiple cloud providers.

Should a company adopt internal hardware in an elastic fashion, an efficiency lesson learned from the hyperscalers is that even the most trivial power draw without purpose is a waste. Hardware evaluation is important, and so is an understanding of the power needed for computing actions. Without modeling software to work from ahead of time, enterprises grapple with reconciling power used against compute cycles delivered. The Power Usage Effectiveness (PUE) metric, which is total facility energy divided by IT equipment energy, is good for assessing the entire facility, but it does not break out individual servers, networking gear, or storage hardware. Blindly following a manufacturer “just because” may not yield the most cost-effective or sustainable solution. Hyperscalers have various resources, including hardware design resources, to ensure they have robust, energy-efficient hardware. The lesson here is that less is more: eke out every bit of efficiency, and evaluate hardware platforms continuously so that decisions rest on current, actionable data.
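To make the gap between a facility-level metric and a device-level view concrete, here is a minimal Python sketch. It assumes the standard PUE definition (total facility energy divided by IT equipment energy). The server names, the energy readings, and the use of "jobs completed" as a stand-in for compute work are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


@dataclass
class ServerSample:
    """Hypothetical per-server reading over one measurement window."""
    name: str
    energy_kwh: float      # energy drawn by this server in the window
    jobs_completed: int    # assumed proxy for useful compute work


def kwh_per_job(sample: ServerSample) -> float:
    """Energy spent per unit of work; infinite if the server did no work."""
    if sample.jobs_completed == 0:
        return float("inf")
    return sample.energy_kwh / sample.jobs_completed


# Hypothetical readings, not measurements from any real facility.
samples = [
    ServerSample("server-a", energy_kwh=12.0, jobs_completed=6000),
    ServerSample("server-b", energy_kwh=10.0, jobs_completed=2500),
    ServerSample("server-c", energy_kwh=4.0, jobs_completed=0),  # idle draw
]

facility_pue = pue(total_facility_kwh=1500.0, it_equipment_kwh=1000.0)

# Rank servers from most to least efficient per unit of work.
ranked = sorted(samples, key=kwh_per_job)

print(f"Facility PUE: {facility_pue:.2f}")
for s in ranked:
    print(f"{s.name}: {kwh_per_job(s):.4f} kWh per job")
```

In this made-up data the facility PUE looks the same whether or not server-c is doing anything useful, which is exactly the kind of purposeless power draw a facility-wide number cannot reveal.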

A side lesson that applies to everyone involves maintaining spares for critical equipment. In many instances, should a server fail, a hyperscaler removes the entire rack and replaces it, doing postmortem and triage later. Recent supply chain issues drove home the need to keep parts on hand and to have new products readily available. Hyperscalers have greater control of their supply chain than many organizations. For other companies, the balancing lesson is planning. While maintaining spares, rapid deployment, and supply chain control are all critical, many companies can achieve similar timeframes by picking suppliers that are business partners, not just parts suppliers. Frequent vendor communication can pinpoint potential project logistics pitfalls before they become issues.

Another lesson to take away is innovation, which is an area where hyperscalers shine. Although often stealthy during trials, hyperscalers are generous with knowledge transfer, both for project outcomes with great results and those with less-than-stellar outcomes. M2M (machine-to-machine) learning, artificial intelligence, Christian Belady’s data center in a tent, and even cloud computing are all examples of innovations that have provided direction to the industry.

"While I am not suggesting that this is what the data center of the future should look like. ... I think this experiment illustrates the opportunities that a less conservative approach to environmental standards might generate," Belady wrote in a prophetic blog post.

Years ago, the data center in a tent led to countless dollars and electrons saved across the industry as Belady proved that computers operate well in warmer data centers.

Innovation at hyperscalers is aided in large part by the diversity within their workforces. The more minds with varied experiences that come together, the better the outcome. Necessity is the mother of invention, as they say. Sometimes the person with the least knowledge of a project asks just the right questions and, simply by challenging assumptions, shines a light on a solution.

Removing the fear of failure also helps innovation. Larger companies and hyperscalers have incubators just for trying new things and, of course, they have the deep pockets to fund innovation projects. Live mission-critical environments don’t have a “just-try-it” policy, so having test labs and resources is undoubtedly beneficial. Following hyperscalers’ innovations and organizational activities can be telling. Products only sell if there are buyers, so what hyperscalers buy and build can give you an idea of what might be on your horizon, albeit at a significantly smaller scale.

Security is another lesson we can learn from hyperscalers. They certainly have multiple layers of security, which allows them to say the cloud is secure, but a company is responsible for what it places in the cloud. Study after study shows that companies still struggle with cloud configurations. Most misconfigurations go unnoticed until there is a breach. The trust-no-one approach that hyperscalers adopt for their own infrastructure is not a bad idea for their customers, either. It goes against human nature, but it works.

Resiliency is certainly something everyone can learn. Hyperscalers maintain a near-perfect combination of resiliency and redundancy for maximum uptime. Resiliency is aided by automation, which reduces the likelihood of human error, enhances uptime, and improves response times. Automation software can handle many tasks, from orchestration to reboots, and removes much of the risk of inadvertent errors. Data centers can automate backups and storage activities, provision networks, move workloads, monitor systems, and handle administrative tasks. Automation also increases the volume of work a facility can handle, because automated tasks complete in a fraction of the time a human would need. Automation can save money as well, by shifting workloads to off-peak hours or to sites with lower power costs during peak periods.
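As a rough illustration of the last point, the following Python sketch picks the site with the lowest power price among those with enough spare capacity for a workload. The site names, prices, capacities, and the `pick_site` helper are all hypothetical. A real orchestrator would also weigh latency, data locality, and resiliency requirements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical view of one data center site at a point in time."""
    name: str
    price_per_kwh: float      # current power price at the site
    free_capacity_kw: float   # spare power capacity available for new work


def pick_site(sites: List[Site], required_kw: float) -> Optional[Site]:
    """Return the cheapest site that can host the workload, or None."""
    candidates = [s for s in sites if s.free_capacity_kw >= required_kw]
    if not candidates:
        return None  # nothing fits; leave the workload where it is
    return min(candidates, key=lambda s: s.price_per_kwh)


# Hypothetical snapshot during a peak period at site-east.
sites = [
    Site("site-east", price_per_kwh=0.14, free_capacity_kw=40.0),
    Site("site-west", price_per_kwh=0.09, free_capacity_kw=5.0),
    Site("site-central", price_per_kwh=0.11, free_capacity_kw=25.0),
]

choice = pick_site(sites, required_kw=10.0)
print(choice.name if choice else "no site has capacity")
```

Here site-west is the cheapest but lacks capacity for a 10 kW workload, so the sketch falls back to the next cheapest site that fits. The same capacity check is what keeps a cost-driven move from undermining resiliency.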

In short, while you may not be a hyperscaler, you can certainly learn from them and apply those lessons at your scale.

About the Author

Carrie Goetz

Carrie Goetz, Principal/CTO, StrategITcom, and Amazon best-selling author of “Jumpstart Your Career in Data Centers Featuring Careers for Women, Trades, and Vets in Tech and Data Centers,” personifies over 40 years of global experience designing, running, and auditing data centers, IT departments, and intelligent buildings.

Carrie is fractional CTO to multiple companies. She is an international keynote speaker and is published in 69 countries in over 250 publications. She holds an honorary doctorate in Mission Critical Operations, RCDD/NTS, PSP, CNID, CDCP, CSM-Agile, AWS CCP and is a Master Infrastructure Mason with 40+ certifications throughout her career. She served on the WIMCO national education committee and is a long-time participant in 7×24 Exchange, AFCOM and Data Center Institute board of advisors, Mission Critical Advisory Board, Women in Data Centers, CNet Technical Curriculum Advisory Board, NEDAS Advisory Board, a member of BICSI, Women in BICSI, and an Education committee member, and a member of Women Leading Technology Sorority. She champions STEM education through outreach projects and her podcast series. She holds two patents.