In an earlier blog, “Avi Vantage: A Cloud-Scale Distributed Software Load Balancer For Everyone”, we had described the high-level architecture of the Avi Vantage platform. The platform consists of a clustered centralized Controller, a scale-out distributed Layer-7 Reverse Proxy data path called as Service Engine (SE), a Visibility/Analytics engine, a RESTful interface to the Controller that enables integration with external orchestration engines, and a fast and responsive HTML5 UI. It is designed ground up to be a modern cloud-scale Application Delivery Controller (ADC) that enables application deployment across any cloud - private, public, or hybrid.
Topics: cloud, load balancer, ADC, application delivery, Application Delivery Controller, Load Balancing, Application Services, networking, HTTP, Routing, Lua, Scripts, Application Routing, Content-Based Routing, Policies, Content Routing
The Hardware Load Balancer Brick Wall
Last month at Networked Systems Design and Implementation (NSDI) conference, Google lifted the covers off Maglev, their distributed network software load balancer (LB) . Since 2008, Maglev has been handling traffic for core Google services like Search and Gmail. Not surprisingly, it's also the load balancer that powers Google Compute Engine and enables it to serve a million requests per sec without any cache pre-warming . Impressive? Absolutely! If you have been following application delivery in the era of cloud, say over last 6 years, you would have noticed another significant announcement at Sigcomm ‘13 by the Microsoft Azure networking team. Azure runs critical services such as blob, table, and relational storage on Ananta , its home-grown cloud scale software load balancer on commodity x86, instead of running it on more traditional hardware load balancers. Both Google and Microsoft ran headlong into what can be best described as “the hardware LB brick wall”, albeit at different times and along different paths in their cloud evolution. For Google, it started circa 2008 when the traffic and flexibility needs for their exponentially growing services and applications went beyond the capability of hardware LBs. For Azure, it was circa 2011, when the exponential growth of their public cloud led to the realization that hardware LBs do not scale and forced them to build their own software variant.
So, what is this “hardware LB brick wall” that these web-scale companies ran into?
Whether it is a water-cooler conversation about the latest wearable health monitor or the current cautions from the CDC about the Zika virus, health may easily rank as one of the most talked about topics in our daily lives. As a technologist, I am part of a number of conversations about a different kind of health - application health - which is as top-of-the-mind concern for enterprise application developers and administrators. The discussion of human health always evokes passionate debates - it turns out that this was no different with application health.
This was the case at Avi Networks when we asked a simple question - how do admins know that applications are in "good" health? I don't believe we had more meetings and debates about any other topic as much as we had about application health. In this blog post, I will take you through some of those passionate yet fascinating discussions that led to the creation of the Avi Health Score - a key capability of the Avi Vantage Platform.
The team had people with diverse backgrounds so we asked everyone the same question - "What does application health mean to you?". Here is a sample of the responses we received:
"Health is how much throughput my application can deliver. If it is doing 10Gbps that means it is good"
"Health is bad when CPU and memory are above 100%."
"Health is good when latency is below 100ms."
"Health is good if the application is up and responding to the health checks."
In the real world, if I ask you, "Do you believe I am in good health if I ran 3 miles today?", depending upon who you are you will likely respond with "it depends"; "of course!"; "did you run just today or do you run every day?"; or "what was your heart rate and vitals after the run?" You will have a whole lot of follow-up questions to dig into the details. To put this in perspective, tennis champ Roger Federer would likely win in straight sets against most people even if he were running a fever. Would that make him healthy? Of course not!
As you can see just a simple data point of a 3-mile run is not enough for a doctor to give a certificate of good health. Similarly, if you think you can determine a server's health based on the simple fact that it can handle a throughput of 10Gbps, you know you are probably wrong. It was hard for me to come to terms with this especially given the fact that I had spent most of my career prior to Avi Networks in a hardware company where it was normal to consider that networking hardware is healthy when a link is up and pumping at a bandwidth of 10Gbps.