Application Delivery Blog

Load Balancer Troubleshooting like a BADaaS. Get More Sleep with Avi's Load Balancer

Lei Yang
Posted on Sep 20, 2018 1:56:50 PM

Any vendor will tell you they have detailed analytics and an intuitive interface (GUI). They'll state that certain tasks can be done in “just a few clicks”. Most vendors over-promise their load balancer troubleshooting process and under-deliver in this area, which is why we at Avi Networks took up the challenge of showing (not just telling) how much better the troubleshooting experience can be with our load balancer.

Scenario 1: Load Balancer Troubleshooting without TCPdump or 2AM Calls

The application is slowing down and someone creates a ticket, or makes an urgent 2AM wake-up call, to the networking team. You dig through logs, if they even exist, or search through a TCPdump for hours, only to find that the problem went away on its own. You close the ticket, but the same issue occurs again a week later. That's a very real scenario with appliance-based load balancers. Here is the process with Avi Networks.



Two lessons learned here:

  1. Visibility: We are not superheroes. We need intuitive tools to help us quickly understand and visualize what might be causing the problem. In this case, why is the server going down? Digging through pages of plain-text log dumps is not helpful, especially when virtual services are involved. In the video, you see that you can easily trace which servers are down via the health score, and which error codes are triggered via health monitor checks. Voila! You find where the problem lies and what caused it in literally a few clicks.
  2. Real-time: Networking issues can be tricky because they can have many contributing factors. Usually, a configuration error is easier to spot than an error that depends on traffic loads, corner cases, or user patterns. What makes it worse is that this type of error may not be reproducible, which makes troubleshooting more difficult. You need a tool that can quickly pinpoint what’s going on while the issue is occurring. In this case, you see the problem right in front of your eyes and can fix it immediately. This is a very powerful capability that traditional “after-the-fact” debugging methods cannot match.
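To make the health monitor idea concrete, here is a minimal sketch of how probe results might be rolled up into a per-server state plus a tally of the error codes behind it. This is not Avi's health score algorithm; the server names, the probe data, and the `failures_to_mark_down` threshold are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical probe results: server name -> HTTP status codes returned by
# recent health-monitor checks (None means the probe timed out).
PROBES = {
    "web-1": [200, 200, 200, 200],
    "web-2": [200, 503, 503, 503],
    "web-3": [None, None, 500, None],
}


def summarize(probes, failures_to_mark_down=3):
    """Return {server: (state, Counter of failure reasons)}.

    A server is marked DOWN once it has at least `failures_to_mark_down`
    failed probes (an assumed threshold, not a vendor default).
    """
    summary = {}
    for server, codes in probes.items():
        # Count timeouts and any 4xx/5xx response as failures, keyed by reason.
        failures = Counter(
            "timeout" if code is None else str(code)
            for code in codes
            if code is None or code >= 400
        )
        failed = sum(failures.values())
        state = "DOWN" if failed >= failures_to_mark_down else "UP"
        summary[server] = (state, failures)
    return summary


if __name__ == "__main__":
    for server, (state, failures) in summarize(PROBES).items():
        print(server, state, dict(failures))
```

The point of the sketch is the shape of the answer: one glance tells you which server is down and which error codes put it there, without reading raw logs.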


Scenario 2: Triage without Finger-Pointing Using End-to-End Timing

Sometimes you can’t tell which team is responsible for an issue, and troubleshooting load balancers efficiently becomes even more challenging. Triage becomes a major bottleneck to actual problem solving when multiple teams engage in an unproductive activity called “finger-pointing”. We call this mean time to innocence (MTTI) syndrome. There is no bad intention on the part of either the networking or the application team, just a lack of proper tools. In most cases, different teams use different tools that surface information at different levels, which leads to different conclusions or partially informed decisions.



In this video, you see that as long as you can tell colors apart (no offense here), you can easily see where latency comes from at any point in time, both in real time and historically. That should give both teams some direction as to where to look for issues.
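The idea behind end-to-end timing can be sketched in a few lines: split each transaction's latency into segments and see which segment dominates. The segment names and the sample timings below are assumptions made for illustration, not data or field names taken from the Avi product.

```python
from statistics import mean

# Hypothetical per-transaction timings in milliseconds, split by the segment
# of the path where the time was spent.
TRANSACTIONS = [
    {"client_rtt": 20, "server_rtt": 2, "app_response": 310, "data_transfer": 15},
    {"client_rtt": 25, "server_rtt": 3, "app_response": 290, "data_transfer": 12},
    {"client_rtt": 18, "server_rtt": 1, "app_response": 330, "data_transfer": 18},
]


def latency_breakdown(transactions):
    """Return {segment: (average_ms, share_of_total)} across transactions."""
    segments = list(transactions[0].keys())
    averages = {s: mean(t[s] for t in transactions) for s in segments}
    total = sum(averages.values())
    return {s: (avg, avg / total) for s, avg in averages.items()}


def dominant_segment(transactions):
    """Return the name of the segment with the highest average latency."""
    breakdown = latency_breakdown(transactions)
    return max(breakdown, key=lambda s: breakdown[s][0])


if __name__ == "__main__":
    for segment, (avg, share) in latency_breakdown(TRANSACTIONS).items():
        print(f"{segment:14s} {avg:7.1f} ms  {share:6.1%}")
    print("Look first at:", dominant_segment(TRANSACTIONS))
```

With these sample numbers the application response time dwarfs the network round trips, so the application team is the sensible place to start. A different mix of timings would point at the network instead, which is exactly the direction both teams need.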

Three reasons why Avi is able to help are:

  1. Prime Location: Location! Location! Location! The real estate slogan applies here perfectly. Load balancers sit in the path between end users and applications. Their ability to collect network, application, and user information gives them a unique advantage in troubleshooting issues that impact multiple components across the entire L2-L7 stack.
  2. Inline Analytics: Even if load balancers are deployed at such a prime location, if they just act like traffic cops and drop all the rich analytics on the floor (which traditional load balancers unfortunately do), you can’t derive insights. Avi’s distributed load balancers collect billions of metrics per hour! Every transaction is logged, and the data is fed to the Avi Controller (the brain) for analysis.
  3. Visualization: It’s not just about the GUI. Without the location and inline analytics, visualization would be little more than a pretty dashboard. That is why Avi presents data in layers to serve different teams’ needs: from a high-level dashboard with end-to-end timing to a drill-down into detailed, searchable logs. It also surfaces insights instead of raw data. That’s why you have the flexibility to use commonly used Log Analytics or create queries yourself.

Avi also makes it easy to share the data between teams. Ultimately, the goal is to help the teams work together, save everyone’s time, and provide the best application experience.

Scenario 3: Non-Disruptive Upgrade to Your Apps (and Your Sleep)

“Honey, I am not available for any babysitting or housework during the following scheduled maintenance windows. There is a high likelihood of running into issues and potential downtime. But it’s all for the good, as we need to upgrade to the latest and greatest. So please forgive me.” Are you tired of having this conversation and hoping things will just go smoothly?

We hear you. That's why at Avi we try to make the process as straightforward as possible, with non-disruptive safeguards built in, such as automatic rollback and redundancy.

Follow a few simple steps:

  1. Upload the upgrade file
  2. Let the system validate the file
  3. Push it to the controller, which takes care of the rest of the process
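The upload, validate, and push flow, together with the automatic rollback safeguard, can be sketched generically as follows. This is an illustration of the pattern only, not Avi's upgrade implementation: the `system` dictionary, the SHA-256 checksum validation, and the `health_check` callback are all hypothetical stand-ins.

```python
import hashlib


class UpgradeError(Exception):
    """Raised when the upgrade file fails validation."""


def validate(image: bytes, expected_sha256: str) -> None:
    # Step 2: refuse to proceed if the uploaded file does not match the
    # checksum we expect (an assumed validation method for this sketch).
    if hashlib.sha256(image).hexdigest() != expected_sha256:
        raise UpgradeError("checksum mismatch, upgrade aborted")


def upgrade(system: dict, image: bytes, expected_sha256: str,
            new_version: str, health_check) -> str:
    """Validate the image, apply it, and roll back if the health check fails."""
    validate(image, expected_sha256)

    previous = system["version"]      # remember where to roll back to
    system["version"] = new_version   # Step 3: apply the upgrade

    if not health_check(system):
        system["version"] = previous  # automatic rollback
        return "rolled back"
    return "upgraded"


if __name__ == "__main__":
    image = b"hypothetical upgrade package"
    digest = hashlib.sha256(image).hexdigest()
    system = {"version": "1.0"}
    print(upgrade(system, image, digest, "2.0", lambda s: True), system)
```

The design choice worth noting is that validation happens before anything changes, and the previous version is recorded before the new one is applied, so a failed health check always has a known-good state to return to.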



Beyond better load balancer troubleshooting, you can automate tasks, because the Avi Vantage Platform is 100% RESTful API-based. You can learn more about Avi’s automation in this Jetsons vs. Flintstones blog.

Topics: Analytics, Load Balancing, Troubleshooting, load balancer troubleshooting, how to troubleshoot load balancers
