NGINX and the "Power of Two Choices" Load-Balancing Algorithm (2024)

New use cases sometimes require new load‑balancing algorithms, and in NGINX Plus R16 and NGINX Open Source 1.15.1 we added a new method that is particularly suitable for distributed load balancers: an implementation of the “power of two choices” algorithm.

Why Do We Need a New Load‑Balancing Algorithm?

Classic load‑balancing methods such as Least Connections work very well when you operate a single active load balancer which maintains a complete view of the state of the load‑balanced nodes. The “power of two choices” approach is not as effective on a single load balancer, but it deftly avoids the bad‑case “herd behavior” that can occur when you scale out to a number of independent load balancers.

This scenario arises not only when you scale out in high‑performance environments; it also occurs in containerized environments where multiple proxies each load balance traffic to the same set of service instances.

NGINX and the "Power of Two Choices" Load-Balancing Algorithm (1)

A common instance of this scenario occurs when you use NGINX Ingress Controller for Kubernetes, with one load‑balancing instance per Kubernetes node.

The algorithm is referred to in the literature as “power of two choices”, because it was first described in Michael Mitzenmacher’s 1996 dissertation, The Power of Two Choices in Randomized Load Balancing. In NGINX and NGINX Plus, it’s implemented as a variation of the Random load‑balancing algorithm, so we also refer to it as Random with Two Choices.

How Does “Power of Two Choices” Work?

Let’s begin with what might be a familiar situation. You’ve just landed after a long international flight, and along with 400 other travelers, have walked into a busy arrivals hall.

NGINX and the "Power of Two Choices" Load-Balancing Algorithm (2)

Many airports employ guides in the arrivals hall. Their job is to direct each traveler to one of the several queues, one for each immigration desk. If we think of the guides as load balancers:

  • You and your fellow travelers are requests, each hoping to be processed as quickly as possible.
  • The immigration desks are (backend) servers, each processing a backlog of requests.
  • The guides maximize efficiency by selecting the best server for each request.
  • The methods available to the guides for selecting the best server correspond to load‑balancing algorithms.

Let’s consider how well some of the possible algorithms work in a distributed load‑balancing scenario like the arrivals hall.

Round-Robin Load Balancing

Round robin is a naive approach to load balancing. In this approach, the guide selects each queue in rotation: the first traveler is directed to queue A, the next traveler to queue B, and so on. Once a traveler is directed to the last queue, the process repeats from queue A. Round Robin is the default load‑balancing algorithm used by NGINX.
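As a point of reference, a minimal upstream block with no load‑balancing directive uses Round Robin by default. This is only a sketch; the upstream name and server addresses are illustrative, echoing the examples later in this article:

upstream service1 {
    zone service1 64k;
    server 192.168.1.11;
    server 192.168.1.12;
    server 192.168.1.13;

    # No balancing directive specified, so NGINX uses Round Robin
}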

NGINX and the "Power of Two Choices" Load-Balancing Algorithm (3)

This approach works adequately, until there’s a delay in one of the queues. Perhaps one traveler has misplaced his or her documentation, or arouses the suspicion of the immigration officer.

NGINX and the "Power of Two Choices" Load-Balancing Algorithm (4)

The queue stops moving, yet the guide continues to assign travelers to that queue. The backlog gets longer and longer, and that’s not going to make the impatient travelers any happier!

Least Connections Load Balancing

There’s a much better approach. The guide watches each queue, and each time a traveler arrives, he sends that traveler to the shortest queue. This method is analogous to the Least Connections load‑balancing method in NGINX, which assigns each new request to the server with the fewest outstanding (queued) requests.
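In configuration terms, this corresponds to the least_conn directive. A minimal sketch, reusing the illustrative upstream from this article’s later examples:

upstream service1 {
    zone service1 64k;
    server 192.168.1.11;
    server 192.168.1.12;
    server 192.168.1.13;

    least_conn;   # pick the server with the fewest active connections
}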

NGINX and the "Power of Two Choices" Load-Balancing Algorithm (5)

Least Connections load balancing deals quite effectively with travelers who take different amounts of time to process. It seeks to balance the lengths of the queues, and avoids adding more requests to a queue that has stalled.

Least Time Load Balancing

We’ve seen that different passengers take different times to process; in addition, some queues are processed faster or slower than others. For example, one immigration officer might have computer problems, which means he processes travelers more slowly; another officer might be a stickler for detail, quizzing travelers very closely. Other officers might be very experienced and able to process travelers more quickly.

What if each immigration booth has a counter above it, indicating how many travelers have been processed in, for example, the last 10 minutes? Then the guide can direct travelers to a queue based on its length and how quickly it is being processed. That’s a more effective way to distribute load, and it’s what the Least Time load‑balancing algorithm in NGINX Plus does.
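In NGINX Plus, this behavior is enabled with the least_time directive, which measures either the time to receive the response header or the time to receive the full response. A minimal sketch with illustrative addresses:

upstream service1 {
    zone service1 64k;
    server 192.168.1.11;
    server 192.168.1.12;
    server 192.168.1.13;

    least_time last_byte;   # NGINX Plus only; use 'header' to measure to the response header instead
}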

NGINX and the "Power of Two Choices" Load-Balancing Algorithm (6)

This algorithm is specific to NGINX Plus because it relies on additional data collected with NGINX Plus’s Extended Status metrics. It’s particularly effective in cloud or virtual environments where the latency to each server can vary unpredictably, so queue length alone is not sufficient to estimate the delay.

It All Falls Apart with Multiple Guides

So far, we’ve had one guide (that is, load balancer) with a complete view of the queues and response time in the arrivals hall. That guide tries to make the best choice for each traveler based on the information he knows.

Now consider what happens if we have several guides, each directing travelers independently. The guides have independent views of the queue lengths and queue wait times; they only consider the travelers that they send to each queue.

This scenario is prone to an undesirable behavior, where all the guides notice that one queue is momentarily shorter and faster, and all send travelers to that queue. Simulations show that this “herd behavior” distributes travelers in a way that is unbalanced and unfair. In the same way, several independent load balancers can overload some upstream servers, no matter which “best choice” algorithm you use.

The “Power of Two Choices” Load-Balancing Algorithm

The solution lies in the “power of two choices” load‑balancing algorithm. Instead of making the absolute best choice using incomplete data, with “power of two choices” you pick two queues at random and choose the better option of the two, avoiding the worse choice.

“Power of two choices” is efficient to implement. You don’t have to compare all queues to choose the best option each time; instead, you only need to compare two. And, perhaps unintuitively, it works better at scale than the best‑choice algorithms. It avoids the undesired herd behavior by the simple approach of avoiding the worst queue and distributing traffic with a degree of randomness.

Using “Power of Two Choices” with NGINX and NGINX Plus

In NGINX and NGINX Plus, the “power of two choices” load‑balancing method is implemented as a variant of the Random algorithm, so we call it Random with Two Choices.

In NGINX Open Source, Random with Two Choices chooses between two randomly selected servers based on which currently has fewer active connections. This is the same selection criterion as used for the Least Connections algorithm. (This is also the default criterion in NGINX Plus, and can be configured explicitly by adding the least_conn parameter.)

upstream service1 {
    zone service1 64k;
    server 192.168.1.11;
    server 192.168.1.12;
    server 192.168.1.13;

    random two;
}
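If you prefer to state the selection criterion explicitly, the least_conn parameter can be added to the same illustrative configuration; the behavior is unchanged from the snippet above:

upstream service1 {
    zone service1 64k;
    server 192.168.1.11;
    server 192.168.1.12;
    server 192.168.1.13;

    random two least_conn;   # explicitly compare active connections between the two candidates
}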

NGINX Plus also supports the least_time parameter, which uses the same selection criterion as the Least Time algorithm. As with that algorithm, you further choose between considering either:

  • The time to receive the response header (least_time=header)
  • The time to receive the full response (least_time=last_byte), as in the following snippet

The Least Time criterion is ideal for situations where the latency to each upstream server can vary.

upstream service1 {
    zone service1 64k;
    server 192.168.1.11;
    server 192.168.1.12;
    server 192.168.1.13;

    random two least_time=last_byte;   # use header or last_byte
}

Conclusion

NGINX and NGINX Plus support a range of load‑balancing methods; in this article, we did not consider the deterministic Hash and IP Hash methods.

The Least Connections (and, for NGINX Plus, Least Time) methods are very effective at balancing loads when the load balancer has a complete view of the workload assigned to each node and its past performance. They are less effective when several load balancers are assigning requests, and each has an incomplete view of the workload and performance.

“Power of two choices” uses a biased random algorithm, and has been demonstrated to be effective at balancing loads when each load balancer has an incomplete or delayed view. It avoids the “herd behavior” exhibited by other algorithms that seek to make a best decision on every request.

Consider Random with Two Choices, NGINX’s implementation of “power of two choices”, for very high‑performance environments and for distributed load‑balancing scenarios. A good use case arises when using multiple Ingress controllers on Kubernetes.

NGINX and the "Power of Two Choices" Load-Balancing Algorithm (2024)
Top Articles
Latest Posts
Article information

Author: Ouida Strosin DO

Last Updated:

Views: 5337

Rating: 4.6 / 5 (56 voted)

Reviews: 87% of readers found this page helpful

Author information

Name: Ouida Strosin DO

Birthday: 1995-04-27

Address: Suite 927 930 Kilback Radial, Candidaville, TN 87795

Phone: +8561498978366

Job: Legacy Manufacturing Specialist

Hobby: Singing, Mountain biking, Water sports, Water sports, Taxidermy, Polo, Pet

Introduction: My name is Ouida Strosin DO, I am a precious, combative, spotless, modern, spotless, beautiful, precious person who loves writing and wants to share my knowledge and understanding with you.