High availability with HAProxy (Load Balancing, Keepalive)

HAProxy with load balancing and keepalive options enabled can be used to provide high availability for your RabbitMQ cluster.

A load-balancing proxy is needed in front of the RabbitMQ cluster because the serving member of the cluster can lose its connection or fail to serve, in which case the message is lost.

A load-balancing solution that is aware of the state of the cluster members is therefore required. Since RabbitMQ does not use indexes and an immediate messaging response is required, HAProxy provides an efficient yet simple-to-maintain solution.

To set up HAProxy for your RabbitMQ cluster, follow these steps:

  1. Install HAProxy:
    Use the package manager of your choice to install HAProxy on the server where you want to run it.
    In our case it is installed with apt, as shown below.
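
    For example, on a Debian/Ubuntu host (assumed here), the installation could look like this:

    sudo apt update
    sudo apt install -y haproxy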

  2. Configure HAProxy:
    Edit the haproxy.cfg configuration file and define the appropriate backend servers for your RabbitMQ instances.

  3. The haproxy.cfg file is located under /etc/haproxy/. Open it for editing:

    sudo -i
    cd /etc/haproxy/
    nano haproxy.cfg

    Change the configuration as follows:

    global
        log 127.0.0.1 local0 notice
        maxconn 10000
        user haproxy
        group haproxy

    defaults
        timeout connect 5s
        timeout client 100s
        timeout server 100s

    listen rabbitmq
        bind :5672
        mode tcp
        balance roundrobin
        option http-keep-alive
        server rabbitmq-01 10.0.2.4:5672 check inter 5s rise 2 fall 3
        server rabbitmq-02 10.0.2.5:5672 check inter 5s rise 2 fall 3
        server rabbitmq-03 10.0.2.6:5672 check inter 5s rise 2 fall 3

    # optional, for proxying management site
    frontend front_rabbitmq_management
        bind :15672
        default_backend back_rabbitmq_management

    backend back_rabbitmq_management
        balance source
        option http-keep-alive
        server rabbitmq-mgmt-01 10.0.2.4:15672 check
        server rabbitmq-mgmt-02 10.0.2.5:15672 check
        server rabbitmq-mgmt-03 10.0.2.6:15672 check

    # optional, for monitoring
    # listen stats :9000
    #     mode http
    #     stats enable
    #     stats hide-version
    #     stats realm Haproxy\ Statistics
    #     stats uri /
    #     stats auth haproxy:haproxy

    Change the IP addresses and ports to match those of your cluster nodes.
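
    Optionally, you can verify that the edited file parses cleanly before reloading:

    sudo haproxy -c -f /etc/haproxy/haproxy.cfg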

  4. Save the configuration.

  5. Start and Enable HAProxy Service:
    Start the HAProxy service and ensure that it starts automatically on system boot.
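
    On a systemd-based distribution (assumed here), for example:

    sudo systemctl enable --now haproxy
    sudo systemctl status haproxy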

By following these steps, you can install and configure HAProxy to manage traffic to your RabbitMQ cluster.

It should be mentioned that a RabbitMQ Queue is a singular structure. It exists only on the node on which it was created, regardless of HA policy. A Queue is always its own master and consists of 0…N slaves. Suppose, for example, that “NewQueue” was created on Node #2 of our three-node cluster: that instance is the Master-Queue, because this is the node on which the Queue was created. It has 2 Slave-Queues – its counterparts on nodes #1 and #3. Let’s assume that Node #2 dies, for whatever reason; let’s say that the entire server is down. Here’s what will happen to “NewQueue”:

  1. Node #2 does not return a heartbeat, and is considered de-clustered

  2. The “NewQueue” master Queue is no longer available (it died with Node #2)

  3. RabbitMQ promotes the “NewQueue” slave instance on either Node #1 or #3 to master
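
This promotion is only possible because the queue is mirrored under an HA policy. As a minimal sketch (the policy name and pattern below are illustrative, not taken from the text above), such a policy could be declared with rabbitmqctl, and cluster membership can be confirmed from a surviving node after the failure:

    # Illustrative classic mirrored-queue policy covering "NewQueue" (name and pattern are assumptions)
    sudo rabbitmqctl set_policy ha-all "^NewQueue$" '{"ha-mode":"all"}' --apply-to queues

    # Confirm which nodes are still clustered after Node #2 goes down
    sudo rabbitmqctl cluster_status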

This is standard HA behaviour in RabbitMQ. Let’s look at the default scenario now, where all 3 nodes are alive and well, and the “NewQueue” instance on Node #2 is still master.

  1. We connect to RabbitMQ, targeting “NewQueue”

  2. Our Load Balancer determines an appropriate Node, based on round robin

  3. We are directed to an appropriate node (let’s say, Node #3)

  4. RabbitMQ determines that the master instance of “NewQueue” is on Node #2

  5. RabbitMQ redirects us to Node #2

  6. We are successfully connected to the master instance of “NewQueue”

Despite the fact that our Queues are replicated across each HA node, there is only one available instance of each Queue, and it resides on the node on which it was created or, in the case of failure, on the node whose instance was promoted to master. RabbitMQ conveniently routes us to that node in this case.
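
To see which node currently hosts the master instance of a queue, the management API can be queried through the proxied management port defined in the HAProxy configuration above (the host and credentials here are illustrative placeholders); the response includes the node that owns the queue:

    # Hypothetical host and credentials; %2F is the default "/" vhost
    curl -u guest:guest http://<haproxy-host>:15672/api/queues/%2F/NewQueue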


Unfortunately for us, this means that we suffer an extra, unnecessary network hop in order to reach our intended Queue. This may not seem like a major issue, but consider that in the above example, with 3 nodes and an evenly-balanced Load Balancer, we will incur that extra network hop on approximately 66% of requests: only one in every three requests (assuming that within any group of three requests we are directed to a different node each time) will land on the correct node directly.