HAProxy

The Reliable, High Performance TCP/HTTP Load Balancer


Willy TARREAU


 

Quick News

July 20th, 2008 : two lines...

    Two lines... That's all that is needed with the new TCP content inspection system to stop half of the spam I receive at home. One of my major customers who uses HAProxy a lot has sponsored the development of some preliminary content inspection which is used to decide whether to forward a connection or not. The very first usage of this feature consists in checking that only SSL is spoken on a connection, but more protocols will most likely come soon. As a nice side effect, I could now add a delay before the HELO message of my SMTP server, and reject all robots which talk first (which is forbidden). And since many spam bots have small timeout values, many of them abort before the delay expires, resulting in my incoming spam rate dropping from about 300/hour to "only" 150/hour. Those which wait out the delay slow down due to limited resources. The small addition simply consists in adding those two lines in the frontend :

    		tcp-request inspect-delay  35s
    		tcp-request content reject if REQ_CONTENT
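
    For readers who want to see the idea outside of HAProxy, here is a minimal Python sketch of "wait a bit, and reject whoever talks first". It is not HAProxy code : the function name client_talks_first and the use of select() are assumptions made for the illustration.

```python
import select
import socket


def client_talks_first(conn: socket.socket, inspect_delay: float) -> bool:
    """Return True if the connection becomes readable before the delay expires.

    This mimics an inspection delay followed by "reject if some content
    is present": a well-behaved SMTP client stays silent until it has
    received the server's greeting.

    Note: select() also reports a connection closed by the peer as
    readable, so a bot which gives up early is caught here as well.
    """
    readable, _, _ = select.select([conn], [], [], inspect_delay)
    return bool(readable)
```

    A caller would close the connection when the function returns True, and send the greeting only when it returns False.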
    	  

June 21st, 2008

    haproxy versions 1.3.15.2 and 1.3.14.6 have been released to fix a major bug in request queue handling. The problem was that due to a design problem, it was possible for new requests to be immediately served by a server before a request in queue would be served. That caused some requests to remain in queue until they reached the queue timeout, after which either they would eventually be served, or return a 503 error code to the client.

    Since it was a design problem, it took a lot of time to analyze the root cause and find a solution. However, as a positive side effect, the fix now makes the redispatch option work for requests which overflow a queue. That way, clients do not get a 503 error anymore but can be served by another server (which was the purpose of the redispatch option).

    Note that it is possible that 1.2 is also affected by the issue since some parts of the faulty code have not changed since. But it is very hard to determine if it is faulty or not, and backporting the fix would take even more time. Maybe I will eventually take a look at it if people complain about the issue.

    Update (2008/06/28): Alexander Staubo, who first noticed the problem, has run a new series of tests showing that the problem is definitely fixed. It also demonstrates the very nice positive effect of running with maxconn 1 with Rails servers.

May 25th, 2008

    Released haproxy versions 1.3.15.1 and 1.3.14.5 with minor fixes : build fix for GCC 4.3, fix for early truncation of stats output in certain circumstances, and better handling of large amounts of highly active sockets. I indeed discovered during testing that the sepoll poller was so efficient that when running at gigabit speed with 80000 active sockets fighting for their CPU share, almost all of them were running in speculative mode, causing starvation of the remaining ones, which in turn caused accept() to be called very rarely. Delays of about 40 seconds have been observed on a 3.4 GHz Pentium 4 to get the stats page under such a load. The other pollers were not better BTW. The fix consisted in ensuring that polled events are checked as often as the speculative ones. With this fix, the stats page responds in less than one second on such a saturated machine. There is still room for improvement relying on event prioritization though. Version 1.3.15 has been promoted as the recommended one since there has been no regression report. Version 1.2.18 was also released for users of 1.2 who experienced trouble building on BSD.

April 19th, 2008

    Released haproxy version 1.3.15 with many new features. The most important changes are stats updates (HTTP and UNIX), enhancements of server checks such as tracking and dynamic intervals, addition of the leastconn load-balancing algorithm, a fully transparent mode on Linux, better handling of connection failures (dead server avoidance and turn-around state), support for inter-site off-loading through redirects, updates to the build process, and large documentation updates. For more information, please check the ChangeLog. Due to the large number of changes, upgrades from earlier versions should be performed with a bit of care.

    Once again, a lot of code comes from contributions. I'd like to specially thank Krzysztof Oledzki for a lot of useful contributions, including the SNMP agent, and the guys at Nokia for the good work they have done on POST parameter hashing.


Latest versions

Branch   Description    Last version   Released       Notes
1.3.x    Development    GIT            not released   may be broken
1.3.15   1.3.15-maint   1.3.15.2       2008/06/21     Recommended version
1.3.14   1.3.14-maint   1.3.14.6       2008/06/21     Previous version
1.2.x    1.2-stable     1.2.18         2008/05/25     Important fixes only
1.1.x    1.1-stable     1.1.34         2006/01/29     Critical fixes only
1.0.x    1.0-old        1.0.2          2001/12/30     Unmaintained

Description

HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for web sites crawling under very high loads while needing persistence or Layer7 processing. Supporting tens of thousands of connections is clearly realistic with today's hardware. Its mode of operation makes its integration into existing architectures very easy and riskless, while still offering the possibility not to expose fragile web servers to the Net, such as below :

Currently, two major versions are supported :

  • version 1.1 - maintains critical sites online since 2002
    The most stable and reliable, has reached years of uptime. Receives no new feature, dedicated to mission-critical usages only.
  • version 1.2 - opening the way to very high traffic sites
    The same as 1.1 with some new features such as poll/epoll support for very large number of sessions, IPv6 on the client side, application cookies, hot-reconfiguration, advanced dynamic load regulation, TCP keepalive, source hash, weighted load balancing, rbtree-based scheduler, and a nice Web status page. This code is still evolving but has significantly stabilized since 1.2.8.

Additionally, a third version 1.3 is under active development. New features include :

  • Content Switching : provides the ability to select a group of servers based on any part of the request such as the URI, the Host field, cookies, or anything else. There is a growing demand for this feature from large sites which separate dynamic and static contents.
  • Full Transparent Proxy : it is possible to connect to the server with the client's IP address or even any other IP address. This is possible only on Linux 2.4/2.6 with the cttproxy patch. This feature also makes it possible to transparently handle part of the traffic for a particular server without changing any server's address.
  • New faster tree-based scheduler : versions up to 1.2.16 required that all timeouts were set to the same value to support tens of thousands of connections at full speed. With this new scheduler, it is no longer the case. I have backported it to 1.2.17.
  • Kernel TCP splicing : avoiding kernel-to-user then user-to-kernel data copies improves bandwidth and lowers CPU usage. Haproxy 1.3 supports Linux L7SW in order to achieve multi-gigabit performance on commodity hardware.
  • Connection Tarpitting : since the cost of maintaining a connection open is low, it is sometimes desirable to "tarpit" attack bots, which means maintaining their connections open to limit their capacity. This has been developed for a site crawling under a small DDoS with easily identifiable requests from a few thousand zombies.
  • Finer Header Processing : will make it easier to write header-based rules and to process parts of the URI.
  • Very Fast and reliable Header Parsing : full parsing and indexing of an average request typically takes less than 2 microseconds with fully RFC2616-compliant integrity checks.
  • Modular Design : allow more people to contribute to the project and make it easier to debug. The pollers have been split, already making their development a lot easier. Other subsystems will be modularized soon.
  • Speculative I/O processing : try to access data on a socket before being notified about its readiness. The poller just speculates about what should be available and what should not, tries to guess, and if it wins, several expensive syscalls are saved. If it loses, those syscalls will have to be called anyway. A net overall gain of about 10% has been observed using Linux epoll().
  • ACLs : use any combination of any criterion as a condition to any action.
  • More load balancing algorithms : right now, Weighted Round Robin, Weighted Source Hash and Weighted URL Hash are implemented. Weighted Least Conns is pending. Other algorithms may come later such as Weighted Measured Response Time.
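
The speculative I/O item in the list above can be illustrated with a small Python sketch. It is only an approximation under stated assumptions : the function name speculative_recv is hypothetical, and select() stands in for epoll() so that the example stays portable. It is not the code of the sepoll poller.

```python
import select
import socket


def speculative_recv(sock: socket.socket, bufsize: int, timeout: float):
    """Try to read right away; only ask the poller if the guess was wrong.

    Returns (data, polled). "polled" tells whether the poller had to be
    called. "data" is None when nothing arrived before the timeout.
    """
    sock.setblocking(False)
    try:
        # Speculation: assume data is already there and skip the poller.
        return sock.recv(bufsize), False
    except BlockingIOError:
        pass  # Lost the bet: fall back to a regular readiness notification.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None, True
    return sock.recv(bufsize), True
```

When the guess is right, the call to the poller is saved. When it is wrong, one failed recv() is paid on top of the normal path, which is the trade-off described in the list.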

Unlike other free "cheap" load-balancing solutions, this product is only used by a few hundred people around the world, but those people run very big sites serving several million hits and between several tens of gigabytes and several terabytes per day to hundreds of thousands of clients. They need 24x7 availability and have the internal skills to take the risk of maintaining a free software solution. Often, the solution is deployed for internal uses and I only know about it when they send me some positive feedback or when they ask for a missing feature ;-)

Design Choices and history

HAProxy implements an event-driven, single-process model which enables support for a very high number of simultaneous connections at very high speeds. Multi-process or multi-threaded models can rarely cope with thousands of connections because of memory limits, system scheduler limits, and lock contention everywhere. Event-driven models do not have these problems because implementing all the tasks in user-space allows finer resource and time management. The downside is that those programs generally don't scale well on multi-processor systems. That's the reason why they must be optimized to get the most work done from every CPU cycle.
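
To make the model more concrete, here is a minimal sketch of one iteration of a single-process event loop, written in Python with the standard selectors module. The names run_once and shout_back are hypothetical, and the sketch says nothing about how HAProxy's own loop is written : it only shows that one process can serve every ready connection in turn without threads or locks.

```python
import selectors
import socket


def run_once(sel: selectors.BaseSelector, timeout: float = 0.1) -> int:
    """Run one loop iteration: call the handler of every ready socket.

    Returns the number of sockets which were ready.
    """
    events = sel.select(timeout)
    for key, _mask in events:
        handler = key.data      # the callback given to register()
        handler(key.fileobj)
    return len(events)


def shout_back(conn: socket.socket) -> None:
    """Toy handler: send back whatever was received, in upper case."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data.upper())
```

Each connection is registered once with sel.register(conn, selectors.EVENT_READ, shout_back), and the program then calls run_once() forever.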

It began in 1996 when I wrote Webroute, a very simple HTTP proxy able to set up a modem access. But its multi-process model clobbered its performance for usages other than home access. Two years later, in 1998, I wrote the event-driven ZProx, used to compress TCP traffic to accelerate modem lines. That was when I first understood the difficulty of event-driven models. In 2000, while benchmarking a buggy application, I heavily modified ZProx to introduce a very dirty support for HTTP header rewriting. HAProxy's ancestor was born. The first versions did not perform the load-balancing themselves, but it quickly proved necessary.

Now in 2007, the core engine is reliable and very robust, despite the frightening comments in the code. Event-driven programs are robust and fragile at the same time : their code needs very careful changes, but the resulting executable handles high loads and withstands attacks without ever failing. It is the reason why HAProxy only supports a finite set of features.

People often ask for SSL and Keep-Alive support. Both features will complicate the code and render it fragile for several releases. By the way, both features have a negative impact on performance :

  • Having SSL in the load balancer itself means that it becomes the bottleneck. When the load balancer's CPU is saturated, the overall response times will increase and the only solution will be to multiply the load balancers and put another load balancer in front of them. The only scalable solution is to have an SSL/Cache layer between the clients and the load balancer.
  • Keep-alive was invented to reduce CPU usage on servers when CPUs were 100 times slower. But what is not said is that persistent connections consume a lot of memory while not being usable by anybody except the client who opened them. Today in 2006, CPUs are very cheap and memory is still limited to a few gigabytes by the architecture or the price. If a site needs keep-alive, there is a real problem. Highly loaded sites often disable keep-alive to support the maximum number of simultaneous clients.

However, I'm planning on implementing both features in future versions, because it appears that there are users who mostly need availability above performance, and for them, it's understandable that having both features will not impact their performance, and will reduce the number of components.

Supported platforms

HAProxy is known to reliably run on the following OS/Platforms :

Highest performance should be achieved with haproxy versions newer than 1.2.5 running on Linux 2.6, or epoll-patched Linux kernel 2.4. It is only because of a very OS-specific optimization : the default polling system for version 1.1 is select(), which is common among most OSes, but can become slow when dealing with thousands of file-descriptors. Version 1.2 uses poll() by default instead of select(), but on some systems it may even be slower. However, it is recommended on Solaris as its implementation is rather good. On Linux 2.6, or patched Linux 2.4, HAProxy 1.2 will default to using epoll which achieves O(1) performance at any load, thus being even faster than poll() or select() on other systems. FreeBSD is known to slightly surpass this performance with its kqueue mechanism. Support for this poller has been implemented in HAProxy 1.3, but no benchmark has been done yet.
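
The choice of a poller depending on what the OS offers can be sketched in a few lines of Python. This is only an illustration : the function name best_poller and the preference order are assumptions, and HAProxy's own selection logic is not shown here.

```python
import select


def best_poller() -> str:
    """Return the name of the most scalable readiness API available here.

    epoll (Linux) and kqueue (BSD) do not rescan the whole descriptor set
    on each call, so they are preferred. poll() and select() are the
    portable fallbacks.
    """
    for name in ("epoll", "kqueue", "poll"):
        if hasattr(select, name):
            return name
    return "select"


print(best_poller())
```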

Based on those facts, people looking for a very fast load balancer should consider the following options on x86 or x86_64 hardware, in this order :

  1. HAProxy 1.2.13.1+ on Linux 2.4 + epoll
  2. HAProxy 1.2.13.1+ on Linux 2.6.16 + scheduler starvation fixes
  3. HAProxy 1.2.8+ on FreeBSD
  4. HAProxy 1.2.8+ on Solaris 10

Note: if you're looking for a very fast web load-balancing solution, the second fastest platform I ever tested was an HP-DL145 (Dual Opteron-248 at 2.2 GHz) running an epoll-patched Linux kernel 2.4 and haproxy-1.2 at 21000 hits/s and 2 Gbps. Very impressive indeed for such a cheap system ! The fastest one was a Sun V40Z (Quad Opteron-848 at 2.2 GHz) which showed 30000 hits/s on dual gigabit links. This is only about 50% more than the dual-proc, so I guess we're really getting close to system limits. On the other hand, a little system tuning can certainly improve things greatly. Anyway, the Dual-Opteron still offers the best perf/price ratio.

Performance

Well, since a user's testimony is better than a long demonstration, please take a look at Chris Knight's experience with haproxy saturating a gigabit fiber on a video download site. Also, my experiments with Myricom's 10-Gig NICs might be of interest.

HAProxy involves several techniques commonly found in Operating Systems architectures to achieve the absolute maximal performance :

  • a single-process, event-driven model considerably reduces the cost of context switch and the memory usage. Processing several hundreds of tasks in a millisecond is possible, and the memory usage is in the order of a few kilobytes per session while memory consumed in Apache-like models is more in the order of megabytes per process.
  • O(1) event checker on systems that allow it (currently only Linux with HAProxy 1.2), allowing instantaneous detection of any event on any connection among tens of thousands.
  • Single-buffering without any data copy between reads and writes whenever possible. This saves a lot of CPU cycles and useful memory bandwidth. Often, the bottleneck will be the I/O busses between the CPU and the network interfaces.
  • MRU memory allocator using fixed size memory pools for immediate memory allocation favoring hot cache regions over cold cache ones. This dramatically reduces the time needed to create a new session.
  • work factoring, such as multiple accept() at once, and the ability to limit the number of accept() per iteration when running in multi-process mode, so that the load is evenly distributed among processes.
  • event aggregation : timing resolution is adapted to match the system scheduler's resolution. This allows many events to be processed at once without having to sleep when we're sure that we would have woken up immediately. This also leaves a large performance margin with virtually no degradation of response time when the CPU usage approaches 100%.
  • two FSMs (finite state machines) per session ensure exact knowledge of client-side and server-side connection state without having to perform a lot of controls at each call to deal with errors. Actions are decided based on the combination of both states and events. For an even improved state-based decision, future versions will use 3 FSMs with a hard-coded transition matrix like what is used in hardware processors and ASICs.
  • reduced footprint for frequently and randomly accessed memory areas such as the file descriptor table which uses 4 bitmaps. This reduces the number of CPU cache misses and memory prefetching time.
  • optimized HTTP header analysis : headers are parsed and interpreted on the fly, and the parsing is optimized to avoid any re-reading of a previously read memory area. Checkpointing is used when an end of buffer is reached with an incomplete header, so that the parsing does not start again from the beginning when more data is read.
  • careful reduction of the number of expensive system calls. Most of the work is done in user-space by default, such as time reading, buffer aggregation, file-descriptor enabling/disabling.
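
As an illustration of the MRU allocator item in the list above, here is a minimal Python sketch of a fixed-size pool which always hands out the most recently released buffer first. The class name FixedPool is hypothetical. Python cannot show the CPU cache effect itself, so the sketch only demonstrates the reuse order which makes hot memory areas preferred over cold ones.

```python
class FixedPool:
    """Pool of equally sized buffers, reused in most-recently-freed order."""

    def __init__(self, size: int):
        self.size = size
        self.free = []  # used as a stack: last released, first reused

    def alloc(self) -> bytearray:
        if self.free:
            # The buffer released last is the most likely to still be in cache.
            return self.free.pop()
        return bytearray(self.size)

    def release(self, buf: bytearray) -> None:
        self.free.append(buf)
```

Because all buffers have the same size, alloc() never has to search for a fitting block : it either pops the top of the stack or creates a new buffer.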

All these micro-optimizations result in very low CPU usage even on moderate loads. And even at very high loads, when CPU is saturated, it is quite common to note something like 5% user and 95% system, which means that the HAProxy process consumes about 20 times less than its system counterpart. This explains why the tuning of the Operating System is very important. I personally build my own patched Linux 2.4 kernels, and finely tune a lot of network sysctls to get the most out of a reasonable machine.

This also explains why Layer 7 processing has little impact on performance : even if user-space work is doubled, the load distribution will look more like 10% user and 90% system, which means an effective loss of only about 5% of processing power. This is why on high-end systems, HAProxy's Layer 7 performance can easily surpass hardware load balancers' in which complex processing which cannot be performed by ASICs has to be performed by slow CPUs. Here is the result of a quick benchmark performed on haproxy 1.3.9 at EXOSEC on a single core Pentium 4 with PCI-Express interfaces:

    In short, a hit rate above 10000/s is sustained for objects smaller than 6 kB, and the Gigabit/s is sustained for objects larger than 40 kB.
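
Coming back to the 5% estimate given before the benchmark, the arithmetic can be checked with a few lines of Python. The only assumption is that the system-side cost per request stays the same while the user-space cost doubles.

```python
# CPU cost of a fixed amount of traffic, in arbitrary units.
base_user, base_system = 5.0, 95.0

# Layer 7 processing doubles the user-space work, system work is unchanged.
l7_user, l7_system = 2 * base_user, base_system
l7_total = l7_user + l7_system

user_share = l7_user / l7_total        # new share of CPU spent in user-space
system_share = l7_system / l7_total    # new share of CPU spent in the system
loss = 1 - (base_user + base_system) / l7_total  # lost processing power

print(f"user {user_share:.1%}, system {system_share:.1%}, loss {loss:.1%}")
```

The split moves to roughly 10% user and 90% system, and the same CPU handles a little under 5% less traffic.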

In production, HAProxy has been installed several times as an emergency solution when very expensive, high-end hardware load balancers suddenly failed on Layer 7 processing. Hardware load balancers process requests at the packet level and have great difficulty supporting requests spanning multiple packets and high response times because they do no buffering at all. On the other hand, software load balancers use TCP buffering and are insensitive to long requests and high response times. A nice side effect of HTTP buffering is that it increases the server's connection acceptance by reducing the session duration, which leaves room for new requests. New benchmarks will be executed soon, and results will be published. Depending on the hardware, expected rates are in the order of a few tens of thousands of new connections/s with tens of thousands of simultaneous connections.

There are 3 important factors used to measure a load balancer's performance :

A load balancer's performance related to these factors is generally announced for the best case (eg: empty objects for session rate, large objects for data rate). This is not because of a lack of honesty from the vendors, but because it is not possible to tell exactly how it will behave in every combination. So when those 3 limits are known, the customer should be aware that he will generally be below all of them. A good rule of thumb on software load balancers is to consider an average practical performance of half of maximal session and data rates for average sized objects.

You might be interested in checking the 10-Gigabit/s page.

Reliability - keeping high-traffic sites online since 2002

Being obsessed with reliability, I tried to do my best to ensure a total continuity of service by design. It's more difficult to design something reliable from the ground up in the short term, but in the long term it proves easier to maintain than broken code which tries to hide its own bugs behind respawning processes and similar tricks.

In single-process programs, you have no right to fail : the smallest bug will either crash your program, make it spin like mad or freeze. There has not been any such bug found in the code nor in production for the last 5 years.

HAProxy has been installed on Linux 2.4 systems serving millions of pages every day, and which have only known one reboot in 3 years for a complete OS upgrade. Obviously, they were not directly exposed to the Internet because they did not receive any patch at all. The kernel was a heavily patched 2.4 with Robert Love's jiffies64 patches to support time wrap-around at 497 days (which happened twice). On such systems, the software cannot fail without being immediately noticed !
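
For those who wonder where the 497 days come from, here is the arithmetic as a short Python check. It assumes a 32-bit tick counter incremented 100 times per second (HZ=100), which is the usual setting of Linux 2.4 on x86.

```python
HZ = 100                      # timer ticks per second (assumed: Linux 2.4 on x86)
SECONDS_PER_DAY = 86400

# A 32-bit counter wraps after 2**32 ticks.
wrap_days = 2**32 / HZ / SECONDS_PER_DAY

# Number of wrap-arounds seen during 3 years of uptime.
wraps_in_3_years = int(3 * 365 / wrap_days)

print(f"wrap-around every {wrap_days:.1f} days, {wraps_in_3_years} in 3 years")
```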

Right now, it's being used in several Fortune 500 companies around the world to reliably serve millions of pages per day or relay huge amounts of money. Some people even trust it so much that they use it as the default solution to solve simple problems (and I often tell them that they do it the dirty way). Such people still use version 1.1 which sees very limited evolutions and which targets mission-critical usages. It is really suited for such environments because the indicators it returns provide a lot of valuable information about the application's health, behaviour and defects, which are used to make it even more reliable. Version 1.2 has not yet received as much testing as 1.1, but should be considered for high volumes of traffic or to benefit from newest features.

As previously explained, most of the work is executed by the Operating System. For this reason, a large part of the reliability involves the OS itself. Recent versions of Linux 2.4 offer a high level of stability. However, it requires a bunch of patches to achieve a high level of performance. Linux 2.6 includes the features needed to achieve this level of performance, but is not yet stable enough for such usages. The kernel needs at least one upgrade every month to fix a bug or vulnerability. Some people prefer to run it on Solaris (or do not have the choice). Solaris 8 and 9 are known to be really stable right now, offering a level of performance comparable to Linux 2.4. Solaris 10 might show performance closer to Linux 2.6, but with the same code stability problem. I have too few reports from FreeBSD users, but it should be close to Linux 2.4 in terms of performance and reliability. OpenBSD sometimes shows socket allocation failures due to sockets staying in FIN_WAIT2 state when a client suddenly disappears. Also, I've noticed that hot reconfiguration does not work under OpenBSD.

The reliability can significantly decrease when the system is pushed to its limits. This is why finely tuning the sysctls is important. There is no general rule, every system and every application will be specific. However, it is important to ensure that the system will never run out of memory and that it will never swap. A correctly tuned system must be able to run for years at full load without slowing down nor crashing.

Security - Not even one vulnerability in 5 years

Security is an important concern when deploying a software load balancer. It is possible to harden the OS, to limit the number of open ports and accessible services, but the load balancer itself stays exposed. For this reason, I have been very careful about programming style. The only vulnerability found so far dates back 5 years and only lasted for one week. It was introduced when logs were reworked. It could be used to cause BUS ERRORS to crash the process, but it did not seem possible to execute code : the overflow concerned only 3 bytes, too short to store a pointer (and there was a variable next).

Anyway, much care is taken when writing code to manipulate headers. Impossible state combinations are checked and returned, and errors are processed from the creation to the death of a session. A few people around the world have reviewed the code and suggested cleanups for better clarity to ease auditing. By the way, I usually refuse patches that introduce suspect processing or in which not enough care is taken for abnormal conditions.

I generally suggest starting HAProxy as root because it can then jail itself in a chroot and drop all of its privileges before starting the instances. This is not possible if it is not started as root because only root can execute chroot().
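
The order of operations matters here, so a small Python sketch may help. It is not HAProxy's code : the function name jail_and_drop and the osmod parameter (which only exists to make the function testable without being root) are assumptions made for the illustration.

```python
import os


def jail_and_drop(jail_dir: str, uid: int, gid: int, osmod=os) -> None:
    """Enter a chroot, then give up root privileges for good.

    Must be called as root. The steps are ordered so that each one is
    still permitted when it runs.
    """
    osmod.chroot(jail_dir)  # only root may call chroot()
    osmod.chdir("/")        # do not keep a working directory outside the jail
    osmod.setgroups([])     # drop supplementary groups while still root
    osmod.setgid(gid)       # group first: no longer allowed once uid is dropped
    osmod.setuid(uid)       # last step: there is no way back to root
```

Calling setuid() before chroot() would make the chroot() fail, which is the reason given above for starting as root.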

Logs provide a lot of information to help maintain a satisfying security level. They can only be sent over UDP because once chrooted, the /dev/log UNIX socket is unreachable, and it must not be possible to write to a file. The following pieces of information are particularly useful :

  • source IP and port of the requestor make it possible to find their origin in firewall logs ;
  • session set up date generally matches firewall logs, while tear down date often matches proxies' dates ;
  • proper request encoding ensures the requestor cannot hide non-printable characters, nor fool a terminal ;
  • arbitrary request and response header and cookie capture helps to detect scan attacks, proxies and infected hosts ;
  • timers help to differentiate hand-typed requests from browsers'.
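
Sending a log line over UDP needs no access to the filesystem, which is why it still works from inside a chroot. Here is a minimal Python sketch of such a sender. The function name send_syslog_udp and the default facility and severity (local0 and info) are assumptions chosen for the example. The "<priority>" prefix follows the classic BSD syslog convention, where the priority is the facility multiplied by 8 plus the severity.

```python
import socket


def send_syslog_udp(message: str, host: str, port: int = 514,
                    facility: int = 16, severity: int = 6) -> bytes:
    """Send one syslog-style datagram and return the bytes which were sent."""
    priority = facility * 8 + severity   # local0 (16) and info (6) give 134
    packet = f"<{priority}>{message}".encode("ascii", "replace")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return packet
```

UDP gives no delivery guarantee, so a log server which is down simply loses the messages without ever blocking the sender.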

HAProxy also provides regex-based header control. Parts of the request, as well as request and response headers can be denied, allowed, removed, rewritten, or added. This is commonly used to block dangerous requests or encodings (eg: the Apache Chunk exploit), and to prevent accidental information leak from the server to the client. Other features such as Cache-control checking ensure that no sensitive information gets accidentally cached by an upstream proxy as a result of a bug in the application server for example.

Download

The source code is covered by GPL v2. Source code and pre-compiled binaries for Linux/x86 and Solaris/Sparc can be downloaded right here :

Documentation

There are three types of documentation now : the Reference Manual which explains how to configure HAProxy, the Architecture Guide which will guide you through various typical setups, and the new Configuration Manual which will soon replace the Reference Manual with a more explicit configuration language explanation.

Commercial Support (France)

If you think you don't have the time and skills to set up and maintain a free load balancer, or if you're seeking commercial support to satisfy your customers or your boss, you should contact EXOSEC. Another solution would be to use Exceliance's ALOHA appliances (see below).

Products using HAProxy

The following products or projects use HAProxy :

  • redWall Firewall
    From the site : "redWall is a bootable CD-ROM Firewall. Its goal is to provide a feature rich firewall solution, with the main goal, to provide a webinterface for all the logfiles generated!"
  • Exceliance's ALOHA appliance
    Exceliance sells a complete solution embedding an optimized and hardened version of Formilux packaged for ease of use, reduced maintenance, and enhanced availability through the use of VRRP for box fail-over, bonding for link fail-over, etc...

Contributions

Some happy users have contributed code which may or may not be included. Others spent a long time analysing the code, and there are some who maintain ports up to date.

Other Solutions

If you don't need all of HAProxy's features and are looking for a simpler solution, you may find what you need here :

  • Linux Virtual Servers (LVS)
    Very fast layer 3/4 load balancing merged in Linux 2.4 and 2.6 kernels. Should be coupled with Keepalived to monitor servers. This generally is the solution embedded by default in most IP-based load balancers.
  • Pure Load Balancer (PLB)
    The author adopted the same event-driven model as in HAProxy (but relying on libevent). Interestingly, he reached the same conclusions about other models' limitations. However, his goal is just to achieve high performance and availability, without any particular HTTP processing nor persistence.
  • Pound
    Pound can be seen as a complement to HAProxy. It supports SSL, and can direct traffic according to the requested URL. Its code is very small and will stay small for easy auditing. Its configuration file is very small too. However, it does not support persistence, and the performance associated with its multi-threaded model limits its usage to medium sites only.
  • Pen
    Pen is a very simple load balancer for TCP protocols. It supports source IP-based persistence for up to 2048 clients. Supports IP-based ACLs. Uses select() and supports higher loads than Pound but will not scale very well to thousands of simultaneous connections.

Contacts

Feel free to contact me at for any questions or comments :

An IRC channel for haproxy has been opened on FreeNode (but don't seek me there, I'm not) :