Load Balancing: Distributing Traffic Across Servers
A load balancer distributes incoming network traffic across multiple servers, ensuring no single server bears disproportionate load. It is a fundamental building block of scalable, resilient web architectures — enabling horizontal scaling and providing high availability through redundancy.
How Load Balancers Work
Clients connect to the load balancer's IP address or DNS name — they never connect directly to application servers. The load balancer forwards each request to one of the available backend servers based on a routing algorithm, and returns the response to the client.
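The request flow above can be modeled in a few lines. This is a minimal in-process sketch, not a network proxy. It assumes a "request" is a plain dict and that each backend is a Python callable returning a response string. The names `LoadBalancer`, `handle`, and `make_backend` are hypothetical and exist only for illustration. Round-robin stands in for the routing algorithm.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, List

# Hypothetical types: a request is a dict, and a backend is any callable
# that turns a request into a response. Real load balancers proxy TCP/HTTP
# connections; this only models the control flow.
Request = Dict[str, str]
Backend = Callable[[Request], str]


@dataclass
class LoadBalancer:
    backends: List[Backend]

    def __post_init__(self) -> None:
        # Round-robin iterator used as the routing algorithm in this sketch.
        self._next = cycle(self.backends)

    def handle(self, request: Request) -> str:
        # Clients only ever call handle(); they never reach a backend directly.
        backend = next(self._next)
        response = backend(request)
        # The backend's response is relayed back to the client unchanged.
        return response


def make_backend(name: str) -> Backend:
    # Hypothetical stand-in for an application server.
    return lambda req: f"{name} served {req['path']}"


lb = LoadBalancer([make_backend("app-1"), make_backend("app-2")])
print(lb.handle({"path": "/home"}))  # app-1 served /home
print(lb.handle({"path": "/cart"}))  # app-2 served /cart
```

Keeping the routing decision behind a single entry point is what lets backends be added, removed, or replaced without clients noticing.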
Load Balancing Algorithms
- Round-robin: Requests are sent to each server in turn. Simplest option; works well when servers are identical and requests cost roughly the same to process
- Least connections: Routes to the server with fewest active connections — better for requests with variable processing time
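- IP hash: Routes requests from the same client IP to the same server, which is useful for session stickiness (though we prefer stateless designs). Note that clients behind a shared NAT or proxy all land on one server, and changing the number of servers remaps many clients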
- Weighted: Routes more traffic to more capable servers — useful during rolling deployments or when servers have different specifications
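The four algorithms above can be sketched as small selector objects. This is an illustrative sketch under stated assumptions: servers are plain name strings, the class and function names are hypothetical, and the weighted variant shown is smooth weighted round-robin, one common way to interleave picks rather than sending each server's share in a burst. The IP hash uses a simple modulo, which is why resizing the pool remaps clients; consistent hashing is the usual remedy but is not shown here.

```python
import hashlib
from itertools import cycle
from typing import Dict, List


class RoundRobin:
    def __init__(self, servers: List[str]) -> None:
        self._it = cycle(servers)

    def pick(self) -> str:
        return next(self._it)


class LeastConnections:
    def __init__(self, servers: List[str]) -> None:
        # In-flight request count per server; callers must release() when done.
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def acquire(self) -> str:
        # Fewest active connections wins; ties go to the earliest server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


def ip_hash(client_ip: str, servers: List[str]) -> str:
    # hashlib gives the same result in every process; Python's built-in
    # hash() of a str is randomized per process, so it is avoided here.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class SmoothWeighted:
    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = weights
        self.total = sum(weights.values())
        self.current = {s: 0 for s in weights}

    def pick(self) -> str:
        # Every server gains its weight; the leader is chosen and pays back
        # the total, so picks are spread out in proportion to the weights.
        for s, w in self.weights.items():
            self.current[s] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best


servers = ["s1", "s2", "s3"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']

lc = LeastConnections(servers)
first, second = lc.acquire(), lc.acquire()  # s1, then s2
lc.release(first)
print(lc.acquire())  # s1 (s1 and s3 both have 0 active; s1 comes first)

print(ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers))  # True

wrr = SmoothWeighted({"big": 3, "small": 1})
print([wrr.pick() for _ in range(4)])  # ['big', 'big', 'small', 'big']
```

Least connections needs the balancer to see when each request finishes, which is why it exposes a `release` step that round-robin does not.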
Health Checks
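Load balancers continuously probe backend servers with health checks, for example a periodic HTTP request to a health endpoint. If a server stops responding or returns errors (typically for several consecutive checks, to avoid reacting to a single blip), the load balancer removes it from the rotation and stops sending it traffic; once it passes checks again, it is returned to the rotation. This enables automatic failover when application instances crash or become unhealthy.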
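The remove-and-restore behavior can be sketched as a small state tracker. This assumes consecutive-failure and consecutive-success thresholds (the values 3 and 2 are hypothetical defaults, not from any specific product) and takes the probe as a plain function so it can be simulated. The `http_probe` helper and its `/health` path are likewise hypothetical examples of what a real probe might look like.

```python
import urllib.request
from typing import Callable, Dict, List


def http_probe(server: str) -> bool:
    # Hypothetical probe: GET http://<server>/health and treat a 200 as healthy.
    # urlopen raises for HTTP errors, refused connections, and timeouts.
    try:
        with urllib.request.urlopen(f"http://{server}/health", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False


class HealthChecker:
    def __init__(self, servers: List[str], probe: Callable[[str], bool],
                 unhealthy_threshold: int = 3, healthy_threshold: int = 2) -> None:
        self.probe = probe
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy: Dict[str, bool] = {s: True for s in servers}
        self._fails: Dict[str, int] = {s: 0 for s in servers}
        self._passes: Dict[str, int] = {s: 0 for s in servers}

    def run_once(self) -> None:
        # One round of probes; a real balancer repeats this on a timer.
        for server in self.healthy:
            if self.probe(server):
                self._fails[server] = 0
                self._passes[server] += 1
                if not self.healthy[server] and self._passes[server] >= self.healthy_threshold:
                    self.healthy[server] = True  # restored to the rotation
            else:
                self._passes[server] = 0
                self._fails[server] += 1
                if self.healthy[server] and self._fails[server] >= self.unhealthy_threshold:
                    self.healthy[server] = False  # removed from the rotation

    def in_rotation(self) -> List[str]:
        return [s for s, ok in self.healthy.items() if ok]


# Simulated outage instead of real HTTP probes.
down = {"s2"}
hc = HealthChecker(["s1", "s2"], probe=lambda s: s not in down)
for _ in range(3):
    hc.run_once()
print(hc.in_rotation())  # ['s1']
down.clear()
for _ in range(2):
    hc.run_once()
print(hc.in_rotation())  # ['s1', 's2']
```

Requiring several consecutive results in each direction keeps a server from flapping in and out of the rotation on a single slow or failed probe.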
Layer 4 vs Layer 7 Load Balancers
- Layer 4 (Transport): Routes based on IP and port — fast, but cannot inspect request content
- Layer 7 (Application): Routes based on request content — URL path, headers, cookies. Enables path-based routing (route /api/* to API servers, /* to web servers), host-based routing, and SSL termination
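Path- and host-based routing can be sketched as an ordered rule table. It only works at Layer 7, because it needs the parsed Host header and URL path, which a Layer 4 balancer never sees. The hostnames, pool names, and the first-match-wins rule order below are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical rule: (required host, or None for any host; path prefix; pool).
Rule = Tuple[Optional[str], str, str]

RULES: List[Rule] = [
    ("admin.example.com", "/", "admin-pool"),  # host-based routing
    (None, "/api/", "api-pool"),               # /api/* to API servers
    (None, "/", "web-pool"),                   # everything else to web servers
]


def route(host: str, path: str, rules: List[Rule] = RULES) -> str:
    # First match wins, so more specific rules must be listed first.
    for rule_host, prefix, pool in rules:
        if rule_host is not None and rule_host != host:
            continue
        if path.startswith(prefix):
            return pool
    raise LookupError(f"no route for {host}{path}")


print(route("www.example.com", "/api/users"))    # api-pool
print(route("www.example.com", "/about"))        # web-pool
print(route("admin.example.com", "/api/users"))  # admin-pool
```

Ordering rules from most to least specific keeps the catch-all `/` rule from shadowing the narrower ones.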