Infrastructure · Intermediate · ~60 min

Design a Load Balancer

Design a layer 7 load balancer that distributes HTTP traffic across backend servers. Support multiple balancing algorithms, health checks, session persistence, and graceful server drain.

You'll practice

networking, infrastructure

Functional Requirements

  • Distribute incoming traffic across backend servers
  • Detect and route around unhealthy servers
  • Support session persistence when needed

Non-Functional Requirements

  • Add less than 1 ms of latency overhead
  • Handle 100K+ concurrent connections
  • 99.99% uptime with automatic failover

Frequently Asked Questions

What is the difference between L4 and L7 load balancing?

L4 (transport layer) load balancers route on IP addresses and TCP/UDP ports without inspecting the payload, which is fast but limited. L7 (application layer) load balancers parse HTTP and can use headers, URLs, and cookies for routing decisions such as path-based routing, A/B testing, or sticky sessions. L7 adds latency, because the balancer typically terminates the client connection and parses each request, but it enables richer traffic management.
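As a rough illustration of what "inspecting HTTP" buys you, the sketch below picks a backend pool from the request path and a sticky cookie. The pool names, the `lb_pool` cookie, and the `Request` type are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Minimal stand-in for a parsed HTTP request (hypothetical type)."""
    path: str
    cookies: dict = field(default_factory=dict)


# Hypothetical routing table: path prefix -> backend pool name.
ROUTES = [
    ("/api/", "api-pool"),
    ("/static/", "static-pool"),
]
DEFAULT_POOL = "web-pool"
KNOWN_POOLS = {pool for _, pool in ROUTES} | {DEFAULT_POOL}


def choose_pool(req: Request) -> str:
    """Pick a pool using application-layer data an L4 balancer never sees."""
    # Sticky session: honor a cookie set on an earlier response,
    # but only if it names a pool that still exists.
    pinned = req.cookies.get("lb_pool")
    if pinned in KNOWN_POOLS:
        return pinned
    # Path-based routing: the first matching prefix wins.
    for prefix, pool in ROUTES:
        if req.path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

An L4 balancer could not make either decision, since the path and cookies are only visible after the HTTP request has been parsed.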

How do you implement health checks for backend servers?

Use both active and passive health checks. Active checks periodically send probe requests (HTTP GET to /health) and mark servers unhealthy after N consecutive failures. Passive checks monitor real traffic for error rates. Combine both: active checks detect unresponsive servers, passive checks catch degraded performance that active checks might miss.
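The combination described above can be sketched as a small per-server state tracker. This is a minimal sketch, not a definitive implementation: the thresholds, the sliding-window size, the rule that a server needs consecutive successful probes to be readmitted, and the class name `BackendHealth` are assumptions made for illustration.

```python
from collections import deque


class BackendHealth:
    """Tracks one server's health from active probes and passive observation."""

    def __init__(self, fail_threshold=3, rise_threshold=2,
                 window=20, max_error_rate=0.5):
        self.fail_threshold = fail_threshold    # N consecutive probe failures
        self.rise_threshold = rise_threshold    # probes needed to readmit (assumed)
        self.max_error_rate = max_error_rate    # passive-check trip point (assumed)
        self.healthy = True
        self._fails = 0
        self._successes = 0
        self._recent = deque(maxlen=window)     # True means the request failed

    def record_probe(self, ok: bool) -> None:
        """Active check: the result of one probe, e.g. GET /health."""
        if ok:
            self._fails = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.rise_threshold:
                self.healthy = True
                self._recent.clear()            # start passive stats afresh
        else:
            self._successes = 0
            self._fails += 1
            if self._fails >= self.fail_threshold:
                self.healthy = False

    def record_response(self, status: int) -> None:
        """Passive check: observe the status code of a real response."""
        self._recent.append(status >= 500)
        window_full = len(self._recent) == self._recent.maxlen
        if window_full and sum(self._recent) / len(self._recent) > self.max_error_rate:
            self.healthy = False
            self._successes = 0                 # require fresh probes to recover
```

Requiring several consecutive successes before readmitting a server is one way to avoid flapping when a server is only intermittently reachable.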

What load balancing algorithms should I consider?

Round-robin is the simplest option but ignores server load. Least connections routes to the server with the fewest active connections, which suits workloads with varied request durations. Weighted variants let you account for different server capacities. Consistent hashing is useful when you need session affinity: it maps each key (such as a session ID) to the same server and remaps only a fraction of keys when servers are added or removed. For most web applications, least connections or weighted round-robin works well.
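To make the trade-offs concrete, here is a minimal sketch of three of the algorithms named above. The class names, the `vnodes` parameter, and the choice of SHA-256 for the ring are illustrative assumptions, and the code is single-threaded for brevity.

```python
import bisect
import hashlib
import itertools


class RoundRobin:
    """Cycles through servers in order, ignoring load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Routes to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection closes.
        self.active[server] -= 1


class ConsistentHash:
    """Hash ring with virtual nodes; the same key maps to the same server."""

    def __init__(self, servers, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def pick(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[i][1]
```

With the ring, removing a server only remaps the keys that server owned; keys owned by the remaining servers keep their assignment, which is what makes it attractive for session affinity.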

How do you handle zero-downtime deployments with a load balancer?

Use connection draining (graceful drain): mark the server being updated as draining so the load balancer stops sending it new requests, but let in-flight requests finish within a timeout. The health check acts as a backstop, removing the server from rotation if it stops responding, but draining explicitly first avoids failing requests that arrive before the check trips. Rolling deployments update servers one at a time, so the remaining servers maintain capacity throughout.
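The drain step can be sketched as a flag plus an in-flight counter. This is a single-threaded sketch under stated assumptions: the class name `DrainingBackend`, the polling interval, and the injectable `sleep` and `clock` parameters are hypothetical, and a real balancer would need locking or atomic counters.

```python
import time


class DrainingBackend:
    """One backend server as seen by the load balancer."""

    def __init__(self, name):
        self.name = name
        self.draining = False
        self.in_flight = 0

    def try_acquire(self) -> bool:
        """Called before routing a new request; refuses once draining."""
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Called when a request finishes."""
        self.in_flight -= 1

    def drain(self, timeout_s, poll_s=0.05,
              sleep=time.sleep, clock=time.monotonic) -> bool:
        """Stop new traffic, then wait for in-flight requests to finish.

        Returns True if the server emptied before the timeout, False if
        requests were still in flight when the deadline passed.
        """
        self.draining = True
        deadline = clock() + timeout_s
        while self.in_flight > 0 and clock() < deadline:
            sleep(poll_s)
        return self.in_flight == 0
```

A deployment script would call `drain()` on one server, update it once the call returns, put it back in rotation, and then move on to the next server.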
