Leader Election

Reliability

Related concept: consensus algorithm

Leader election is the process by which nodes in a cluster designate one node as the leader responsible for coordinating actions, with the goal that at most one node makes decisions at any given time.

In distributed systems, certain operations require coordination — deciding which replica accepts writes, which node runs a scheduled job, or which server manages a shared resource. Leader election solves this by establishing a single authority.

Common leader election approaches include Raft consensus (used by etcd), Paxos (theoretical foundation, used in Google Spanner), ZooKeeper-based election (using ephemeral znodes), and lease-based systems (a node holds a time-limited lock from a coordination service).
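The lease-based approach can be sketched in a few lines. This is a minimal illustration, not the API of any real coordination service: `LeaseStore`, `try_acquire`, and `leader` are hypothetical names, the store lives in memory, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # node that currently holds the lease
    expires_at: float  # time after which the lease is no longer valid


class LeaseStore:
    """Hypothetical in-memory stand-in for a coordination service."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, node_id: str, ttl: float, now: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already ours."""
        lease = self._lease
        if lease is None or now >= lease.expires_at or lease.holder == node_id:
            self._lease = Lease(holder=node_id, expires_at=now + ttl)
            return True
        return False

    def leader(self, now: float) -> Optional[str]:
        """Return the current lease holder, or None if the lease has lapsed."""
        lease = self._lease
        if lease is not None and now < lease.expires_at:
            return lease.holder
        return None


store = LeaseStore()
store.try_acquire("node-a", ttl=10.0, now=0.0)   # node-a becomes leader
store.try_acquire("node-b", ttl=10.0, now=5.0)   # rejected: lease still held
store.try_acquire("node-b", ttl=10.0, now=12.0)  # granted: node-a's lease expired
```

A leader in this sketch keeps its role only by renewing the lease before it expires. In a real deployment the store would be a replicated service, and clock differences between nodes would need to be accounted for.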

The leader typically maintains its position through heartbeats, regular signals sent to followers. If followers stop receiving heartbeats within a timeout, because the leader has crashed or become unreachable, they trigger a new election.
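The heartbeat check on the follower side might look roughly like the sketch below. The `Follower` class and `pick_election_timeout` helper are hypothetical names for illustration. Randomizing the timeout, as Raft does, is shown as one common way to keep followers from all starting elections at the same moment; the specific range used here is an assumption.

```python
import random


class Follower:
    """Tracks the last heartbeat and decides when to call an election."""

    def __init__(self, election_timeout: float, now: float = 0.0) -> None:
        self.election_timeout = election_timeout
        self.last_heartbeat = now

    def on_heartbeat(self, now: float) -> None:
        # A heartbeat from the leader resets the election timer.
        self.last_heartbeat = now

    def should_start_election(self, now: float) -> bool:
        # True once the leader has been silent for the full timeout.
        return now - self.last_heartbeat >= self.election_timeout


def pick_election_timeout(base: float, rng: random.Random) -> float:
    # Assumed range: a random value between base and 2 * base.
    return rng.uniform(base, 2 * base)


follower = Follower(election_timeout=3.0)
follower.on_heartbeat(now=1.0)
follower.should_start_election(now=3.5)  # False: only 2.5 since last heartbeat
follower.should_start_election(now=4.0)  # True: the timeout has elapsed
```

Shorter timeouts detect a failed leader sooner but raise the risk of unnecessary elections when the network is merely slow.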

Key challenges include split-brain scenarios (two nodes both believe they are the leader), election storms (rapid re-elections under instability), and the latency cost of consensus rounds. In system design, leader election is essential for systems requiring strong consistency or single-writer guarantees.
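One widely used guard against split-brain is a fencing token: each newly elected leader receives a higher number than the previous one, and the shared resource rejects writes that carry an older number. The sketch below assumes that scheme; `FencedStore` and its `write` method are hypothetical names, not part of any particular system.

```python
from typing import Dict


class FencedStore:
    """Shared resource that rejects writes from stale leaders."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        # A token lower than one already seen means a newer leader exists.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.data[key] = value
        return True


resource = FencedStore()
resource.write(token=1, key="owner", value="node-a")  # accepted
resource.write(token=2, key="owner", value="node-b")  # accepted: newer leader
resource.write(token=1, key="owner", value="node-a")  # rejected: stale leader
```

With this check in place, an old leader that still believes it is in charge cannot overwrite the new leader's data, even if both are running at the same time.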
