# Leader Election

Implement Raft leader election to form a static 5-node cluster that elects a single leader.
## Cluster Formation

The test harness starts each node as a Docker container on a shared network and sets two environment variables:
```
ADDR=10.0.42.101:8080
PEERS=10.0.42.102:8080,10.0.42.103:8080,10.0.42.104:8080,10.0.42.105:8080
```

`ADDR` is this node's own address. `PEERS` is a comma-separated list of the other nodes' addresses. Each node listens on port 8080. The cluster is static; membership doesn't change in this stage.
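As a rough sketch of startup configuration (the `node_config` helper and dict shape are illustrative, not part of the harness), a node might read these variables like so:

```python
def node_config(env):
    """Split the harness-provided ADDR/PEERS variables into an address and peer list."""
    addr = env["ADDR"]
    peers = env["PEERS"].split(",")
    return addr, peers

# In a real node you would pass os.environ; here we use the stage's example values.
addr, peers = node_config({
    "ADDR": "10.0.42.101:8080",
    "PEERS": "10.0.42.102:8080,10.0.42.103:8080,10.0.42.104:8080,10.0.42.105:8080",
})
cluster_size = len(peers) + 1  # this node plus its four peers
```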
## Leader Election

All nodes start as followers. If a follower doesn't receive heartbeats within the election timeout (randomized between 500–1,000 ms), it becomes a candidate and starts an election.
These values are more generous than the Raft paper’s suggested 150-300ms election timeout. Each node runs in a Docker container on a shared host, which adds container network latency and scheduling jitter on top of any GC or runtime pauses in your implementation. Higher values give varying implementations room to handle these issues without spurious elections.
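One way to pick a timeout in that range, as a minimal sketch (constants chosen to match this stage's 500–1,000 ms window):

```python
import random

ELECTION_TIMEOUT_MIN = 0.5  # seconds; per this stage's spec
ELECTION_TIMEOUT_MAX = 1.0

def new_election_timeout():
    """Draw a fresh randomized timeout so nodes rarely time out at the same instant."""
    return random.uniform(ELECTION_TIMEOUT_MIN, ELECTION_TIMEOUT_MAX)
```

Re-drawing the timeout on every reset, rather than fixing it once at startup, keeps split votes unlikely even across repeated elections.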
Candidates request votes from other nodes. A candidate becomes leader if it receives votes from a majority (⌊N/2⌋ + 1, where N is the cluster size; 3 votes for a 5-node cluster). Each node grants at most one vote per term.
Terms act as a logical clock. When a node discovers a higher term, it immediately updates its term and reverts to follower.
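These two rules can be sketched together; the `state` dict below is an assumed representation, not a required one:

```python
def majority(cluster_size):
    """Votes needed to win an election: floor(N/2) + 1."""
    return cluster_size // 2 + 1

def observe_term(state, rpc_term):
    """On seeing a higher term, adopt it, clear the vote, and revert to follower."""
    if rpc_term > state["current_term"]:
        state["current_term"] = rpc_term
        state["voted_for"] = None
        state["role"] = "follower"

state = {"current_term": 3, "voted_for": "10.0.42.101:8080", "role": "candidate"}
observe_term(state, 5)  # a peer reported term 5: step down immediately
```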
## Heartbeats

Leaders send AppendEntries RPC heartbeats every 100 ms to maintain authority. The `entries` array is empty in this stage.
If a follower doesn’t receive heartbeats within the election timeout, it starts a new election.
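One sketch of the follower side, tracking an election deadline that each heartbeat pushes forward (the `ElectionTimer` class is illustrative):

```python
import time

class ElectionTimer:
    """Tracks when a follower should give up on the leader and start an election."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = time.monotonic() + timeout

    def reset(self):
        """Call on every valid heartbeat (or when granting a vote)."""
        self.deadline = time.monotonic() + self.timeout

    def expired(self):
        return time.monotonic() >= self.deadline
```

A background loop can poll `expired()` and start an election when it fires; using a monotonic clock avoids surprises if the wall clock is adjusted.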
## Client Requests

Leaders accept GET, PUT, and DELETE requests from the HTTP API stage.
Followers redirect all requests to the leader with 307 Temporary Redirect and a Location header:
```
HTTP/1.1 307 Temporary Redirect
Location: http://10.0.42.101:8080/kv/mykey
```

If no leader is known, return 503 Service Unavailable.
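A sketch of the routing decision, assuming the node tracks its current `role` and last-known `leader` address:

```python
def route_client_request(role, leader, path):
    """Return (status, headers) for a client KV request hitting this node."""
    if role == "leader":
        return 200, {}  # serve the operation locally
    if leader is None:
        return 503, {}  # no known leader: Service Unavailable
    return 307, {"Location": f"http://{leader}{path}"}
```

Using 307 (not 301/302) matters: it requires clients to replay PUT and DELETE with the same method and body.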
## Persistence

Persist `currentTerm` and `votedFor` to disk before responding to any RPC that changes them. Use fsync to ensure durability.
State must survive crashes and restarts. A node that crashes and restarts should resume with its persisted state.
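A common pattern is write-to-temp, fsync, then atomic rename; a sketch (the JSON file format is an assumption, any durable encoding works):

```python
import json
import os

def persist_state(path, current_term, voted_for):
    """Durably record term and vote before acknowledging the RPC that changed them."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"currentTerm": current_term, "votedFor": voted_for}, f)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to disk before proceeding
    os.replace(tmp, path)     # atomic rename: a crash never leaves a torn file

def load_state(path):
    """Restore persisted state after a restart; fresh nodes start at term 0."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"currentTerm": 0, "votedFor": None}
```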
## POST /raft/request-vote

RequestVote RPC, invoked by candidates to gather votes during leader election.

```
POST /raft/request-vote
{
  "term": 3,
  "candidate-id": "10.0.42.102:8080",
  "last-log-index": 0,
  "last-log-term": 0
}
----
200
{
  "term": 3,
  "vote-granted": true
}
```

Request fields:

- `term`: candidate's current term
- `candidate-id`: the candidate's own address (from `ADDR`)
- `last-log-index` / `last-log-term`: set to `0` in this stage; no log entries yet

Response fields:

- `term`: the responder's current term, so a stale candidate can update itself
- `vote-granted`: whether the vote was granted
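Putting the voting rules together, a handler might look like this sketch (the `state` dict is an assumed representation; the last-log checks are trivial in this stage because there are no log entries):

```python
def handle_request_vote(state, req):
    """Grant at most one vote per term; adopt any higher term first."""
    if req["term"] > state["current_term"]:
        state["current_term"] = req["term"]
        state["voted_for"] = None
        state["role"] = "follower"
    granted = (
        req["term"] == state["current_term"]
        and state["voted_for"] in (None, req["candidate-id"])
    )
    if granted:
        state["voted_for"] = req["candidate-id"]
    return {"term": state["current_term"], "vote-granted": granted}
```

Per the persistence rule above, write `currentTerm` and `votedFor` to disk before sending this response.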
## POST /raft/append-entries

AppendEntries RPC, used for heartbeats (empty entries) to maintain leader authority.

```
POST /raft/append-entries
{
  "term": 3,
  "leader-id": "10.0.42.101:8080",
  "prev-log-index": 0,
  "prev-log-term": 0,
  "entries": [],
  "leader-commit": 0
}
----
200
{
  "term": 3,
  "success": true
}
```

Request fields:

- `leader-id`: the leader's own address (from `ADDR`), so followers can redirect clients
- `prev-log-index` / `prev-log-term`: set to `0` in this stage; no log entries yet
- `entries`: empty in this stage; will carry log entries in later stages

Response fields:

- `term`: the responder's current term, so a stale leader can step down
- `success`: `true` if the follower's log matched `prev-log-index` and `prev-log-term`
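A matching heartbeat handler sketch, using the same assumed `state` dict as the earlier examples (the `prev-log` checks always succeed in this stage because logs are empty):

```python
def handle_append_entries(state, req):
    """Accept heartbeats from a current or newer leader; reject stale ones."""
    if req["term"] < state["current_term"]:
        return {"term": state["current_term"], "success": False}  # stale leader
    if req["term"] > state["current_term"]:
        state["current_term"] = req["term"]
        state["voted_for"] = None
    state["role"] = "follower"           # even a candidate yields to a valid leader
    state["leader"] = req["leader-id"]   # remember who to redirect clients to
    return {"term": state["current_term"], "success": True}
```

On success the follower should also reset its election timer, since it just heard from a live leader.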
## GET /cluster/info

Returns the node's current cluster state.

```
GET /cluster/info
----
200
{
  "id": "10.0.42.101:8080",
  "role": "leader",
  "term": 3,
  "leader": "10.0.42.101:8080",
  "peers": ["10.0.42.102:8080", "10.0.42.103:8080", "10.0.42.104:8080", "10.0.42.105:8080"]
}
```

- `id`: this node's own address (from `ADDR`)
- `role`: this node's current role, one of `leader`, `candidate`, or `follower`
- `term`: current term; starts at `0` before any election
- `leader`: the known leader's address, or `null` before an election and after leader failure
- `peers`: all cluster members except this node, sorted lexicographically
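Building the response is mostly bookkeeping; one sketch, reusing the assumed `state` dict from the RPC examples:

```python
def cluster_info(state):
    """Assemble the /cluster/info payload; peers must be sorted lexicographically."""
    return {
        "id": state["addr"],
        "role": state["role"],
        "term": state["current_term"],
        "leader": state["leader"],   # None serializes to JSON null
        "peers": sorted(state["peers"]),
    }
```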
## Testing

Your server will be started as a 5-node cluster, with its own address in `ADDR` and the other four nodes' addresses in `PEERS`. The tests will verify leader election behavior:

```
$ clstr test leader-election
Testing leader-election: Cluster Elects and Maintains Leader

✓ /cluster/info Returns Pre-Election State
✓ Leader Election Completes
✓ Exactly One Leader Per Term
✓ Leader Maintains Authority via Heartbeats
✓ Followers Redirect to Leader
✓ Leader Handles KV Operations
✓ New Leader Elected After Leader Crash
✓ Service Unavailable During Election
✓ Partition Enforces Quorum
✓ Cluster Converges After Partition Heals

PASSED ✓

Run 'clstr next' to advance to the next stage.
```

Example failure:
```
$ clstr test leader-election
Testing leader-election: Cluster Elects and Maintains Leader

✓ /cluster/info Returns Pre-Election State
✓ Leader Election Completes
✓ Exactly One Leader Per Term
✓ Leader Maintains Authority via Heartbeats
✓ Followers Redirect to Leader
✓ Leader Handles KV Operations
✓ New Leader Elected After Leader Crash
✓ Service Unavailable During Election
✗ Partition Enforces Quorum
  GET - 1 of 2 nodes passed (expected all 2)
  http://127.0.0.1:8003/cluster/info (n1) → 200
  Expected field "role": one of [follower, candidate]
  {
    "id": "10.0.42.101:8080",
    "leader": "10.0.42.101:8080",
    "peers": [
      "10.0.42.102:8080",
      "10.0.42.103:8080",
      "10.0.42.104:8080",
      "10.0.42.105:8080"
    ],
    "role": "leader",
    "term": 5
  }

  The minority partition [n1, n2] must not elect a leader. A candidate needs
  votes from at least 3 nodes; with only n1 and n2 reachable, no election can
  succeed.

FAILED ✗

Read the guide: https://clstr.io/kv-store/leader-election
```

## Network Partitions
Partition tests use iptables DROP rules applied directly inside each container to cut traffic between groups of nodes. Rules are bidirectional, so a partitioned node can neither send to nor receive from nodes in the other group. When the partition heals, all rules are flushed on every node to restore full connectivity.
Your implementation doesn’t need to do anything special to support this. Partitions are transparent at the network level: RPCs to unreachable nodes will time out or be refused.
## Debugging

Each node's output (stdout/stderr) is captured during testing and viewable with `clstr logs`. Logs persist across restarts within a run, so you see the full history even after a crash or partition:
```
$ clstr logs n1
================ STARTED ================
Node started addr=10.0.42.101:8080
Voted for 10.0.42.104:8080 in term 1
Received heartbeat from 10.0.42.104:8080, following in term 1
================ KILLED ================
================ STARTED ================
Node started addr=10.0.42.101:8080
Received heartbeat from 10.0.42.104:8080, following in term 1
================ PARTITIONED FROM: n3, n4, n5 ================
Heartbeat timeout, starting election for term 2
Received vote from 10.0.42.102:8080 (2/5)
Heartbeat timeout, starting election for term 3
Received vote from 10.0.42.102:8080 (2/5)
================ PARTITION HEALED ================
Election timeout, starting election for term 4
Received vote from 10.0.42.102:8080 (2/5)
Received vote from 10.0.42.103:8080 (3/5)
Became leader in term 4
```

```
$ clstr logs n4
================ STARTED ================
Node started addr=10.0.42.104:8080
Election timeout, starting election for term 1
Received vote from 10.0.42.101:8080 (2/5)
Received vote from 10.0.42.102:8080 (3/5)
Became leader in term 1
================ PARTITIONED FROM: n1, n2 ================
Heartbeat to 10.0.42.101:8080 failed
Heartbeat to 10.0.42.102:8080 failed
Maintaining leadership in term 1, majority reachable
================ PARTITION HEALED ================
Received RequestVote from 10.0.42.101:8080, stepping down in term 4
Voted for 10.0.42.101:8080 in term 4
Received heartbeat from 10.0.42.101:8080, following in term 4
```

Add your own logging to help trace election and heartbeat activity across nodes.
## Resources

- Raft Visualization by The Secret Lives of Data
- The Raft Consensus Algorithm
- Distributed Systems 6.2: Raft by Martin Kleppmann
- Students’ Guide to Raft by Jon Gjengset
- Database Internals Chapter 10: Leader Election by Alex Petrov
- Database Internals Chapter 14: Consensus by Alex Petrov