
Leader Election

Implement Raft leader election to form a static 5-node cluster that elects a single leader.

The test harness starts each node as a Docker container on a shared network and sets two environment variables:

ADDR=10.0.42.101:8080
PEERS=10.0.42.102:8080,10.0.42.103:8080,10.0.42.104:8080,10.0.42.105:8080

ADDR is this node’s own address. PEERS is a comma-separated list of the other nodes’ addresses. Each node listens on port 8080. The cluster is static; membership doesn’t change in this stage.
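Reading this configuration is the first thing a node does on startup. A minimal sketch in Python (the harness only fixes the environment variables, not the language):

```python
import os

def load_config(env=os.environ):
    """Read the static cluster configuration from ADDR and PEERS."""
    addr = env["ADDR"]
    peers = [p for p in env["PEERS"].split(",") if p]
    return addr, peers

# Example with the values shown above:
addr, peers = load_config({
    "ADDR": "10.0.42.101:8080",
    "PEERS": "10.0.42.102:8080,10.0.42.103:8080,10.0.42.104:8080,10.0.42.105:8080",
})
```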

All nodes start as followers. If a follower doesn’t receive heartbeats within the election timeout (randomized between 500-1,000ms), it becomes a candidate and starts an election.

These values are more generous than the Raft paper’s suggested 150-300ms election timeout. Each node runs in a Docker container on a shared host, which adds container network latency and scheduling jitter on top of any GC or runtime pauses in your implementation. Higher values give varying implementations room to handle these issues without spurious elections.
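The important detail is that the timeout is re-randomized on every reset, so nodes rarely time out at the same moment and split the vote. A sketch:

```python
import random

ELECTION_TIMEOUT_MIN_MS = 500
ELECTION_TIMEOUT_MAX_MS = 1000

def new_election_timeout():
    """Pick a fresh randomized timeout in seconds. Call this every time the
    timer is reset (on each valid heartbeat or granted vote), not once at
    startup, so nodes stay de-synchronized."""
    return random.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS) / 1000.0

t = new_election_timeout()
```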

Candidates request votes from other nodes. A candidate becomes leader if it receives votes from a majority (⌈(n+1)/2⌉, where n is the cluster size; 3 votes for a 5-node cluster). Each node grants at most one vote per term.
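The majority threshold is a one-liner worth getting right (note that a candidate's vote for itself counts toward the total):

```python
import math

def majority(cluster_size):
    """Minimum votes needed to win an election: ceil((n + 1) / 2)."""
    return math.ceil((cluster_size + 1) / 2)

needed = majority(5)  # 3 votes in a 5-node cluster
```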

Terms act as a logical clock. When a node discovers a higher term, it immediately updates its term and reverts to follower.
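This "step down on a higher term" rule applies to every RPC, both requests and responses. A sketch, assuming node state is kept in a plain dict:

```python
def observe_term(state, rpc_term):
    """If an RPC carries a higher term, adopt it and revert to follower.
    The vote is per-term, so votedFor must be cleared as well."""
    if rpc_term > state["current_term"]:
        state["current_term"] = rpc_term
        state["voted_for"] = None
        state["role"] = "follower"

# A candidate in term 3 hears about term 5 and steps down:
state = {"current_term": 3, "voted_for": "10.0.42.102:8080", "role": "candidate"}
observe_term(state, 5)
```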

Leaders send AppendEntries RPC heartbeats every 100ms to maintain authority. The entries array is empty in this stage.
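The heartbeat loop itself is simple; the interval just needs to stay well under the 500 ms minimum election timeout. A minimal sketch, where `send_append_entries` is a hypothetical stand-in for the real RPC transport:

```python
import time

HEARTBEAT_INTERVAL = 0.100  # seconds; well under the 500 ms minimum election timeout

def heartbeat_loop(state, send_append_entries):
    """While this node is leader, send an empty AppendEntries heartbeat to
    every peer each interval."""
    while state["role"] == "leader":
        for peer in state["peers"]:
            send_append_entries(peer, state["current_term"])
        time.sleep(HEARTBEAT_INTERVAL)

# demo with a fake transport that stops the loop after two rounds
sent = []
state = {"role": "leader", "current_term": 3,
         "peers": ["10.0.42.102:8080", "10.0.42.103:8080"]}

def fake_send(peer, term):
    sent.append((peer, term))
    if len(sent) >= 4:
        state["role"] = "follower"  # a real leader steps down on seeing a higher term

heartbeat_loop(state, fake_send)
```

In a real implementation this runs on its own thread (or async task) and each send has its own timeout so one slow peer doesn't delay the others.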

If a follower doesn’t receive heartbeats within the election timeout, it starts a new election.

Leaders accept GET, PUT, and DELETE requests from the HTTP API stage.

Followers redirect all requests to the leader with 307 Temporary Redirect and a Location header:

HTTP/1.1 307 Temporary Redirect
Location: http://10.0.42.101:8080/kv/mykey

If no leader is known, return 503 Service Unavailable.
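The routing decision can be isolated in one function. A sketch returning an HTTP status and headers, assuming the node tracks the last known leader in its state:

```python
def route_client_request(state, path):
    """Decide how a node answers a client KV request: leaders serve it,
    followers redirect to the known leader, and a node with no known
    leader returns 503."""
    if state["role"] == "leader":
        return 200, {}  # handle GET/PUT/DELETE locally
    if state["leader"] is not None:
        return 307, {"Location": f"http://{state['leader']}{path}"}
    return 503, {}

status, headers = route_client_request(
    {"role": "follower", "leader": "10.0.42.101:8080"}, "/kv/mykey")
```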

Persist currentTerm and votedFor to disk before responding to any RPC that changes them. Use fsync to ensure durability.

State must survive crashes and restarts. A node that crashes and restarts should resume with its persisted state.
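One way to meet both requirements is a write-temp-fsync-rename pattern, so a crash mid-write can never leave a torn state file. A sketch (the file name and JSON layout are illustrative, not required by the tests):

```python
import json
import os
import tempfile

def persist_state(path, current_term, voted_for):
    """Write currentTerm/votedFor atomically: write a temp file, fsync it,
    then rename it over the old file."""
    data = json.dumps({"currentTerm": current_term, "votedFor": voted_for})
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the write to disk before replying to the RPC
        os.replace(tmp, path)     # atomic on POSIX filesystems
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)

def load_state(path):
    """Restore persisted state after a restart; defaults for a fresh node."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"currentTerm": 0, "votedFor": None}

# demo: persist, then reload as a restarted node would
state_path = os.path.join(tempfile.mkdtemp(), "raft-state.json")
persist_state(state_path, 3, "10.0.42.104:8080")
restored = load_state(state_path)
```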

The RequestVote RPC is invoked by candidates to gather votes during leader election.

POST /raft/request-vote
{
  "term": 3,
  "candidate-id": "10.0.42.102:8080",
  "last-log-index": 0,
  "last-log-term": 0
}
----
200
{
  "term": 3,
  "vote-granted": true
}
  • term: candidate’s current term
  • candidate-id: the candidate’s own address (from ADDR)
  • last-log-index / last-log-term: set to 0 in this stage; no log entries yet
  • vote-granted: whether the vote was granted
  • response term: the responder’s current term, so a stale candidate can update itself
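Putting the voting rules together — step down on a higher term, then grant at most one vote per term — the handler can be sketched like this (dict-based state as an illustrative assumption; the log up-to-date check is trivial while the log is empty):

```python
def handle_request_vote(state, req):
    """Handle a RequestVote RPC body and return the reply body."""
    if req["term"] > state["current_term"]:
        state["current_term"] = req["term"]
        state["voted_for"] = None
        state["role"] = "follower"
    granted = (
        req["term"] == state["current_term"]
        and state["voted_for"] in (None, req["candidate-id"])
    )
    if granted:
        state["voted_for"] = req["candidate-id"]
        # persist current_term/voted_for (with fsync) before sending this reply
    return {"term": state["current_term"], "vote-granted": granted}

state = {"current_term": 2, "voted_for": None, "role": "follower"}
req = {"term": 3, "candidate-id": "10.0.42.102:8080",
       "last-log-index": 0, "last-log-term": 0}
reply = handle_request_vote(state, req)

# a second candidate asking in the same term is refused
rival = {**req, "candidate-id": "10.0.42.103:8080"}
reply2 = handle_request_vote(state, rival)
```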

The AppendEntries RPC is used for heartbeats (with empty entries) to maintain leader authority.

POST /raft/append-entries
{
  "term": 3,
  "leader-id": "10.0.42.101:8080",
  "prev-log-index": 0,
  "prev-log-term": 0,
  "entries": [],
  "leader-commit": 0
}
----
200
{
  "term": 3,
  "success": true
}
  • leader-id: the leader’s own address (from ADDR), so followers can redirect clients
  • entries: empty in this stage; will carry log entries in later stages
  • prev-log-index / prev-log-term: set to 0 in this stage; no log entries yet
  • success: true if the follower’s log matched prev-log-index and prev-log-term; with no log entries yet this always matches, so it is false only when the leader’s term is stale
  • response term: the responder’s current term, so a stale leader can step down
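On the receiving side, a heartbeat handler rejects stale terms and otherwise accepts the sender as leader, resets the election timer, and records who to redirect clients to. A sketch under the same dict-state assumption (the timer reset is represented by a flag):

```python
def handle_append_entries(state, req):
    """Handle an AppendEntries heartbeat body and return the reply body."""
    if req["term"] < state["current_term"]:
        return {"term": state["current_term"], "success": False}
    if req["term"] > state["current_term"]:
        state["current_term"] = req["term"]
        state["voted_for"] = None
    state["role"] = "follower"           # even a candidate yields to a current leader
    state["leader"] = req["leader-id"]   # remembered for client redirects
    state["timer_reset"] = True          # stand-in for resetting the election timer
    return {"term": state["current_term"], "success": True}

state = {"current_term": 2, "voted_for": None, "role": "candidate", "leader": None}
reply = handle_append_entries(state, {"term": 3, "leader-id": "10.0.42.101:8080",
                                      "prev-log-index": 0, "prev-log-term": 0,
                                      "entries": [], "leader-commit": 0})
```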

The /cluster/info endpoint returns the node’s current cluster state.

GET /cluster/info
----
200
{
  "id": "10.0.42.101:8080",
  "role": "leader",
  "term": 3,
  "leader": "10.0.42.101:8080",
  "peers": ["10.0.42.102:8080", "10.0.42.103:8080", "10.0.42.104:8080", "10.0.42.105:8080"]
}
  • id: this node’s own address (from ADDR)
  • role: this node’s current role, one of leader, candidate, or follower
  • term: current term; starts at 0 before any election
  • leader: the known leader’s address, or null before an election and after leader failure
  • peers: all cluster members except this node, sorted lexicographically
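Assembling this response is a direct projection of local state; the only details worth noting are the sorted peers and that an unknown leader becomes JSON null. A sketch:

```python
def cluster_info(state):
    """Build the /cluster/info response body from local node state."""
    return {
        "id": state["addr"],
        "role": state["role"],
        "term": state["current_term"],
        "leader": state["leader"],        # None serializes to JSON null
        "peers": sorted(state["peers"]),  # lexicographic order, as required
    }

info = cluster_info({
    "addr": "10.0.42.101:8080", "role": "leader", "current_term": 3,
    "leader": "10.0.42.101:8080",
    "peers": ["10.0.42.103:8080", "10.0.42.102:8080",
              "10.0.42.105:8080", "10.0.42.104:8080"],
})
```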

Your server will be started as a 5-node cluster, with its own address in ADDR and the other four nodes’ addresses in PEERS. The tests will verify leader election behavior:

$ clstr test leader-election
Testing leader-election: Cluster Elects and Maintains Leader
✓ /cluster/info Returns Pre-Election State
✓ Leader Election Completes
✓ Exactly One Leader Per Term
✓ Leader Maintains Authority via Heartbeats
✓ Followers Redirect to Leader
✓ Leader Handles KV Operations
✓ New Leader Elected After Leader Crash
✓ Service Unavailable During Election
✓ Partition Enforces Quorum
✓ Cluster Converges After Partition Heals
PASSED ✓
Run 'clstr next' to advance to the next stage.

Example failure:

$ clstr test leader-election
Testing leader-election: Cluster Elects and Maintains Leader
✓ /cluster/info Returns Pre-Election State
✓ Leader Election Completes
✓ Exactly One Leader Per Term
✓ Leader Maintains Authority via Heartbeats
✓ Followers Redirect to Leader
✓ Leader Handles KV Operations
✓ New Leader Elected After Leader Crash
✓ Service Unavailable During Election
✗ Partition Enforces Quorum
GET - 1 of 2 nodes passed (expected all 2)
http://127.0.0.1:8003/cluster/info (n1) → 200
Expected field "role": one of [follower, candidate]
{
"id": "10.0.42.101:8080",
"leader": "10.0.42.101:8080",
"peers": [
"10.0.42.102:8080",
"10.0.42.103:8080",
"10.0.42.104:8080",
"10.0.42.105:8080"
],
"role": "leader",
"term": 5
}
The minority partition [n1, n2] must not elect a leader.
A candidate needs votes from at least 3 nodes; with only n1 and n2 reachable, no election can succeed.
FAILED ✗
Read the guide: https://clstr.io/kv-store/leader-election

Partition tests use iptables DROP rules applied directly inside each container to cut traffic between groups of nodes. Rules are bidirectional, so a partitioned node can neither send to nor receive from nodes in the other group. When the partition heals, all rules are flushed on every node to restore full connectivity.

Your implementation doesn’t need to do anything special to support this. Partitions are transparent at the network level: RPCs to unreachable nodes will time out or be refused.
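The one thing your RPC layer should do is bound how long it waits, and treat timeouts and refused connections the same way: as an unreachable peer. A sketch using only the standard library (the timeout value is an illustrative choice, not a requirement):

```python
import json
import urllib.error
import urllib.request

RPC_TIMEOUT = 0.15  # seconds; short enough that a partitioned peer doesn't stall a round

def send_rpc(peer, endpoint, body):
    """POST a Raft RPC and decode the JSON reply.
    Returns None if the peer is unreachable (timed out, refused, dropped)."""
    req = urllib.request.Request(
        f"http://{peer}{endpoint}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=RPC_TIMEOUT) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, TimeoutError, OSError):
        return None  # caller just counts this as no vote / failed heartbeat
```

A candidate then simply ignores `None` results when tallying votes, and a leader keeps heartbeating the peers it can still reach.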

Each node’s output (stdout/stderr) is captured during testing and viewable with clstr logs. Logs persist across restarts within a run, so you see the full history even after a crash or partition:

$ clstr logs n1
================ STARTED ================
Node started addr=10.0.42.101:8080
Voted for 10.0.42.104:8080 in term 1
Received heartbeat from 10.0.42.104:8080, following in term 1
================ KILLED ================
================ STARTED ================
Node started addr=10.0.42.101:8080
Received heartbeat from 10.0.42.104:8080, following in term 1
================ PARTITIONED FROM: n3, n4, n5 ================
Heartbeat timeout, starting election for term 2
Received vote from 10.0.42.102:8080 (2/5)
Heartbeat timeout, starting election for term 3
Received vote from 10.0.42.102:8080 (2/5)
================ PARTITION HEALED ================
Election timeout, starting election for term 4
Received vote from 10.0.42.102:8080 (2/5)
Received vote from 10.0.42.103:8080 (3/5)
Became leader in term 4
$ clstr logs n4
================ STARTED ================
Node started addr=10.0.42.104:8080
Election timeout, starting election for term 1
Received vote from 10.0.42.101:8080 (2/5)
Received vote from 10.0.42.102:8080 (3/5)
Became leader in term 1
================ PARTITIONED FROM: n1, n2 ================
Heartbeat to 10.0.42.101:8080 failed
Heartbeat to 10.0.42.102:8080 failed
Maintaining leadership in term 1, majority reachable
================ PARTITION HEALED ================
Received RequestVote from 10.0.42.101:8080, stepping down in term 4
Voted for 10.0.42.101:8080 in term 4
Received heartbeat from 10.0.42.101:8080, following in term 4

Add your own logging to help trace election and heartbeat activity across nodes.