
Leader Election

Implement Raft leader election to form a static 5-node cluster that elects a single leader.

The test harness starts each node as a Docker container on a shared network and sets two environment variables:

ADDR=10.0.42.101:8080
PEERS=10.0.42.102:8080,10.0.42.103:8080,10.0.42.104:8080,10.0.42.105:8080

ADDR is this node’s own address. PEERS is a comma-separated list of the other nodes’ addresses. Each node listens on port 8080. The cluster is static; membership doesn’t change in this stage.
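Reading this configuration is the first thing a node does on startup. A minimal sketch in Python (the harness only fixes the environment variables, not the language):

```python
import os

def load_config(env=os.environ):
    """Read the static cluster configuration from ADDR and PEERS."""
    addr = env["ADDR"]
    peers = [p for p in env["PEERS"].split(",") if p]
    return addr, peers

# Example with the values shown above:
addr, peers = load_config({
    "ADDR": "10.0.42.101:8080",
    "PEERS": "10.0.42.102:8080,10.0.42.103:8080,10.0.42.104:8080,10.0.42.105:8080",
})
```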

All nodes start as followers. If a follower doesn’t receive heartbeats within the election timeout (randomized between 500-1,000ms), it becomes a candidate and starts an election.

These values are more generous than the Raft paper’s suggested 150-300ms election timeout. Each node runs in a Docker container on a shared host, which adds container network latency and scheduling jitter on top of any GC or runtime pauses in your implementation. Higher values give varying implementations room to handle these issues without spurious elections.
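The important detail is that the timeout is re-randomized on every reset, so nodes rarely time out at the same moment and split the vote. A sketch:

```python
import random

ELECTION_TIMEOUT_MIN_MS = 500
ELECTION_TIMEOUT_MAX_MS = 1000

def new_election_timeout():
    """Pick a fresh randomized timeout in seconds. Call this every time the
    timer is reset (on each valid heartbeat or granted vote), not once at
    startup, so nodes stay de-synchronized."""
    return random.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS) / 1000.0

t = new_election_timeout()
```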

Candidates request votes from other nodes. A candidate becomes leader if it receives votes from a majority (⌈(n+1)/2⌉, where n is the cluster size; 3 votes for a 5-node cluster). Each node grants at most one vote per term.
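The majority threshold is a one-liner worth getting right (note that a candidate's vote for itself counts toward the total):

```python
import math

def majority(cluster_size):
    """Minimum votes needed to win an election: ceil((n + 1) / 2)."""
    return math.ceil((cluster_size + 1) / 2)

needed = majority(5)  # 3 votes in a 5-node cluster
```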

Terms act as a logical clock. When a node discovers a higher term, it immediately updates its term and reverts to follower.
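This "step down on a higher term" rule applies to every RPC, both requests and responses. A sketch, assuming node state is kept in a plain dict:

```python
def observe_term(state, rpc_term):
    """If an RPC carries a higher term, adopt it and revert to follower.
    The vote is per-term, so votedFor must be cleared as well."""
    if rpc_term > state["current_term"]:
        state["current_term"] = rpc_term
        state["voted_for"] = None
        state["role"] = "follower"

# A candidate in term 3 hears about term 5 and steps down:
state = {"current_term": 3, "voted_for": "10.0.42.102:8080", "role": "candidate"}
observe_term(state, 5)
```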

Leaders send AppendEntries RPC heartbeats every 100ms to maintain authority. The entries array is empty in this stage.
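The heartbeat loop itself is simple; the interval just needs to stay well under the 500 ms minimum election timeout. A minimal sketch, where `send_append_entries` is a hypothetical stand-in for the real RPC transport:

```python
import time

HEARTBEAT_INTERVAL = 0.100  # seconds; well under the 500 ms minimum election timeout

def heartbeat_loop(state, send_append_entries):
    """While this node is leader, send an empty AppendEntries heartbeat to
    every peer each interval."""
    while state["role"] == "leader":
        for peer in state["peers"]:
            send_append_entries(peer, state["current_term"])
        time.sleep(HEARTBEAT_INTERVAL)

# demo with a fake transport that stops the loop after two rounds
sent = []
state = {"role": "leader", "current_term": 3,
         "peers": ["10.0.42.102:8080", "10.0.42.103:8080"]}

def fake_send(peer, term):
    sent.append((peer, term))
    if len(sent) >= 4:
        state["role"] = "follower"  # a real leader steps down on seeing a higher term

heartbeat_loop(state, fake_send)
```

In a real implementation this runs on its own thread (or async task) and each send has its own timeout so one slow peer doesn't delay the others.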

If a follower doesn’t receive heartbeats within the election timeout, it starts a new election.

Leaders accept GET, PUT, and DELETE requests from the HTTP API stage.

Followers redirect all requests to the leader with 307 Temporary Redirect and a Location header:

HTTP/1.1 307 Temporary Redirect
Location: http://10.0.42.101:8080/kv/mykey

If no leader is known, return 503 Service Unavailable.
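The routing decision can be isolated in one function. A sketch returning an HTTP status and headers, assuming the node tracks the last known leader in its state:

```python
def route_client_request(state, path):
    """Decide how a node answers a client KV request: leaders serve it,
    followers redirect to the known leader, and a node with no known
    leader returns 503."""
    if state["role"] == "leader":
        return 200, {}  # handle GET/PUT/DELETE locally
    if state["leader"] is not None:
        return 307, {"Location": f"http://{state['leader']}{path}"}
    return 503, {}

status, headers = route_client_request(
    {"role": "follower", "leader": "10.0.42.101:8080"}, "/kv/mykey")
```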

Persist currentTerm and votedFor to disk before responding to any RPC that changes them. Use fsync to ensure durability.

State must survive crashes and restarts. A node that crashes and restarts should resume with its persisted state.
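One way to meet both requirements is a write-temp-fsync-rename pattern, so a crash mid-write can never leave a torn state file. A sketch (the file name and JSON layout are illustrative, not required by the tests):

```python
import json
import os
import tempfile

def persist_state(path, current_term, voted_for):
    """Write currentTerm/votedFor atomically: write a temp file, fsync it,
    then rename it over the old file."""
    data = json.dumps({"currentTerm": current_term, "votedFor": voted_for})
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the write to disk before replying to the RPC
        os.replace(tmp, path)     # atomic on POSIX filesystems
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)

def load_state(path):
    """Restore persisted state after a restart; defaults for a fresh node."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"currentTerm": 0, "votedFor": None}

# demo: persist, then reload as a restarted node would
state_path = os.path.join(tempfile.mkdtemp(), "raft-state.json")
persist_state(state_path, 3, "10.0.42.104:8080")
restored = load_state(state_path)
```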

The RequestVote RPC is invoked by candidates to gather votes during leader election.

POST /raft/request-vote
{
  "term": 3,
  "candidate-id": "10.0.42.102:8080",
  "last-log-index": 0,
  "last-log-term": 0
}
----
200
{
  "term": 3,
  "vote-granted": true
}
  • term: candidate’s current term
  • candidate-id: the candidate’s own address (from ADDR)
  • last-log-index / last-log-term: set to 0 in this stage; no log entries yet
  • vote-granted: whether the vote was granted
  • response term: the responder’s current term, so a stale candidate can update itself
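Putting the voting rules together — step down on a higher term, then grant at most one vote per term — the handler can be sketched like this (dict-based state as an illustrative assumption; the log up-to-date check is trivial while the log is empty):

```python
def handle_request_vote(state, req):
    """Handle a RequestVote RPC body and return the reply body."""
    if req["term"] > state["current_term"]:
        state["current_term"] = req["term"]
        state["voted_for"] = None
        state["role"] = "follower"
    granted = (
        req["term"] == state["current_term"]
        and state["voted_for"] in (None, req["candidate-id"])
    )
    if granted:
        state["voted_for"] = req["candidate-id"]
        # persist current_term/voted_for (with fsync) before sending this reply
    return {"term": state["current_term"], "vote-granted": granted}

state = {"current_term": 2, "voted_for": None, "role": "follower"}
req = {"term": 3, "candidate-id": "10.0.42.102:8080",
       "last-log-index": 0, "last-log-term": 0}
reply = handle_request_vote(state, req)

# a second candidate asking in the same term is refused
rival = {**req, "candidate-id": "10.0.42.103:8080"}
reply2 = handle_request_vote(state, rival)
```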

The AppendEntries RPC is used for heartbeats (with empty entries) to maintain leader authority.

POST /raft/append-entries
{
  "term": 3,
  "leader-id": "10.0.42.101:8080",
  "prev-log-index": 0,
  "prev-log-term": 0,
  "entries": [],
  "leader-commit": 0
}
----
200
{
  "term": 3,
  "success": true
}
  • leader-id: the leader’s own address (from ADDR), so followers can redirect clients
  • entries: empty in this stage; will carry log entries in later stages
  • prev-log-index / prev-log-term: set to 0 in this stage; no log entries yet
  • success: true if the follower’s log matched prev-log-index and prev-log-term; with no log entries yet this always matches, so it is false only when the leader’s term is stale
  • response term: the responder’s current term, so a stale leader can step down
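On the receiving side, a heartbeat handler rejects stale terms and otherwise accepts the sender as leader, resets the election timer, and records who to redirect clients to. A sketch under the same dict-state assumption (the timer reset is represented by a flag):

```python
def handle_append_entries(state, req):
    """Handle an AppendEntries heartbeat body and return the reply body."""
    if req["term"] < state["current_term"]:
        return {"term": state["current_term"], "success": False}
    if req["term"] > state["current_term"]:
        state["current_term"] = req["term"]
        state["voted_for"] = None
    state["role"] = "follower"           # even a candidate yields to a current leader
    state["leader"] = req["leader-id"]   # remembered for client redirects
    state["timer_reset"] = True          # stand-in for resetting the election timer
    return {"term": state["current_term"], "success": True}

state = {"current_term": 2, "voted_for": None, "role": "candidate", "leader": None}
reply = handle_append_entries(state, {"term": 3, "leader-id": "10.0.42.101:8080",
                                      "prev-log-index": 0, "prev-log-term": 0,
                                      "entries": [], "leader-commit": 0})
```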

The /cluster/info endpoint returns the node’s current cluster state.

GET /cluster/info
----
200
{
  "id": "10.0.42.101:8080",
  "role": "leader",
  "term": 3,
  "leader": "10.0.42.101:8080",
  "peers": ["10.0.42.102:8080", "10.0.42.103:8080", "10.0.42.104:8080", "10.0.42.105:8080"]
}
  • id: this node’s own address (from ADDR)
  • role: this node’s current role, one of leader, candidate, or follower
  • term: current term; starts at 0 before any election
  • leader: the known leader’s address, or null before an election and after leader failure
  • peers: all cluster members except this node, sorted lexicographically
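Assembling this response is a direct projection of local state; the only details worth noting are the sorted peers and that an unknown leader becomes JSON null. A sketch:

```python
def cluster_info(state):
    """Build the /cluster/info response body from local node state."""
    return {
        "id": state["addr"],
        "role": state["role"],
        "term": state["current_term"],
        "leader": state["leader"],        # None serializes to JSON null
        "peers": sorted(state["peers"]),  # lexicographic order, as required
    }

info = cluster_info({
    "addr": "10.0.42.101:8080", "role": "leader", "current_term": 3,
    "leader": "10.0.42.101:8080",
    "peers": ["10.0.42.103:8080", "10.0.42.102:8080",
              "10.0.42.105:8080", "10.0.42.104:8080"],
})
```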

Your server will be started as a 5-node cluster, with its own address in ADDR and the other four nodes’ addresses in PEERS. The tests will verify leader election behavior:

$ clstr test leader-election
Testing leader-election: Cluster Elects and Maintains Leader
✓ /cluster/info Returns Pre-Election State
✓ Leader Election Completes
✓ Exactly One Leader Per Term
✓ Leader Maintains Authority via Heartbeats
✓ Followers Redirect to Leader
✓ Leader Handles KV Operations
✓ New Leader Elected After Leader Crash
✓ Service Unavailable During Election
✓ Partition Enforces Quorum
✓ Cluster Converges After Partition Heals
PASSED ✓
Run 'clstr next' to advance to the next stage.

Example failure:

$ clstr test leader-election
Testing leader-election: Cluster Elects and Maintains Leader
✓ /cluster/info Returns Pre-Election State
✓ Leader Election Completes
✓ Exactly One Leader Per Term
✓ Leader Maintains Authority via Heartbeats
✓ Followers Redirect to Leader
✓ Leader Handles KV Operations
✓ New Leader Elected After Leader Crash
✓ Service Unavailable During Election
✗ Partition Enforces Quorum
GET - 1 of 2 nodes passed (expected all 2)
http://127.0.0.1:8003/cluster/info (n1) → 200
Expected field "role": one of [follower, candidate]
{
"id": "10.0.42.101:8080",
"leader": "10.0.42.101:8080",
"peers": [
"10.0.42.102:8080",
"10.0.42.103:8080",
"10.0.42.104:8080",
"10.0.42.105:8080"
],
"role": "leader",
"term": 5
}
The minority partition [n1, n2] must not elect a leader.
A candidate needs votes from at least 3 nodes; with only n1 and n2 reachable, no election can succeed.
FAILED ✗
Read the guide: https://clstr.io/kv-store/leader-election

Partition tests use iptables DROP rules applied directly inside each container to cut traffic between groups of nodes. Rules are bidirectional, so a partitioned node can neither send to nor receive from nodes in the other group. When the partition heals, all rules are flushed on every node to restore full connectivity.

Your implementation doesn’t need to do anything special to support this. Partitions are transparent at the network level: RPCs to unreachable nodes will time out or be refused.
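The one thing your RPC layer should do is bound how long it waits, and treat timeouts and refused connections the same way: as an unreachable peer. A sketch using only the standard library (the timeout value is an illustrative choice, not a requirement):

```python
import json
import urllib.error
import urllib.request

RPC_TIMEOUT = 0.15  # seconds; short enough that a partitioned peer doesn't stall a round

def send_rpc(peer, endpoint, body):
    """POST a Raft RPC and decode the JSON reply.
    Returns None if the peer is unreachable (timed out, refused, dropped)."""
    req = urllib.request.Request(
        f"http://{peer}{endpoint}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=RPC_TIMEOUT) as resp:
            return json.loads(resp.read())
    except (urllib.error.URLError, TimeoutError, OSError):
        return None  # caller just counts this as no vote / failed heartbeat
```

A candidate then simply ignores `None` results when tallying votes, and a leader keeps heartbeating the peers it can still reach.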

Each node’s output (stdout/stderr) is captured during testing and viewable with clstr logs. Logs persist across restarts within a run, so you see the full history even after a crash or partition:

$ clstr logs n1
================ STARTED ================
Node started addr=10.0.42.101:8080
Voted for 10.0.42.104:8080 in term 1
Received heartbeat from 10.0.42.104:8080, following in term 1
================ KILLED ================
================ STARTED ================
Node started addr=10.0.42.101:8080
Received heartbeat from 10.0.42.104:8080, following in term 1
================ PARTITIONED FROM: n3, n4, n5 ================
Heartbeat timeout, starting election for term 2
Received vote from 10.0.42.102:8080 (2/5)
Heartbeat timeout, starting election for term 3
Received vote from 10.0.42.102:8080 (2/5)
================ PARTITION HEALED ================
Election timeout, starting election for term 4
Received vote from 10.0.42.102:8080 (2/5)
Received vote from 10.0.42.103:8080 (3/5)
Became leader in term 4
$ clstr logs n4
================ STARTED ================
Node started addr=10.0.42.104:8080
Election timeout, starting election for term 1
Received vote from 10.0.42.101:8080 (2/5)
Received vote from 10.0.42.102:8080 (3/5)
Became leader in term 1
================ PARTITIONED FROM: n1, n2 ================
Heartbeat to 10.0.42.101:8080 failed
Heartbeat to 10.0.42.102:8080 failed
Maintaining leadership in term 1, majority reachable
================ PARTITION HEALED ================
Received RequestVote from 10.0.42.101:8080, stepping down in term 4
Voted for 10.0.42.101:8080 in term 4
Received heartbeat from 10.0.42.101:8080, following in term 4

Add your own logging to help trace election and heartbeat activity across nodes.