
Testing

attest is the testing framework behind clstr challenges. It provides a fluent API for writing black-box tests against programs running in Docker containers, validating external behavior without accessing implementation internals. The source is at internal/attest.

A challenge is a Go package that registers stages, each backed by an attest test suite.

challenges/
└── kvstore/
    ├── init.go            # Challenge registration
    ├── http_api.go        # Stage 1
    ├── persistence.go     # Stage 2
    └── crash_recovery.go  # Stage 3

Each stage is a function returning *Suite:

package kvstore

import (
    . "github.com/clstr-io/clstr/internal/attest"
)

func HTTPAPI() *Suite {
    return New(WithCluster(1)).
        // 1
        Test("PUT Stores Values", func(do *Do) {
            do.PUT(Node("n1"), "/kv/key", "value").
                Status(Is(200)).
                Hint("Check that your HTTP handler registers PUT /kv/{key} and returns 200.").
                Run()
        }).
        // 2
        Test("GET Returns Stored Values", func(do *Do) {
            do.GET(Node("n1"), "/kv/key").
                Status(Is(200)).
                Body(Is("value")).
                Hint("GET /kv/{key} should return the value stored by the preceding PUT.").
                Run()
        })
}

Dot-import attest (the leading . in the import above) so its identifiers can be used without a package prefix. Number tests with comments to separate them visually.

Create init.go:

package kvstore

import "github.com/clstr-io/clstr/internal/registry"

func init() {
    challenge := &registry.Challenge{
        Name:    "Distributed Key-Value Store",
        Summary: `Build a distributed key-value store from scratch.`,
    }

    challenge.AddStage("http-api", "Store and Retrieve Data", HTTPAPI)
    challenge.AddStage("persistence", "Data Survives SIGTERM", Persistence)
    challenge.AddStage("crash-recovery", "Data Survives SIGKILL", CrashRecovery)

    registry.RegisterChallenge("kv-store", challenge)
}

Import your challenge in challenges/challenges.go:

package challenges

import (
    _ "github.com/clstr-io/clstr/challenges/kvstore"
)

New creates a test suite. Options configure the cluster and timeouts:

// Single-node suite
return New(WithCluster(1)).Test(...)
// Five-node cluster
return New(WithCluster(5)).Test(...)

WithCluster(n) declares an n-node cluster whose nodes are named n1, n2, …, nN. Nodes are started before any tests run and stopped after the tests finish.

WithNodeStartTimeout sets how long to wait for a node’s GET /health to return 200 OK after starting (default 10s):

New(WithCluster(1), WithNodeStartTimeout(30*time.Second))

WithNodeShutdownTimeout sets how long to wait for a node to exit after SIGTERM before sending SIGKILL (default 5s):

New(WithCluster(1), WithNodeShutdownTimeout(15*time.Second))

WithRetryTimeout sets the default timeout for Eventually and Consistently (default 5s):

New(WithCluster(1), WithRetryTimeout(10*time.Second))

WithRequestTimeout sets the HTTP client timeout per request (default 5s):

New(WithCluster(1), WithRequestTimeout(2*time.Second))

Tests run sequentially, and state persists between them, so data written in test 1 is available in test 2. The first failure stops execution.

Make HTTP requests using the method-specific helpers:

do.GET(sel, path, args...)
do.PUT(sel, path, args...)
do.POST(sel, path, args...)
do.DELETE(sel, path, args...)
do.PATCH(sel, path, args...)

sel is a NodeSelector that targets one or more nodes. Optional args are a body string and an H header map:

// No body
do.GET(Node("n1"), "/kv/key")
// With body
do.PUT(Node("n1"), "/kv/key", "value")
// With body and headers
do.POST(Node("n1"), "/api", `{"key":"value"}`, H{"Content-Type": "application/json"})
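One common way to accept "an optional body string and an optional header map" in Go is a variadic parameter that is sorted by type. The sketch below illustrates that pattern only; parseArgs is a hypothetical helper, and H is redefined locally as a stand-in for attest's type so the example runs by itself.

```go
package main

import "fmt"

// H is a local stand-in for the header-map type named in the text.
type H map[string]string

// parseArgs splits optional request arguments into a body and headers.
// Hypothetical: attest may handle its args differently.
func parseArgs(args ...any) (body string, headers H, err error) {
	for _, a := range args {
		switch v := a.(type) {
		case string:
			body = v
		case H:
			headers = v
		default:
			return "", nil, fmt.Errorf("unsupported argument type %T", a)
		}
	}
	return body, headers, nil
}

func main() {
	body, headers, err := parseArgs(`{"key":"value"}`, H{"Content-Type": "application/json"})
	fmt.Println(body, headers["Content-Type"], err)
}
```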

Each call returns a *Check. Chain modifiers onto it and call .Run() to execute:

do.PUT(Node("n1"), "/kv/key", "value").
Status(Is(200)).
Hint("Your server should accept PUT requests.").
Run()

A NodeSelector controls which nodes receive a request and how the check is evaluated.

Node(name) targets a single named node. The check passes if that node satisfies all conditions:

do.GET(Node("n1"), "/cluster/info").
JSON("role", Is("leader")).
Run()

do.AllNodes passes when every node satisfies the check. With no arguments, it targets all nodes in the cluster. Pass names to target a subset:

// All nodes in the cluster
do.GET(do.AllNodes(), "/cluster/info").
Eventually(2*time.Second).
JSON("term", GreaterThan("3")).
Run()
// Subset
do.GET(do.AllNodes("n1", "n2"), "/cluster/info").
Consistently(2*time.Second).
JSON("leader", IsNull[string]()).
Run()

Killed or stopped nodes are automatically excluded.

do.ExactlyOneNode passes when exactly one node satisfies the check:

do.GET(do.ExactlyOneNode(), "/cluster/info").
Consistently(2*time.Second).
JSON("role", Is("leader")).
Run()

do.AtLeastOneNode passes when at least one node satisfies the check:

do.GET(do.AtLeastOneNode(), "/cluster/info").
Eventually(2*time.Second).
JSON("role", Is("leader")).
Run()

do.ExceptNodes passes when every node except the named ones satisfies the check. Useful for asserting on all followers when you know the leader:

leaderNode, _ := findLeader(do)
do.GET(do.ExceptNodes(leaderNode), "/cluster/info").
JSON("role", Is("follower")).
Run()
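The selectors above differ only in how per-node results are combined into one pass/fail verdict. Here is a small standalone model of that combination step, assuming each node's check has already been reduced to a boolean; quantifier, evaluate, and except are hypothetical names, not attest identifiers.

```go
package main

import "fmt"

// quantifier turns "how many targeted nodes passed" into a verdict.
type quantifier func(passed, total int) bool

var (
	all        quantifier = func(passed, total int) bool { return passed == total }
	exactlyOne quantifier = func(passed, total int) bool { return passed == 1 }
	atLeastOne quantifier = func(passed, total int) bool { return passed >= 1 }
)

// except returns nodes without the excluded names, preserving order.
func except(nodes []string, excluded ...string) []string {
	skip := make(map[string]bool, len(excluded))
	for _, n := range excluded {
		skip[n] = true
	}
	var out []string
	for _, n := range nodes {
		if !skip[n] {
			out = append(out, n)
		}
	}
	return out
}

// evaluate runs check against every target and applies the quantifier.
func evaluate(targets []string, check func(node string) bool, q quantifier) bool {
	passed := 0
	for _, n := range targets {
		if check(n) {
			passed++
		}
	}
	return q(passed, len(targets))
}

func main() {
	roles := map[string]string{"n1": "leader", "n2": "follower", "n3": "follower"}
	nodes := []string{"n1", "n2", "n3"}
	isLeader := func(n string) bool { return roles[n] == "leader" }
	isFollower := func(n string) bool { return roles[n] == "follower" }

	fmt.Println(evaluate(nodes, isLeader, exactlyOne))            // true
	fmt.Println(evaluate(nodes, isLeader, all))                   // false
	fmt.Println(evaluate(except(nodes, "n1"), isFollower, all))   // true
	fmt.Println(evaluate(except(nodes, "n1"), isLeader, atLeastOne)) // false
}
```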

do.Names resolves a NodeSelector to a []string of node names. Useful when you need to feed a selector’s result into another selector:

leaderNode, _ := findLeader(do)
// Check that exactly one of the non-leader nodes becomes the new leader
do.GET(do.ExactlyOneNode(do.Names(do.ExceptNodes(leaderNode))...), "/cluster/info").
Eventually(3 * time.Second).
JSON("role", Is("leader")).
Run()

do.Nodes returns the names of all nodes in the cluster. Useful when you need to iterate over nodes manually:

for _, name := range do.Nodes() {
    r := do.Fetch(name, "/cluster/info")
    if r != nil && r.JSON("role") == "leader" {
        // ...
    }
}
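The examples on this page call a findLeader helper without defining it. One plausible shape, built from the iteration pattern above, is sketched here. Everything in it is an assumption: the (string, error) return type, the cluster and response interfaces (stand-ins for the parts of *Do and *FetchResponse used here, so the code compiles without attest), and the fake cluster used to exercise it.

```go
package main

import (
	"errors"
	"fmt"
)

// response and cluster are stand-ins for the methods of *FetchResponse
// and *Do that this sketch relies on.
type response interface{ JSON(path string) string }

type cluster interface {
	Nodes() []string
	Fetch(name, path string) response
}

// findLeader returns the first reachable node reporting role "leader".
// Hypothetical helper, not part of attest.
func findLeader(c cluster) (string, error) {
	for _, name := range c.Nodes() {
		r := c.Fetch(name, "/cluster/info")
		if r != nil && r.JSON("role") == "leader" {
			return name, nil
		}
	}
	return "", errors.New("no leader found")
}

// fakeResponse and fakeCluster simulate a cluster for the demo below.
type fakeResponse map[string]string

func (f fakeResponse) JSON(path string) string { return f[path] }

type fakeCluster struct {
	order []string
	roles map[string]string // a missing entry means the node is unreachable
}

func (f fakeCluster) Nodes() []string { return f.order }

func (f fakeCluster) Fetch(name, path string) response {
	role, ok := f.roles[name]
	if !ok {
		return nil
	}
	return fakeResponse{"role": role}
}

func main() {
	c := fakeCluster{
		order: []string{"n1", "n2", "n3"},
		roles: map[string]string{"n2": "leader", "n3": "follower"}, // n1 is down
	}
	leader, err := findLeader(c)
	fmt.Println(leader, err) // n2 <nil>
}
```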

Chain modifiers onto a request to define what a passing response looks like.

Status validates the HTTP response status code:

do.GET(Node("n1"), "/kv/missing").
Status(Is(404)).
Hint("Return 404 for missing keys.").
Run()

Body validates the HTTP response body:

do.GET(Node("n1"), "/kv/missing").
Status(Is(404)).
Body(Matches("^key not found\n?$")).
Hint("Return 'key not found' for missing keys.").
Run()

JSON validates a JSON field at the given gjson path:

do.GET(Node("n1"), "/cluster/info").
Status(Is(200)).
JSON("role", Is("leader")).
JSON("term", Is("1")).
Hint("Should return cluster info.").
Run()
// Nested fields
do.GET(Node("n1"), "/log").
JSON("entries.0.term", Is("1")).
JSON("entries.1.index", Is("2")).
Run()
// Null check
do.GET(Node("n1"), "/cluster/info").
JSON("leader", IsNull[string]()).
Hint("Leader should be null when no leader elected.").
Run()

Header validates a response header:

do.PUT(Node("n2"), "/kv/foo", "value").
Status(Is(307)).
Header("Location", Matches(`^http://10\.0\.42\.\d+:\d+/kv/foo$`)).
Hint("Followers should redirect writes to the leader.").
Run()

Hint sets the help text shown when the check fails. Write it as a concrete next step:

do.PUT(Node("n1"), "/kv/", "value").
Status(Is(400)).
Body(Matches("^key cannot be empty\n?$")).
Hint("Your server should reject empty keys.\n" +
"Add validation to return 400 Bad Request for empty keys.").
Run()

Run executes the check. On failure it panics; the test runner recovers the panic and reports it as a test failure.
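The combination of "a failed check panics" and "the first failure stops execution" can be modeled with recover. The sketch below shows one way a runner could do this; it is illustrative only, and test, runTest, and runSuite are hypothetical names rather than attest's internals.

```go
package main

import "fmt"

// test pairs a name with a body that panics when a check fails.
type test struct {
	name string
	fn   func()
}

// runTest runs fn and converts a panic into an error.
func runTest(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	fn()
	return nil
}

// runSuite runs tests in order and stops at the first failure. It returns
// the names of the tests that passed and the failure, if any.
func runSuite(tests []test) (passed []string, err error) {
	for _, t := range tests {
		if err := runTest(t.fn); err != nil {
			return passed, fmt.Errorf("%s: %w", t.name, err)
		}
		passed = append(passed, t.name)
	}
	return passed, nil
}

func main() {
	tests := []test{
		{name: "PUT Stores Values", fn: func() {}},
		{name: "GET Returns Stored Values", fn: func() { panic("expected status 200, got 404") }},
		{name: "DELETE Removes Values", fn: func() { fmt.Println("never runs") }},
	}
	passed, err := runSuite(tests)
	fmt.Println(passed) // [PUT Stores Values]
	fmt.Println(err)    // GET Returns Stored Values: expected status 200, got 404
}
```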

By default, a check executes once, immediately. Use .Eventually() or .Consistently() to change this.

Eventually retries until the check passes or the timeout expires (default 5s):

// Wait for replica to sync
do.GET(Node("n2"), "/kv/key").
Eventually().
Status(Is(200)).
Body(Is("value")).
Hint("Replica should eventually receive replicated data.").
Run()
// Custom timeout
do.GET(Node("n2"), "/kv/key").
Eventually(10 * time.Second).
Status(Is(200)).
Body(Is("value")).
Hint("Replica should sync within 10 seconds.").
Run()

Consistently verifies that the check stays true for the entire duration (default 5s):

// Verify no leader is elected in minority partition
do.GET(do.AllNodes("n1", "n2"), "/cluster/info").
Consistently(5 * time.Second).
JSON("leader", IsNull[string]()).
Hint("Minority partition should not elect a leader.").
Run()

Matchers define what values satisfy a check.

Is checks exact equality:

Status(Is(200))
Body(Is("value"))

GreaterThan and LessThan are ordered comparisons. They work on any ordered type, most commonly strings from JSON fields:

JSON("term", GreaterThan("1"))
JSON("index", LessThan("100"))

IsNull checks that a JSON field is null. It requires a type parameter:

JSON("leader", IsNull[string]())

Contains checks that a string contains a substring:

Body(Contains("error"))

Matches checks a string against a regular expression:

Body(Matches(`^[0-9]+$`))

HasLen validates that a value has a specific length. It works on strings, arrays, slices, maps, and channels:

JSON("entries", HasLen[string](3))
JSON("peers", HasLen[string](4))

OneOf accepts any of the provided values. Useful when order is non-deterministic:

Body(OneOf("value1", "value2", "value3"))

Not negates another matcher:

Status(Not(Is(500)))
Body(Not(Contains("panic")))

Pass multiple matchers to any modifier; all must pass:

Status(Is(200), Not(Is(404)))
Body(Contains("Hello"), Not(Contains("error")))
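Matchers that can be negated and combined are naturally modeled as predicates over a value. The following sketch shows that idea with Go generics. The names Is, Not, and Contains are reused for readability but are local stand-ins defined here; the Matcher type and allOf are hypothetical, and attest's real types may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// Matcher reports whether a value is acceptable.
// Hypothetical shape, defined locally for this sketch.
type Matcher[T any] func(got T) bool

// Is matches values equal to want.
func Is[T comparable](want T) Matcher[T] {
	return func(got T) bool { return got == want }
}

// Not inverts another matcher.
func Not[T any](m Matcher[T]) Matcher[T] {
	return func(got T) bool { return !m(got) }
}

// Contains matches strings that include sub.
func Contains(sub string) Matcher[string] {
	return func(got string) bool { return strings.Contains(got, sub) }
}

// allOf passes only when every matcher passes, mirroring the
// "all must pass" rule for multiple matchers.
func allOf[T any](got T, ms ...Matcher[T]) bool {
	for _, m := range ms {
		if !m(got) {
			return false
		}
	}
	return true
}

func main() {
	status, body := 200, "Hello, world"
	fmt.Println(allOf(status, Is(200), Not(Is(404))))                   // true
	fmt.Println(allOf(body, Contains("Hello"), Not(Contains("error")))) // true
	fmt.Println(allOf(body, Contains("Hello"), Not(Contains("world")))) // false
}
```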

Control the lifecycle of nodes and the network during a test.

do.Start starts a previously stopped or killed node:

do.Start("n1")

Nodes are started automatically at the beginning of a suite via WithCluster. Use Start only when you need to restart a node that was stopped or killed during a test.

do.Stop sends SIGTERM and waits for graceful exit:

do.Stop("n1")

do.Kill sends SIGKILL immediately:

do.Kill("n1")

do.Restart stops then starts a node. The optional signal controls how the node is stopped:

// Graceful restart (SIGTERM)
do.Restart("n1")
// Crash simulation (SIGKILL)
do.Restart("n1", syscall.SIGKILL)

do.Partition isolates groups of nodes from each other:

// Split a 5-node cluster into minority and majority partitions
do.Partition([]string{"n1", "n2"}, []string{"n3", "n4", "n5"})
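Conceptually, a partition means every node can reach the members of its own group and nothing else. The sketch below computes, for each node, the set of peers that would have to be blocked. It models only that bookkeeping; blockedPeers is a hypothetical helper, and how attest actually enforces the isolation is not covered here.

```go
package main

import "fmt"

// blockedPeers returns, for each node, the nodes it must not reach:
// every member of every other group.
func blockedPeers(groups ...[]string) map[string][]string {
	blocked := make(map[string][]string)
	for i, group := range groups {
		for _, node := range group {
			for j, other := range groups {
				if i != j {
					blocked[node] = append(blocked[node], other...)
				}
			}
		}
	}
	return blocked
}

func main() {
	blocked := blockedPeers([]string{"n1", "n2"}, []string{"n3", "n4", "n5"})
	fmt.Println(blocked["n1"]) // [n3 n4 n5]
	fmt.Println(blocked["n4"]) // [n1 n2]
}
```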

do.Heal restores full connectivity on every node:

do.Heal()

do.Impair applies one or more network impairments to outgoing traffic on the selected nodes. Impairments are combined into a single tc netem command. Call do.Repair() to remove them.

// Add 100ms delay to all nodes
do.Impair(do.AllNodes(), Delay(100*time.Millisecond))
// Add delay with jitter to a single node
do.Impair(Node("n1"), Delay(100*time.Millisecond, 20*time.Millisecond))
// Combine impairments
do.Impair(do.AllNodes(), Delay(50*time.Millisecond), Loss(10))
// All nodes except the leader
do.Impair(do.ExceptNodes(leaderNode), Loss(20))

Available impairments:

  • Delay(d, jitter...): adds fixed latency. Optional jitter uses a normal distribution with 25% correlation: Delay(100*time.Millisecond, 20*time.Millisecond).
  • Loss(pct): randomly drops pct% of packets (25% correlation): Loss(10).
  • Duplicate(pct): sends pct% of packets twice: Duplicate(1).
  • Reorder(pct): delivers pct% of packets out of order (25% correlation). Must be combined with Delay: Reorder(25).

do.Repair removes tc netem impairments from the selected nodes. With no argument, it repairs all nodes:

// Repair all nodes
do.Repair()
// Repair a specific node
do.Repair(Node("n1"))
// Repair all except the leader
do.Repair(do.ExceptNodes(leaderNode))

do.Fetch makes a one-off GET request to a named node and returns a *FetchResponse. It returns nil if the node is unreachable. Useful for reading state before a check:

info := do.Fetch("n1", "/cluster/info")
if info == nil {
    panic("n1 is down")
}
term := info.JSON("term")

do.Kill("n1")

do.GET(do.AllNodes(), "/cluster/info").
    Eventually(2*time.Second).
    JSON("term", GreaterThan(term)).
    Run()

Run operations in parallel to test concurrent behavior.

do.Concurrently runs fn n times in parallel, passing each invocation a 1-based index:

do.Concurrently(100, func(i int) {
    do.PUT(Node("n1"), fmt.Sprintf("/kv/key%d", i), fmt.Sprintf("value%d", i)).
        Status(Is(200)).
        Hint("Your server should handle concurrent PUT requests.").
        Run()
})

Waits for all goroutines to complete. If any panic, the first panic is re-raised.