Testing
attest is the testing framework behind clstr challenges. It provides a fluent API for writing black-box tests against programs running in Docker containers, validating external behavior without accessing implementation internals. The source is at internal/attest.
Creating a Challenge
A challenge is a Go package that registers stages, each backed by an attest test suite.
Directory Structure
challenges/
└── kvstore/
    ├── init.go            # Challenge registration
    ├── http_api.go        # Stage 1
    ├── persistence.go     # Stage 2
    └── crash_recovery.go  # Stage 3
Stage Structure
Each stage is a function returning *Suite:
package kvstore

import (
    . "github.com/clstr-io/clstr/internal/attest"
)

func HTTPAPI() *Suite {
    return New(WithCluster(1)).

        // 1
        Test("PUT Stores Values", func(do *Do) {
            do.PUT(Node("n1"), "/kv/key", "value").
                Status(Is(200)).
                Hint("Check that your HTTP handler registers PUT /kv/{key} and returns 200.").
                Run()
        }).

        // 2
        Test("GET Returns Stored Values", func(do *Do) {
            do.GET(Node("n1"), "/kv/key").
                Status(Is(200)).
                Body(Is("value")).
                Hint("GET /kv/{key} should return the value stored by the preceding PUT.").
                Run()
        })
}
Import attest with . for cleaner syntax. Number tests with comments to visually separate them.
Challenge Registration
Create init.go:
package kvstore

import "github.com/clstr-io/clstr/internal/registry"

func init() {
    challenge := &registry.Challenge{
        Name:    "Distributed Key-Value Store",
        Summary: `Build a distributed key-value store from scratch.`,
    }

    challenge.AddStage("http-api", "Store and Retrieve Data", HTTPAPI)
    challenge.AddStage("persistence", "Data Survives SIGTERM", Persistence)
    challenge.AddStage("crash-recovery", "Data Survives SIGKILL", CrashRecovery)

    registry.RegisterChallenge("kv-store", challenge)
}
Auto-Discovery
Import your challenge in challenges/challenges.go:
package challenges

import (
    _ "github.com/clstr-io/clstr/challenges/kvstore"
)
Suite Configuration
New(opts...)
Creates a new test suite. Options configure the cluster and timeouts:
// Single-node suite
return New(WithCluster(1)).Test(...)

// Five-node cluster
return New(WithCluster(5)).Test(...)
WithCluster(n)
Declares an n-node cluster with nodes named n1, n2, …, nN. Nodes are started before any tests run and stopped afterwards.
WithNodeStartTimeout(d)
How long to wait for a node’s GET /health to return 200 OK after starting (default 10s):
New(WithCluster(1), WithNodeStartTimeout(30*time.Second))
WithNodeShutdownTimeout(d)
How long to wait for a node to exit after SIGTERM before sending SIGKILL (default 5s):
New(WithCluster(1), WithNodeShutdownTimeout(15*time.Second))
WithRetryTimeout(d)
Default timeout for Eventually and Consistently (default 5s):
New(WithCluster(1), WithRetryTimeout(10*time.Second))
WithRequestTimeout(d)
HTTP client timeout per request (default 5s):
New(WithCluster(1), WithRequestTimeout(2*time.Second))
HTTP Requests
Tests run sequentially. State persists between tests, so data written in test 1 is available in test 2. The first failure stops execution.
Make HTTP requests using the method-specific helpers:
do.GET(sel, path, args...)
do.PUT(sel, path, args...)
do.POST(sel, path, args...)
do.DELETE(sel, path, args...)
do.PATCH(sel, path, args...)
sel is a NodeSelector that targets one or more nodes. Optional args are a body string and an H header map:
// No body
do.GET(Node("n1"), "/kv/key")

// With body
do.PUT(Node("n1"), "/kv/key", "value")

// With body and headers
do.POST(Node("n1"), "/api", `{"key":"value"}`, H{"Content-Type": "application/json"})
Each call returns a *Check. Chain modifiers onto it and call .Run() to execute:
do.PUT(Node("n1"), "/kv/key", "value").
    Status(Is(200)).
    Hint("Your server should accept PUT requests.").
    Run()
Node Selectors
A NodeSelector controls which nodes receive a request and how the check is evaluated.
Node(name)
Targets a single named node. The check passes if that node satisfies all conditions:
do.GET(Node("n1"), "/cluster/info").
    JSON("role", Is("leader")).
    Run()
do.AllNodes(names...)
Passes when every node satisfies the check. With no arguments, targets all nodes in the cluster. Pass names to target a subset:
// All nodes in the cluster
do.GET(do.AllNodes(), "/cluster/info").
    Eventually(2*time.Second).
    JSON("term", GreaterThan("3")).
    Run()

// Subset
do.GET(do.AllNodes("n1", "n2"), "/cluster/info").
    Consistently(2*time.Second).
    JSON("leader", IsNull[string]()).
    Run()
Killed or stopped nodes are automatically excluded.
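The quantifier selectors in this section (AllNodes here, ExactlyOneNode and AtLeastOneNode below) differ only in how many per-node results must pass. The following is a minimal standalone sketch of that rule, not attest's implementation. The `quantifier` type, the `evaluate` function, and the `results` map are hypothetical names, and stopped or killed nodes are assumed to be absent from the map.

```go
package main

import "fmt"

// quantifier is a hypothetical stand-in for the pass rule behind a selector.
type quantifier int

const (
	everyNode      quantifier = iota // every targeted node must pass
	exactlyOneNode                   // exactly one targeted node must pass
	atLeastOneNode                   // one or more targeted nodes must pass
)

// evaluate applies the quantifier to per-node check results.
// Stopped or killed nodes are assumed to be left out of results.
func evaluate(q quantifier, results map[string]bool) bool {
	passed := 0
	for _, ok := range results {
		if ok {
			passed++
		}
	}
	switch q {
	case everyNode:
		return passed == len(results)
	case exactlyOneNode:
		return passed == 1
	case atLeastOneNode:
		return passed >= 1
	}
	return false
}

func main() {
	// Did each node report role == "leader"?
	isLeader := map[string]bool{"n1": true, "n2": false, "n3": false}
	fmt.Println(evaluate(everyNode, isLeader))      // false
	fmt.Println(evaluate(exactlyOneNode, isLeader)) // true
	fmt.Println(evaluate(atLeastOneNode, isLeader)) // true
}
```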
do.ExactlyOneNode(names...)
Passes when exactly one node satisfies the check:
do.GET(do.ExactlyOneNode(), "/cluster/info").
    Consistently(2*time.Second).
    JSON("role", Is("leader")).
    Run()
do.AtLeastOneNode(names...)
Passes when at least one node satisfies the check:
do.GET(do.AtLeastOneNode(), "/cluster/info").
    Eventually(2*time.Second).
    JSON("role", Is("leader")).
    Run()
do.ExceptNodes(names...)
Passes when every node except the named ones satisfies the check. Useful for asserting on all followers when you know the leader:
leaderNode, _ := findLeader(do)

do.GET(do.ExceptNodes(leaderNode), "/cluster/info").
    JSON("role", Is("follower")).
    Run()
do.Names(sel)
Resolves a NodeSelector to a []string of node names. Useful when you need to feed a selector’s result into another selector:
leaderNode, _ := findLeader(do)

// Check that exactly one of the non-leader nodes becomes the new leader
do.GET(do.ExactlyOneNode(do.Names(do.ExceptNodes(leaderNode))...), "/cluster/info").
    Eventually(3 * time.Second).
    JSON("role", Is("leader")).
    Run()
do.Nodes()
Returns the names of all nodes in the cluster. Useful when you need to iterate nodes manually:
for _, name := range do.Nodes() {
    r := do.Fetch(name, "/cluster/info")
    if r != nil && r.JSON("role") == "leader" {
        // ...
    }
}
Check Modifiers
Chain modifiers onto a request to define what a passing response looks like.
.Status(matchers...)
Validates the HTTP response status code:
do.GET(Node("n1"), "/kv/missing").
    Status(Is(404)).
    Hint("Return 404 for missing keys.").
    Run()
.Body(matchers...)
Validates the HTTP response body:
do.GET(Node("n1"), "/kv/missing").
    Status(Is(404)).
    Body(Matches("^key not found\n?$")).
    Hint("Return 'key not found' for missing keys.").
    Run()
.JSON(path, matchers...)
Validates a JSON field at the given gjson path:
do.GET(Node("n1"), "/cluster/info").
    Status(Is(200)).
    JSON("role", Is("leader")).
    JSON("term", Is("1")).
    Hint("Should return cluster info.").
    Run()

// Nested fields
do.GET(Node("n1"), "/log").
    JSON("entries.0.term", Is("1")).
    JSON("entries.1.index", Is("2")).
    Run()

// Null check
do.GET(Node("n1"), "/cluster/info").
    JSON("leader", IsNull[string]()).
    Hint("Leader should be null when no leader elected.").
    Run()
.Header(name, matchers...)
Validates a response header:
do.PUT(Node("n2"), "/kv/foo", "value").
    Status(Is(307)).
    Header("Location", Matches(`^http://10\.0\.42\.\d+:\d+/kv/foo$`)).
    Hint("Followers should redirect writes to the leader.").
    Run()
.Hint(text)
Sets the help text shown when the check fails. Write it as a concrete next step:
do.PUT(Node("n1"), "/kv/", "value").
    Status(Is(400)).
    Body(Matches("^key cannot be empty\n?$")).
    Hint("Your server should reject empty keys.\n" +
        "Add validation to return 400 Bad Request for empty keys.").
    Run()
.Run()
Executes the check. On failure it panics; the test runner catches the panic and reports it as a test failure.
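A runner can turn a panicking check into a reported failure with recover, and stop at the first failure as described under HTTP Requests. This is a rough sketch of that pattern and assumes nothing about attest's actual runner. `namedTest`, `runTest`, and `runSuite` are hypothetical names.

```go
package main

import "fmt"

// namedTest pairs a test name with its body (hypothetical type).
type namedTest struct {
	name string
	fn   func()
}

// runTest runs fn and converts a panic into an error.
func runTest(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("check failed: %v", r)
		}
	}()
	fn()
	return nil
}

// runSuite runs tests in order and stops at the first failure.
// It returns the names of the tests that passed and the first error, if any.
func runSuite(tests []namedTest) (passed []string, err error) {
	for _, tc := range tests {
		if err = runTest(tc.fn); err != nil {
			return passed, fmt.Errorf("%s: %w", tc.name, err)
		}
		passed = append(passed, tc.name)
	}
	return passed, nil
}

func main() {
	tests := []namedTest{
		{"PUT Stores Values", func() {}},
		{"GET Returns Stored Values", func() { panic("expected status 200, got 404") }},
		{"DELETE Removes Values", func() {}}, // never runs: the suite stops above
	}
	passed, err := runSuite(tests)
	fmt.Println("passed:", passed)
	fmt.Println("error:", err)
}
```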
Timing
By default, checks execute once, immediately. Use .Eventually() or .Consistently() to change this.
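The two timing modes are mirror images: one retries until the first success, the other keeps checking until the first failure. The sketch below illustrates that difference with two standalone polling loops. It is not attest's code, and the `interval` parameter is an assumption, since the polling interval attest uses is not stated here.

```go
package main

import (
	"fmt"
	"time"
)

// eventually polls check until it returns true or the timeout elapses.
func eventually(timeout, interval time.Duration, check func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return true
		}
		if time.Now().Add(interval).After(deadline) {
			return false // no time left for another attempt
		}
		time.Sleep(interval)
	}
}

// consistently polls check for the whole timeout and fails on the first false.
func consistently(timeout, interval time.Duration, check func() bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if !check() {
			return false
		}
		if time.Now().Add(interval).After(deadline) {
			return true // held for the entire window
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Passes on the third attempt, as a replica catching up might.
	synced := func() bool { calls++; return calls >= 3 }
	fmt.Println(eventually(time.Second, 10*time.Millisecond, synced))

	// Stays true for the whole window, as "no leader elected" should.
	noLeader := func() bool { return true }
	fmt.Println(consistently(100*time.Millisecond, 10*time.Millisecond, noLeader))
}
```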
.Eventually(timeout ...time.Duration)
Retries until the check passes or the timeout expires (default 5s):
// Wait for replica to sync
do.GET(Node("n2"), "/kv/key").
    Eventually().
    Status(Is(200)).
    Body(Is("value")).
    Hint("Replica should eventually receive replicated data.").
    Run()

// Custom timeout
do.GET(Node("n2"), "/kv/key").
    Eventually(10 * time.Second).
    Status(Is(200)).
    Body(Is("value")).
    Hint("Replica should sync within 10 seconds.").
    Run()
.Consistently(timeout ...time.Duration)
Verifies the check stays true for the entire duration (default 5s):
// Verify no leader is elected in minority partition
do.GET(do.AllNodes("n1", "n2"), "/cluster/info").
    Consistently(5 * time.Second).
    JSON("leader", IsNull[string]()).
    Hint("Minority partition should not elect a leader.").
    Run()
Matchers
Matchers define what values satisfy a check.
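One way to picture a matcher is as a predicate over a value, with a modifier passing only when every predicate accepts it. The sketch below reimplements three of the matchers documented in this section under the same names purely for illustration. The `Matcher` type and `matchAll` helper are hypothetical, and attest's real types may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// Matcher reports whether a value is acceptable (hypothetical type).
type Matcher[T any] func(got T) bool

// Is matches values equal to want.
func Is[T comparable](want T) Matcher[T] {
	return func(got T) bool { return got == want }
}

// Contains matches strings that include sub.
func Contains(sub string) Matcher[string] {
	return func(got string) bool { return strings.Contains(got, sub) }
}

// Not inverts another matcher.
func Not[T any](m Matcher[T]) Matcher[T] {
	return func(got T) bool { return !m(got) }
}

// matchAll returns true only if every matcher accepts got.
func matchAll[T any](got T, matchers ...Matcher[T]) bool {
	for _, m := range matchers {
		if !m(got) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(matchAll(200, Is(200), Not(Is(500))))
	fmt.Println(matchAll("Hello, world", Contains("Hello"), Not(Contains("error"))))
	fmt.Println(matchAll("internal error", Contains("Hello")))
}
```

Because each matcher is just a function, Not needs no knowledge of the matcher it wraps.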
Is(value)
Exact equality:
Status(Is(200))
Body(Is("value"))
GreaterThan(value) / LessThan(value)
Ordered comparisons. Work on any comparable type, most commonly strings from JSON fields:
JSON("term", GreaterThan("1"))
JSON("index", LessThan("100"))
IsNull[T]()
Checks that a JSON field is null. Requires a type parameter:
JSON("leader", IsNull[string]())
Contains(substring)
Checks that a string contains a substring:
Body(Contains("error"))
Matches(pattern)
Regex matching:
Body(Matches(`^[0-9]+$`))
HasLen[T](length)
Validates that a value has a specific length. Works on strings, arrays, slices, maps, and channels:
JSON("entries", HasLen[string](3))
JSON("peers", HasLen[string](4))
OneOf(values...)
Accepts any of the provided values. Useful when order is non-deterministic:
Body(OneOf("value1", "value2", "value3"))
Not(matcher)
Negates another matcher:
Status(Not(Is(500)))
Body(Not(Contains("panic")))
Multiple Matchers
Pass multiple matchers to any modifier; all must pass:
Status(Is(200), Not(Is(404)))
Body(Contains("Hello"), Not(Contains("error")))
Node Management
Control the lifecycle of nodes and the network during a test.
do.Start(name)
Starts a previously stopped or killed node:
do.Start("n1")
Nodes are started automatically at the beginning of a suite via WithCluster. Use Start only when you need to restart a node that was stopped or killed during a test.
do.Stop(name)
Sends SIGTERM and waits for graceful exit:
do.Stop("n1")
do.Kill(name)
Sends SIGKILL immediately:
do.Kill("n1")
do.Restart(name, sig...)
Stops then starts a node. The optional signal controls how the node is stopped:
// Graceful restart (SIGTERM)
do.Restart("n1")

// Crash simulation (SIGKILL)
do.Restart("n1", syscall.SIGKILL)
do.Partition(groups...)
Isolates groups of nodes from each other:
// Split a 5-node cluster into minority and majority partitions
do.Partition([]string{"n1", "n2"}, []string{"n3", "n4", "n5"})
do.Heal()
Restores full connectivity on every node:
do.Heal()
do.Impair(sel, impairments...)
Applies one or more network impairments to outgoing traffic on the selected nodes. Impairments are combined into a single tc netem command. Call do.Repair() to remove them.
// Add 100ms delay to all nodes
do.Impair(do.AllNodes(), Delay(100*time.Millisecond))

// Add delay with jitter to a single node
do.Impair(Node("n1"), Delay(100*time.Millisecond, 20*time.Millisecond))

// Combine impairments
do.Impair(do.AllNodes(), Delay(50*time.Millisecond), Loss(10))

// All nodes except the leader
do.Impair(do.ExceptNodes(leaderNode), Loss(20))
Available impairments:
Delay(d, jitter...): adds fixed latency. Optional jitter uses a normal distribution with 25% correlation: Delay(100*time.Millisecond, 20*time.Millisecond).
Loss(pct): randomly drops pct% of packets (25% correlation): Loss(10).
Duplicate(pct): sends pct% of packets twice: Duplicate(1).
Reorder(pct): delivers pct% of packets out of order (25% correlation). Must be combined with Delay: Reorder(25).
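To make "combined into a single tc netem command" concrete, the sketch below shows one plausible way to render two of these impairments as netem arguments. It is an illustration under assumptions, not attest's code: the lowercase `delay` and `loss` helpers, the `impairment` type, the `eth0` device name, and the use of `tc qdisc replace` are all hypothetical choices. Only the netem keywords themselves (`delay`, `loss`, `distribution normal`, percentage correlation) are standard tc syntax.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// impairment renders its own slice of netem arguments (hypothetical type).
type impairment func() []string

// ms formats a duration the way tc accepts it, for example "100ms".
func ms(d time.Duration) string {
	return fmt.Sprintf("%dms", d.Milliseconds())
}

// delay adds fixed latency; optional jitter is normally distributed
// with 25% correlation, matching the description above.
func delay(d time.Duration, jitter ...time.Duration) impairment {
	return func() []string {
		args := []string{"delay", ms(d)}
		if len(jitter) > 0 {
			args = append(args, ms(jitter[0]), "25%", "distribution", "normal")
		}
		return args
	}
}

// loss drops pct% of packets with 25% correlation.
func loss(pct int) impairment {
	return func() []string {
		return []string{"loss", fmt.Sprintf("%d%%", pct), "25%"}
	}
}

// netemCommand joins all impairments into one tc invocation for dev.
func netemCommand(dev string, imps ...impairment) string {
	args := []string{"tc", "qdisc", "replace", "dev", dev, "root", "netem"}
	for _, imp := range imps {
		args = append(args, imp()...)
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(netemCommand("eth0", delay(50*time.Millisecond), loss(10)))
	fmt.Println(netemCommand("eth0", delay(100*time.Millisecond, 20*time.Millisecond)))
}
```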
do.Repair(sel...)
Removes tc netem impairments from the selected nodes. With no argument, repairs all nodes:
// Repair all nodes
do.Repair()

// Repair a specific node
do.Repair(Node("n1"))

// Repair all except the leader
do.Repair(do.ExceptNodes(leaderNode))
do.Fetch(name, path)
Makes a one-off GET request to a named node and returns a *FetchResponse. Returns nil if the node is unreachable. Useful for reading state before a check:
info := do.Fetch("n1", "/cluster/info")
if info == nil {
    panic("n1 is down")
}
term := info.JSON("term")

do.Kill("n1")

do.GET(do.AllNodes(), "/cluster/info").
    Eventually(2*time.Second).
    JSON("term", GreaterThan(term)).
    Run()
Concurrency
Run operations in parallel to test concurrent behavior.
do.Concurrently(n, fn)
Runs fn n times in parallel, passing each invocation a 1-based index:
do.Concurrently(100, func(i int) {
    do.PUT(Node("n1"), fmt.Sprintf("/kv/key%d", i), fmt.Sprintf("value%d", i)).
        Status(Is(200)).
        Hint("Your server should handle concurrent PUT requests.").
        Run()
})
Waits for all goroutines to complete. If any panic, the first panic is re-raised.
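The behaviour described above (1-based indexes, wait for everything, re-raise the first panic) can be sketched with a WaitGroup and a Once. This is a standalone illustration with a hypothetical `concurrently` function, not the attest source.

```go
package main

import (
	"fmt"
	"sync"
)

// concurrently runs fn n times in parallel with indexes 1..n, waits for all
// of them, then re-raises the first panic that was recorded, if any.
func concurrently(n int, fn func(i int)) {
	var (
		wg         sync.WaitGroup
		once       sync.Once
		firstPanic any
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Runs before wg.Done, so the panic is stored before Wait returns.
			defer func() {
				if r := recover(); r != nil {
					once.Do(func() { firstPanic = r })
				}
			}()
			fn(i)
		}(i)
	}
	wg.Wait()
	if firstPanic != nil {
		panic(firstPanic)
	}
}

func main() {
	var mu sync.Mutex
	sum := 0
	concurrently(100, func(i int) {
		mu.Lock()
		defer mu.Unlock()
		sum += i
	})
	fmt.Println(sum) // 5050: indexes 1 through 100 each ran once
}
```

Recovering inside each goroutine matters here: an unrecovered panic in any goroutine would terminate the whole process before the caller could report it.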