
Testing

The testing framework (attest) provides a fluent API for writing black-box tests against programs running in Docker containers. Tests validate external behavior without accessing implementation internals.

A typical test suite:

package kvstore

import (
    . "github.com/clstr-io/clstr/internal/attest"
)

func HTTPAPI() *Suite {
    return New(
        WithCluster(1),
    ).
        // 1
        Test("PUT stores data", func(do *Do) {
            do.PUT(Node("n1"), "/kv/key", "value").
                Status(Is(200)).
                Hint("Your server should accept PUT requests.\n" +
                    "Ensure your HTTP handler processes PUT to /kv/{key}.").
                Run()
        }).
        // 2
        Test("GET retrieves data", func(do *Do) {
            do.GET(Node("n1"), "/kv/key").
                Status(Is(200)).
                Body(Is("value")).
                Hint("Your server should return stored values.").
                Run()
        })
}

Tests run sequentially, and state persists between tests, so data written in test 1 is still available in test 2. The first failure stops execution of the suite.

Make HTTP requests using the method-specific helpers:

do.GET(sel, path, args...)
do.PUT(sel, path, args...)
do.POST(sel, path, args...)
do.DELETE(sel, path, args...)
do.PATCH(sel, path, args...)

sel is a NodeSelector that targets one or more nodes. Optional args are a body string and an H header map:

// No body
do.GET(Node("n1"), "/kv/key")

// With body
do.PUT(Node("n1"), "/kv/key", "value")

// With body and headers
do.POST(Node("n1"), "/api", `{"key":"value"}`, H{"Content-Type": "application/json"})

Each call returns a *Check. Chain modifiers onto it and call .Run() to execute:

do.PUT(Node("n1"), "/kv/key", "value").
    Status(Is(200)).
    Hint("Your server should accept PUT requests.").
    Run()

A NodeSelector controls which nodes receive a request and how the check is evaluated.

Node targets a single named node. The check passes if that node satisfies all conditions:

do.GET(Node("n1"), "/cluster/info").
    JSON("role", Is("leader")).
    Run()

AllNodes passes when every targeted node satisfies the check. With no arguments it targets all nodes in the cluster; pass names to target a subset:

// All nodes in the cluster
do.GET(do.AllNodes(), "/cluster/info").
    Eventually(2 * time.Second).
    JSON("term", GreaterThan("3")).
    Run()

// Subset
do.GET(do.AllNodes("n1", "n2"), "/cluster/info").
    Consistently(2 * time.Second).
    JSON("leader", IsNull[string]()).
    Run()

Killed or stopped nodes are automatically excluded.
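For example, an AllNodes check issued after killing a node only evaluates the survivors. A sketch, assuming a 3-node cluster (the hint text is illustrative):

do.Kill("n1")

// AllNodes() now targets only n2 and n3.
do.GET(do.AllNodes(), "/cluster/info").
    Eventually(2 * time.Second).
    JSON("leader", Not(IsNull[string]())).
    Hint("The remaining nodes should elect a new leader.").
    Run()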

ExactlyOneNode passes when exactly one node satisfies the check:

do.GET(do.ExactlyOneNode(), "/cluster/info").
    Consistently(2 * time.Second).
    JSON("role", Is("leader")).
    Run()

AtLeastOneNode passes when at least one node satisfies the check:

do.GET(do.AtLeastOneNode(), "/cluster/info").
    Eventually(2 * time.Second).
    JSON("role", Is("leader")).
    Run()

Nodes returns the names of all nodes in the cluster. Useful when you need to iterate over nodes manually:

for _, name := range do.Nodes() {
    r := do.Fetch(name, "/cluster/info")
    if r != nil && r.JSON("role") == "leader" {
        // ...
    }
}

Status validates the HTTP response status code:

do.GET(Node("n1"), "/kv/missing").
    Status(Is(404)).
    Hint("Return 404 for missing keys.").
    Run()

Body validates the HTTP response body:

do.GET(Node("n1"), "/kv/missing").
    Status(Is(404)).
    Body(Matches("^key not found\n?$")).
    Hint("Return 'key not found' for missing keys.").
    Run()

JSON validates a JSON field at the given gjson path:

do.GET(Node("n1"), "/cluster/info").
    Status(Is(200)).
    JSON("role", Is("leader")).
    JSON("term", Is("1")).
    Hint("Should return cluster info.").
    Run()

// Nested fields
do.GET(Node("n1"), "/log").
    JSON("entries.0.term", Is("1")).
    JSON("entries.1.index", Is("2")).
    Run()

// Null check
do.GET(Node("n1"), "/cluster/info").
    JSON("leader", IsNull[string]()).
    Hint("Leader should be null when no leader elected.").
    Run()

Header validates a response header:

do.PUT(Node("n2"), "/kv/foo", "value").
    Status(Is(307)).
    Header("Location", Matches(`^http://10\.0\.42\.\d+:\d+/kv/foo$`)).
    Hint("Followers should redirect writes to the leader.").
    Run()

Hint sets the help text shown when the check fails. Write it as a concrete next step:

do.PUT(Node("n1"), "/kv/", "value").
    Status(Is(400)).
    Body(Matches("^key cannot be empty\n?$")).
    Hint("Your server should reject empty keys.\n" +
        "Add validation to return 400 Bad Request for empty keys.").
    Run()

Run executes the check. It panics on failure; the panic is caught by the test runner and reported as a test failure.

By default checks execute once immediately. Use .Eventually() or .Consistently() to change this.

Eventually retries until the check passes or the timeout expires (default 5s):

// Wait for replica to sync
do.GET(Node("n2"), "/kv/key").
    Eventually().
    Status(Is(200)).
    Body(Is("value")).
    Hint("Replica should eventually receive replicated data.").
    Run()

// Custom timeout
do.GET(Node("n2"), "/kv/key").
    Eventually(10 * time.Second).
    Status(Is(200)).
    Body(Is("value")).
    Hint("Replica should sync within 10 seconds.").
    Run()

Consistently verifies that the check stays true for the entire duration (default 5s):

// Verify no leader is elected in the minority partition
do.GET(do.AllNodes("n1", "n2"), "/cluster/info").
    Consistently(5 * time.Second).
    JSON("leader", IsNull[string]()).
    Hint("Minority partition should not elect a leader.").
    Run()

Is checks exact equality:

Status(Is(200))
Body(Is("value"))

GreaterThan and LessThan perform ordered comparisons. They work on any ordered type, most commonly strings from JSON fields:

JSON("term", GreaterThan("1"))
JSON("index", LessThan("100"))

IsNull checks that a JSON field is null. It requires a type parameter:

JSON("leader", IsNull[string]())

Contains checks that a string contains a substring:

Body(Contains("error"))

Matches performs regular-expression matching:

Body(Matches(`^[0-9]+$`))

HasLen validates that a value has a specific length. It works on strings, arrays, slices, maps, and channels:

JSON("entries", HasLen[string](3))
JSON("peers", HasLen[string](4))

OneOf accepts any of the provided values. Useful when order is non-deterministic:

Body(OneOf("value1", "value2", "value3"))

Not negates another matcher:

Status(Not(Is(500)))
Body(Not(Contains("panic")))

Pass multiple matchers to any modifier; all must pass:

Status(Is(200), Not(Is(404)))
Body(Contains("Hello"), Not(Contains("error")))

Start starts a previously stopped or killed node:

do.Start("n1")

Nodes are started automatically at the beginning of a suite via WithCluster. Use Start only when you need to restart a node that was stopped or killed during a test.
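A common pattern, sketched here, is to stop a node mid-test and start it again to verify that state survives a restart:

do.Stop("n1")
do.Start("n1")

// Data written before the stop should still be readable.
do.GET(Node("n1"), "/kv/key").
    Eventually().
    Status(Is(200)).
    Body(Is("value")).
    Run()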

Stop sends SIGTERM and waits for a graceful exit:

do.Stop("n1")

Kill sends SIGKILL immediately:

do.Kill("n1")

Restart stops and then starts a node. The optional signal controls how the node is stopped:

// Graceful restart (SIGTERM)
do.Restart("n1")

// Crash simulation (SIGKILL)
do.Restart("n1", syscall.SIGKILL)

Partition isolates groups of nodes from each other:

// Split a 5-node cluster into minority and majority partitions
do.Partition([]string{"n1", "n2"}, []string{"n3", "n4", "n5"})

Heal restores full connectivity on every node:

do.Heal()
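Partition and Heal are usually paired. A sketch, assuming a 5-node cluster (the durations are illustrative):

// Cut n1 and n2 off from the majority.
do.Partition([]string{"n1", "n2"}, []string{"n3", "n4", "n5"})

do.GET(do.AllNodes("n1", "n2"), "/cluster/info").
    Consistently(3 * time.Second).
    JSON("leader", IsNull[string]()).
    Run()

do.Heal()

// With connectivity restored, every node should see a leader again.
do.GET(do.AllNodes(), "/cluster/info").
    Eventually().
    JSON("leader", Not(IsNull[string]())).
    Run()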

Fetch makes a one-off GET request to a named node and returns a *FetchResponse, or nil if the node is unreachable. Useful for reading state before a check:

info := do.Fetch("n1", "/cluster/info")
if info == nil {
    panic("n1 is down")
}
term := info.JSON("term")

do.Kill("n1")
do.GET(do.AllNodes(), "/cluster/info").
    Eventually(2 * time.Second).
    JSON("term", GreaterThan(term)).
    Run()

Concurrently runs fn n times in parallel, passing each invocation a 1-based index:

do.Concurrently(100, func(i int) {
    do.PUT(Node("n1"), fmt.Sprintf("/kv/key%d", i), fmt.Sprintf("value%d", i)).
        Status(Is(200)).
        Hint("Your server should handle concurrent PUT requests.").
        Run()
})

Waits for all goroutines to complete. If any panic, the first panic is re-raised.
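Because Concurrently blocks until every invocation finishes, a follow-up check can read back the concurrently written data. A sketch:

do.Concurrently(100, func(i int) {
    do.PUT(Node("n1"), fmt.Sprintf("/kv/key%d", i), fmt.Sprintf("value%d", i)).
        Status(Is(200)).
        Run()
})

// All 100 writes have completed by this point.
do.GET(Node("n1"), "/kv/key42").
    Status(Is(200)).
    Body(Is("value42")).
    Run()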

New creates a new test suite. Options configure the cluster and timeouts:

// Single-node suite
return New(WithCluster(1)).Test(...)

// Five-node cluster
return New(WithCluster(5)).Test(...)

WithCluster(n) declares an n-node cluster with nodes named n1, n2, …, nN. Nodes are started before any tests run and stopped afterward.

WithNodeStartTimeout sets how long to wait for a node's GET /health to return 200 OK after it starts (default 10s):

New(WithCluster(1), WithNodeStartTimeout(30*time.Second))

WithNodeShutdownTimeout sets how long to wait for a node to exit after SIGTERM before sending SIGKILL (default 5s):

New(WithCluster(1), WithNodeShutdownTimeout(15*time.Second))

WithRetryTimeout sets the default timeout for Eventually and Consistently (default 5s):

New(WithCluster(1), WithRetryTimeout(10*time.Second))

WithRequestTimeout sets the HTTP client timeout per request (default 5s):

New(WithCluster(1), WithRequestTimeout(2*time.Second))
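Options can be combined in a single New call. A sketch (the values are illustrative):

return New(
    WithCluster(3),
    WithNodeStartTimeout(30 * time.Second),
    WithRetryTimeout(10 * time.Second),
).Test(...)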
Each challenge lives in its own directory under challenges/:

challenges/
└── kvstore/
    ├── init.go           # Challenge registration
    ├── http_api.go       # Stage 1
    ├── persistence.go    # Stage 2
    └── crash_recovery.go # Stage 3

Each stage is a function returning *Suite:

package kvstore

import (
    . "github.com/clstr-io/clstr/internal/attest"
)

func HTTPAPI() *Suite {
    return New(WithCluster(1)).
        // 1
        Test("PUT Stores Values", func(do *Do) {
            do.PUT(Node("n1"), "/kv/key", "value").
                Status(Is(200)).
                Hint("Your server should accept PUT requests.").
                Run()
        }).
        // 2
        Test("GET Returns Stored Values", func(do *Do) {
            do.GET(Node("n1"), "/kv/key").
                Status(Is(200)).
                Body(Is("value")).
                Hint("Your server should return stored values.").
                Run()
        })
}

Import attest with . for cleaner syntax. Number tests with comments to visually separate them.

Create init.go:

package kvstore

import "github.com/clstr-io/clstr/internal/registry"

func init() {
    challenge := &registry.Challenge{
        Name:    "Distributed Key-Value Store",
        Summary: `Build a distributed key-value store from scratch.`,
    }
    challenge.AddStage("http-api", "Store and Retrieve Data", HTTPAPI)
    challenge.AddStage("persistence", "Data Survives SIGTERM", Persistence)
    challenge.AddStage("crash-recovery", "Data Survives SIGKILL", CrashRecovery)
    registry.RegisterChallenge("kv-store", challenge)
}

Import your challenge in challenges/challenges.go:

package challenges

import (
    _ "github.com/clstr-io/clstr/challenges/kvstore"
)