Multi-Node Deployment
By default Temps runs everything on a single server. Multi-node mode lets you add worker nodes — additional servers that receive container deployments from the control plane — so you can distribute workloads across multiple machines.
Overview
Multi-node is useful when you:
- Outgrow a single server — you need more CPU, memory, or disk than one machine provides
- Want geographic distribution — deploy containers closer to your users by placing workers in different regions
- Need workload isolation — run production and staging on separate physical machines
When no worker nodes are registered, Temps behaves exactly as it does in single-server mode: all containers run locally on the control plane server.
Architecture
Control plane (your Temps server)
- Runs the API, dashboard, proxy, and database
- Accepts temps join registrations from workers
- Schedules container deployments across nodes
- Routes traffic to containers via private addresses
- Watches every node's heartbeat and capacity, and self-reports its own
Worker nodes
- Run the temps agent HTTP server on port 3100
- Accept container lifecycle commands from the control plane
- Send heartbeats every 30 seconds with capacity metrics
- Have Docker installed for running containers
Traffic from the internet still enters through the control plane's reverse proxy (Pingora). The proxy routes requests to the correct container — whether it runs locally or on a remote worker — using the node's private address.
The control plane also appears in the node list as a node of its own (id 0, role control-plane), so containers scheduled locally are visible alongside worker nodes, and the control plane's own CPU/memory/disk are monitored with the same alert thresholds.
Prerequisites
On each worker node:
- Docker installed and running
- The temps binary available (same version as the control plane)
- Network connectivity to the control plane (see direct vs WireGuard modes below)
- Port 3100 accessible from the control plane (or a WireGuard tunnel)
On the control plane:
- Temps server running with temps serve
- The /api/internal/nodes/register endpoint reachable from workers
- An enrollment token minted on the control plane (see Securing node enrollment). Enrollment tokens are short-lived and single-use by default; they replace the old shared join token.
- (Optional but recommended) mutual TLS enabled so every control-plane↔agent call is encrypted and mutually authenticated.
Adding a worker node (direct mode)
Add a worker node (direct mode)
1. On the control plane, mint a single-use enrollment token: POST /api/settings/enrollment-tokens with {"ttl_secs": 3600, "max_uses": 1}. Copy the returned token (it is shown once; the control plane only stores a SHA-256 hash).
Checkpoint: Save the token before leaving the page; you cannot retrieve the plaintext again. It expires in 1 hour and is consumed by a single successful join.
2. On the worker machine, install Temps: curl -fsSL https://temps.sh/install.sh | bash
3. Join the cluster with the control plane URL and the enrollment token, passing --private-address <worker-ip> (e.g. 10.0.0.5) to enable direct mode so traffic routes straight to the worker's private address.
4. If mutual TLS is enabled on the control plane, also pass --ca-fingerprint <fp> (read it from GET /api/settings -> multi_node.cluster_ca_fingerprint) so the join verifies the cluster CA and refuses a man-in-the-middle.
5. Optionally add --name worker-eu-1 and --labels region=eu-west,gpu=true to name and label the node. The agent listens on 127.0.0.1:3100 by default; pass --agent-address 0.0.0.0:3100 to listen on all interfaces.
6. After joining, the agent config is saved to ~/.temps/agent.json (along with the node cert/key and cluster CA if mTLS is on). Start the agent with temps agent.
Checkpoint: Open Settings > Worker Nodes and confirm the new node appears with status active and a recent heartbeat (within the last 90 seconds).
Direct mode is the simplest option when nodes can reach each other over a private network (e.g., same VPC, same data center, or a VPN).
1. Mint an enrollment token:
On the control plane, mint a short-lived, single-use token (see Securing node enrollment for all options):
Mint a single-use enrollment token
curl -X POST https://temps.example.com/api/settings/enrollment-tokens \
-H 'Content-Type: application/json' \
-d '{"ttl_secs": 3600, "max_uses": 1}'
The response contains the plaintext token once — copy and save it. The control plane only stores a SHA-256 hash.
2. Install Temps on the worker machine:
curl -fsSL https://temps.sh/install.sh | bash
3. Join the cluster:
temps join <control-plane-url> <enrollment-token> --private-address <worker-ip>
- control-plane-url (string): The URL of your Temps control plane, e.g. https://temps.example.com.
- enrollment-token (string): The enrollment token minted in step 1 (or set TEMPS_JOIN_TOKEN). The token is hashed (SHA-256) before storage, so the plaintext is never persisted on the control plane, and it is consumed by a single successful join.
- --private-address (string): The IP address the control plane should use to reach this worker's containers. Typically a private/internal IP like 10.0.0.5 or 192.168.1.50.
The --private-address flag enables direct mode — no WireGuard tunnel is created. The control plane routes traffic directly to the worker's private address.
Optional flags:
temps join <url> <token> --private-address 10.0.0.5 \
--name worker-eu-1 \
--labels region=eu-west,gpu=true \
--agent-address 0.0.0.0:3100 \
--ca-fingerprint <cluster-ca-fingerprint>
- --name: A friendly name for this node (defaults to the hostname)
- --labels: Key-value pairs stored on the node (e.g., region=us-east,tier=production)
- --agent-address: The address the agent server listens on (default: 127.0.0.1:3100)
- --ca-fingerprint: Pin the cluster CA when mutual TLS is enabled; the join aborts on mismatch
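The --labels value is a comma-separated list of key=value pairs. As a rough illustration of how such a string could map onto the labels object returned by the nodes API, here is a minimal sketch. parseLabels is a hypothetical helper, not part of Temps, and the real CLI may validate or type the values differently.

```javascript
// Hypothetical helper: turn "region=eu-west,gpu=true" into a labels object.
// Assumptions: values stay strings, and a pair without "key=" is rejected.
function parseLabels(input) {
  const labels = {}
  for (const pair of input.split(',')) {
    const trimmed = pair.trim()
    if (trimmed === '') continue // tolerate empty segments such as a trailing comma
    const eq = trimmed.indexOf('=')
    if (eq <= 0) throw new Error(`invalid label "${trimmed}", expected key=value`)
    labels[trimmed.slice(0, eq)] = trimmed.slice(eq + 1)
  }
  return labels
}

console.log(parseLabels('region=eu-west,gpu=true'))
// { region: 'eu-west', gpu: 'true' }
```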
After joining, the node is registered and the agent config is saved to ~/.temps/agent.json. Start the agent separately:
temps agent
The agent reads its config from ~/.temps/agent.json and begins sending heartbeats to the control plane every 30 seconds.
Adding a worker node (WireGuard mode)
Add a worker node (WireGuard mode)
1. Mint an enrollment token on the control plane (POST /api/settings/enrollment-tokens) and copy it.
2. On the worker, run temps join <control-plane-url> <enrollment-token> without --private-address. This generates a WireGuard keypair, contacts the relay at api.temps.sh for key exchange, brings up the wg0 interface on a 10.100.0.x address, and registers using that address. WireGuard runs in-process (boringtun), so no wireguard-tools package is needed, only root or CAP_NET_ADMIN to create the wg0 interface.
3. Start the agent with temps agent. The config was saved to ~/.temps/agent.json during join.
Checkpoint: Confirm the node shows status active in Settings > Worker Nodes with its 10.100.0.x WireGuard address.
WireGuard mode creates an encrypted tunnel between the worker and the control plane. Use this when nodes are on different networks without direct connectivity.
temps join <control-plane-url> <enrollment-token>
Without --private-address, the join command:
- Generates a WireGuard keypair
- Contacts the relay at api.temps.sh for key exchange
- Sets up a WireGuard tunnel (wg0 interface) with a 10.100.0.x address
- Registers with the control plane using the WireGuard address
- Saves the agent config to ~/.temps/agent.json
Then start the agent:
temps agent
WireGuard runs in-process via an embedded userspace implementation (boringtun) — no wireguard-tools package, wg, or wg-quick binaries are required on either machine. The only requirement is permission to create the wg0 interface: run temps join as root or grant the binary CAP_NET_ADMIN.
Running the agent
Run the worker agent
1. Make sure temps join has already run on this machine so ~/.temps/agent.json exists with the node ID, control plane URL, and bearer token.
2. Run temps agent. To override the saved config, set TEMPS_AGENT_ADDRESS/--listen-address (default 127.0.0.1:3100), TEMPS_AGENT_TOKEN/--token, or TEMPS_CONTROL_PLANE_URL/--control-plane-url.
3. The agent sends a heartbeat every 30 seconds; if heartbeats stop, the control plane marks the node offline after 90 seconds and excludes it from scheduling.
Checkpoint: In Settings > Worker Nodes confirm the node's last heartbeat updates and status stays active while the agent is running.
After temps join completes, run the agent server:
temps agent
The agent loads its configuration from ~/.temps/agent.json (saved by temps join). CLI flags and environment variables override the saved config:
| Variable / Flag | Default | Description |
|---|---|---|
| TEMPS_AGENT_ADDRESS / --listen-address | 127.0.0.1:3100 | Address the agent listens on |
| TEMPS_AGENT_TOKEN / --token | (none) | Bearer token for authenticating control plane requests |
| TEMPS_NODE_NAME / --node-name | (none) | Name for this node |
| TEMPS_CONTROL_PLANE_URL / --control-plane-url | (none) | URL of the control plane |
| TEMPS_NODE_ID / --node-id | (none) | Node ID assigned during registration |
If no ~/.temps/agent.json exists and required fields are missing, the agent exits with an error suggesting you run temps join first.
The agent sends a heartbeat to the control plane every 30 seconds. If heartbeats stop (e.g., the agent is stopped or the machine goes down), the control plane marks the node as offline after 90 seconds and excludes it from scheduling.
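The liveness rule can be pictured as a small pure function over the last heartbeat timestamp. This is only a sketch under stated assumptions: nodeStatus and the constants are hypothetical names, and treating a heartbeat that is exactly 90 seconds old as still active is a guess about the boundary, not documented behavior.

```javascript
const HEARTBEAT_INTERVAL_MS = 30_000 // agents report every 30 seconds
const OFFLINE_AFTER_MS = 3 * HEARTBEAT_INTERVAL_MS // 90 seconds without a heartbeat

// Hypothetical helper: classify a node from its last heartbeat timestamp.
function nodeStatus(lastHeartbeat, now = new Date()) {
  const ageMs = now.getTime() - new Date(lastHeartbeat).getTime()
  return ageMs > OFFLINE_AFTER_MS ? 'offline' : 'active'
}

const now = new Date('2026-03-04T12:02:00Z')
console.log(nodeStatus('2026-03-04T12:01:30Z', now)) // active (30 s old)
console.log(nodeStatus('2026-03-04T12:00:00Z', now)) // offline (120 s old)
```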
When mutual TLS is enabled and the node holds a signed certificate, the agent serves HTTPS with client-certificate verification instead of plaintext — the control plane must present its cluster-CA-signed identity on every call.
Securing node enrollment
Joining a node is gated by an enrollment token. Tokens are short-lived and single-use by default, and only their SHA-256 hash is ever stored on the control plane — the plaintext is shown once at mint time and never again.
Mint a token
POST /api/settings/enrollment-tokens
curl -X POST https://temps.example.com/api/settings/enrollment-tokens \
-H 'Content-Type: application/json' \
-d '{"ttl_secs": 3600, "max_uses": 1, "bound_node_name": "worker-eu-1"}'
| Field | Default | Notes |
|---|---|---|
| ttl_secs | 3600 (1 h) | Lifetime in seconds. Server-capped at 86400 (24 h). |
| max_uses | 1 | How many successful joins the token allows. Range 1–100. |
| bound_node_name | (none) | Optional. Restricts the token to a single node name (exact match). |
The response returns the plaintext token once, its expires_at, and — if mTLS is already set up — the cluster ca_fingerprint to pin on join.
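If you script token minting, it can help to validate the request body against the documented field limits before sending it. The sketch below is illustrative only: enrollmentTokenBody is a hypothetical helper, and clamping ttl_secs on the client simply mirrors the server-side cap described in the table; how the server treats an out-of-range max_uses is an assumption.

```javascript
const MAX_TTL_SECS = 86400 // documented server-side cap (24 h)

// Hypothetical helper: build the JSON body for POST /api/settings/enrollment-tokens.
function enrollmentTokenBody({ ttlSecs = 3600, maxUses = 1, boundNodeName } = {}) {
  if (!Number.isInteger(ttlSecs) || ttlSecs <= 0) {
    throw new RangeError('ttl_secs must be a positive integer')
  }
  if (!Number.isInteger(maxUses) || maxUses < 1 || maxUses > 100) {
    throw new RangeError('max_uses must be between 1 and 100')
  }
  const body = { ttl_secs: Math.min(ttlSecs, MAX_TTL_SECS), max_uses: maxUses }
  if (boundNodeName) body.bound_node_name = boundNodeName // optional exact-match binding
  return body
}

console.log(JSON.stringify(enrollmentTokenBody({ boundNodeName: 'worker-eu-1' })))
// {"ttl_secs":3600,"max_uses":1,"bound_node_name":"worker-eu-1"}
```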
List and revoke
# List active (non-revoked, non-expired) tokens, newest first
curl https://temps.example.com/api/settings/enrollment-tokens
# Revoke a token by id
curl -X DELETE https://temps.example.com/api/settings/enrollment-tokens/<id>
The old shared join token still works during an upgrade — it is gated by multi_node.legacy_shared_token_enabled, which defaults to true on existing clusters and false on fresh installs. Once every worker has re-enrolled with a per-node token, set it to false so only enrollment tokens are accepted. Expired, revoked, or exhausted enrollment tokens are rejected outright (no silent fallback to the legacy token), which prevents downgrade attacks.
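To make the storage and rejection rules concrete, here is a minimal sketch of hash-only storage and the per-token checks. The record shape, the function names, and the hex encoding of the digest are all assumptions for illustration; the actual control plane implementation is not shown in this document.

```javascript
const { createHash } = require('node:crypto')

// Only this digest would be stored; the plaintext is shown once and discarded.
function hashToken(plaintext) {
  return createHash('sha256').update(plaintext, 'utf8').digest('hex')
}

// Hypothetical record shape: { hash, expiresAt, revoked, uses, maxUses }.
// Any result other than 'valid' rejects the join; there is no fallback
// to the legacy shared token for a known-but-unusable enrollment token.
function checkEnrollmentToken(record, presented, now = new Date()) {
  if (!record || record.hash !== hashToken(presented)) return 'unknown'
  if (record.revoked) return 'revoked'
  if (now >= new Date(record.expiresAt)) return 'expired'
  if (record.uses >= record.maxUses) return 'exhausted'
  return 'valid'
}

const record = {
  hash: hashToken('example-token'),
  expiresAt: '2026-03-04T13:00:00Z',
  revoked: false,
  uses: 1,
  maxUses: 1,
}
console.log(checkEnrollmentToken(record, 'example-token', new Date('2026-03-04T12:00:00Z')))
// exhausted
```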
Mutual TLS (mTLS)
By default the control plane and agents authenticate with a bearer token over the network you provide (a private VPC or the WireGuard tunnel). For defense in depth you can enable mutual TLS so every control-plane↔agent call is encrypted and both sides verify each other's certificate.
How it works
- Set multi_node.require_mtls to true (it is false by default; existing clusters are unchanged until you flip it).
- The control plane mints a per-cluster CA on the first certificate-bearing registration. The CA private key is encrypted at rest (AES-256-GCM) and never leaves the control plane.
- Each joining node generates its own key pair and CSR locally, so the node's private key is never transmitted. The control plane signs a per-node leaf certificate, overwriting any SANs the node requested with server-authoritative ones (the node's registered IP and name).
- On success the node serves mutual TLS, and the control plane presents its own cluster-CA-signed client identity on every call. The node's address is automatically switched to https://.
Pin the CA fingerprint on join
Read the fingerprint from the control plane and pass it to temps join so a malicious relay can't substitute its own CA:
# Operator: read the public cluster CA fingerprint
curl https://temps.example.com/api/settings | jq -r '.multi_node.cluster_ca_fingerprint'
# Worker: verify it during join
temps join https://temps.example.com <enrollment-token> \
--private-address 10.0.0.5 \
--ca-fingerprint <fingerprint>
A successful secure join prints Cluster CA fingerprint verified. and mTLS certificate provisioned — the agent will serve TLS.
Roll it out observe-then-enforce. Keep require_mtls = false while workers re-enroll and obtain certificates, then flip it to true once every active node is serving TLS. The cluster CA fingerprint is public and safe to share out-of-band; the CA private key is not — it is only ever stored encrypted.
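Fingerprint pinning boils down to hashing the CA certificate the join receives and comparing it with the value the operator supplied. The sketch below assumes the fingerprint is the hex SHA-256 digest of the DER-encoded certificate and that colons and letter case are insignificant; the exact format Temps uses is not stated here, and the function names are hypothetical.

```javascript
const { createHash } = require('node:crypto')

// Assumption: fingerprint = hex SHA-256 of the DER-encoded CA certificate.
function caFingerprint(certDer) {
  return createHash('sha256').update(certDer).digest('hex')
}

// Ignore colon separators, surrounding whitespace, and letter case.
function normalize(fingerprint) {
  return fingerprint.replace(/:/g, '').trim().toLowerCase()
}

// Hypothetical check mirroring --ca-fingerprint: abort on any mismatch.
function verifyPinnedCa(certDer, pinned) {
  if (normalize(caFingerprint(certDer)) !== normalize(pinned)) {
    throw new Error('cluster CA fingerprint mismatch, aborting join')
  }
  return true
}

// Stand-in bytes; a real join would use the certificate returned by the control plane.
const demoCert = Buffer.from('demo-ca-certificate')
console.log(verifyPinnedCa(demoCert, caFingerprint(demoCert).toUpperCase())) // true
```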
Monitoring & alerts
The control plane watches every node automatically and delivers alerts through your existing notification providers (email, Slack, webhook) — there are no per-node rules to configure.
- Node offline (critical). If a node misses heartbeats for 90 seconds, it is marked offline, its workloads are failed over to healthy nodes, and a critical alert fires once per outage.
- Node recovered (info). When an offline node starts heartbeating again, an info notification is sent.
- Resource pressure (warning). When a node's CPU, memory, or disk usage rises above a threshold, a warning alert fires. The control plane itself (node 0) is checked with the same thresholds.
Thresholds live under multi_node and are operator-configurable with PATCH /api/settings:
| Setting | Default | Notes |
|---|---|---|
| multi_node.node_cpu_alert_percent | 90 | Set to null to disable CPU alerts |
| multi_node.node_memory_alert_percent | 90 | Set to null to disable memory alerts |
| multi_node.node_disk_alert_percent | 90 | Root mount (/); set to null to disable |
A breach is strictly greater than the threshold — a node sitting at exactly 90% does not alert. The 90-second offline threshold is fixed. Notification delivery is best-effort: a provider failure is logged but never stalls health checks or failover.
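The breach rule is easy to get wrong at the boundary, so here is a small sketch of it. resourceAlerts and the shape of the usage object are hypothetical; only the default of 90, the strict comparison, and null-to-disable come from the settings described above.

```javascript
// Documented defaults; a null threshold disables that check.
const defaults = { cpu: 90, memory: 90, disk: 90 }

// Hypothetical helper: list the resources whose usage breaches its threshold.
function resourceAlerts(usage, thresholds = defaults) {
  return Object.keys(thresholds).filter((key) => {
    const limit = thresholds[key]
    return limit !== null && usage[key] > limit // strictly greater than
  })
}

console.log(resourceAlerts({ cpu: 90, memory: 93.2, disk: 40 }))
// [ 'memory' ]  (CPU at exactly 90% does not alert)
console.log(resourceAlerts({ cpu: 99, memory: 99, disk: 99 }, { cpu: null, memory: 95, disk: 90 }))
// [ 'memory', 'disk' ]  (CPU alerts disabled)
```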
Verifying nodes
From the dashboard:
Go to Settings > Worker Nodes to see all registered nodes, their status, and last heartbeat time.
From the API:
List all nodes
curl https://your-temps-instance/api/internal/nodes
Response
{
"nodes": [
{
"id": 0,
"name": "control-plane",
"address": "local",
"private_address": "127.0.0.1",
"role": "control-plane",
"status": "active",
"capacity": { "cpu_percent": 5.5 },
"last_heartbeat": "2026-03-04T12:00:00Z"
},
{
"id": 1,
"name": "worker-eu-1",
"address": "https://10.0.0.5:3100",
"private_address": "10.0.0.5",
"role": "worker",
"status": "active",
"labels": { "region": "eu-west" },
"last_heartbeat": "2026-03-04T12:00:00Z"
}
],
"total": 2
}
Get a specific node
curl https://your-temps-instance/api/internal/nodes/1
The control plane is always present as node 0 with role control-plane and self-reported capacity. Worker nodes start at id 1. A worker is considered active if it has sent a heartbeat within the last 90 seconds; workers that miss heartbeats are automatically marked offline and excluded from scheduling.
Node identity in containers
Every deployed container — on a worker or locally on the control plane — receives three environment variables so your app can report which node and replica is serving a request:
| Variable | Example | Meaning |
|---|---|---|
| TEMPS_NODE_NAME | worker-eu-1 (or control-plane) | Name of the node running this container |
| TEMPS_NODE_ID | 1 (or 0 for the control plane) | Numeric node ID, as a string |
| TEMPS_REPLICA | 1, 2, 3, … | 1-based replica index within the deployment |
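// Example: surface the serving node in a health endpoint
// (assumes an Express app; adapt the route to your own framework)
const express = require('express')
const app = express()

app.get('/whoami', (req, res) =>
  res.json({
    node: process.env.TEMPS_NODE_NAME,
    nodeId: process.env.TEMPS_NODE_ID,
    replica: process.env.TEMPS_REPLICA,
  })
)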
Remote-node logs
Container logs from remote worker nodes are streamed back to the control plane and land in the same searchable history as local logs — both the live tail and the history viewer now include remote containers, each line tagged with the node it ran on.
A control-plane collector keeps a stream open to each remote container over the agent channel (encrypted with mTLS when enabled) and writes the lines into the shared log store. In the log viewer you can filter by Container and by Node (both default to "all", interleaved across every container) and a per-line source column shows which container and worker node each line came from.
How scheduling works
When you deploy an application with multiple replicas, the node scheduler distributes containers across available nodes using round-robin scheduling:
- The scheduler queries all nodes with status = "active" and a heartbeat within the last 90 seconds
- If no active worker nodes exist, all replicas run locally on the control plane
- If active workers exist, replicas are distributed evenly across all nodes (including local) in round-robin order
- Each container records which node it was assigned to in the database
The deployment job creates a RemoteNodeDeployer for each remote assignment, which communicates with the worker's agent API to deploy, stop, and manage containers.
The scheduler currently uses simple round-robin. Label-based scheduling (e.g., "only deploy to nodes with gpu=true") is planned but not yet implemented — labels are stored but not used for filtering.
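The round-robin behavior described in this section can be sketched in a few lines. This is not the scheduler's actual code: assignReplicas is a hypothetical function, the workers array is assumed to be pre-filtered to active nodes with a fresh heartbeat, and starting the rotation at the control plane (node 0) is an assumption about ordering.

```javascript
const LOCAL = { id: 0, name: 'control-plane' }

// Hypothetical sketch: assign replicas 1..n to nodes in round-robin order.
// With no workers, every replica lands on the control plane.
function assignReplicas(replicaCount, workers) {
  const nodes = [LOCAL, ...workers]
  const assignments = []
  for (let i = 0; i < replicaCount; i++) {
    assignments.push({ replica: i + 1, nodeId: nodes[i % nodes.length].id })
  }
  return assignments
}

const workers = [{ id: 1, name: 'worker-eu-1' }, { id: 2, name: 'worker-us-1' }]
console.log(assignReplicas(4, workers).map((a) => a.nodeId)) // [ 0, 1, 2, 0 ]
console.log(assignReplicas(2, []).map((a) => a.nodeId)) // [ 0, 0 ]
```

Because labels are not consulted, a node tagged gpu=true receives the same share of replicas as any other node in this model.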
Current limitations
Multi-node is functional but still evolving. Known limitations:
- No label-based scheduling: labels are stored on nodes but the scheduler does not filter by them yet. All active nodes receive deployments equally.
- No remote image transfer: the worker node must be able to pull the Docker image independently. If you are using a local registry on the control plane, the worker needs network access to it.
- No node removal from the dashboard: a node can be removed by stopping its agent and letting the heartbeat expire; a UI/DELETE action is not yet exposed.
- Round-robin only: no resource-aware scheduling, no affinity/anti-affinity rules, no bin-packing.