Multi-Node Deployment

By default Temps runs everything on a single server. Multi-node mode lets you add worker nodes — additional servers that receive container deployments from the control plane — so you can distribute workloads across multiple machines.

Overview

Multi-node is useful when you:

Outgrow a single server — you need more CPU, memory, or disk than one machine provides
Want geographic distribution — deploy containers closer to your users by placing workers in different regions
Need workload isolation — run production and staging on separate physical machines

When no worker nodes are registered, Temps behaves exactly as it does in single-server mode: all containers run locally on the control plane server.

Architecture

Control plane (your Temps server)

Runs the API, dashboard, proxy, and database
Accepts temps join registrations from workers
Schedules container deployments across nodes
Routes traffic to containers via private addresses

Worker nodes

Run the temps agent HTTP server on port 3100
Accept container lifecycle commands from the control plane
Send heartbeats every 30 seconds with capacity metrics
Have Docker installed for running containers

Traffic from the internet still enters through the control plane's reverse proxy (Pingora). The proxy routes requests to the correct container — whether it runs locally or on a remote worker — using the node's private address.

Prerequisites

On each worker node:

Docker installed and running
The temps binary available (same version as the control plane)
Network connectivity to the control plane (see direct vs WireGuard modes below)
Port 3100 reachable from the control plane, either directly or through a WireGuard tunnel

On the control plane:

Temps server running with temps serve
The /api/internal/nodes/register endpoint reachable from workers
A join token generated from Settings > Worker Nodes (required for secure node registration)

Adding a worker node (direct mode)

Direct mode is the simplest option when nodes can reach each other over a private network (e.g., same VPC, same data center, or a VPN).

1. Generate a join token:

Go to Settings > Worker Nodes in the Temps dashboard and click Generate Join Token. The token is shown once — copy and save it securely. The control plane only stores a SHA-256 hash.

2. Install Temps on the worker machine:

curl -fsSL https://temps.sh/install.sh | bash

3. Join the cluster:

temps join <control-plane-url> <join-token> --private-address <worker-ip>

Argument	Type	Description
`control-plane-url`	string	The URL of your Temps control plane, e.g. https://temps.example.com.
`join-token`	string	The join token generated in step 1. Required when the control plane has a token configured. The token is hashed (SHA-256) before storage; the plaintext is never persisted on the control plane.
`--private-address`	string	The IP address the control plane should use to reach this worker's containers. Typically a private/internal IP like 10.0.0.5 or 192.168.1.50.

The --private-address flag enables direct mode — no WireGuard tunnel is created. The control plane routes traffic directly to the worker's private address.
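
The join token handling described in step 1 (plaintext shown once, only a SHA-256 hash stored) can be sketched roughly as follows. This is an illustration of the general pattern, not Temps' actual implementation: the function names, the token length, and the hex encoding of the hash are all assumptions.

```python
import hashlib
import hmac
import secrets


def generate_join_token() -> tuple[str, str]:
    """Return (plaintext, stored_hash). Hypothetical helper, not a Temps API."""
    plaintext = secrets.token_urlsafe(32)  # shown to the operator once
    stored_hash = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, stored_hash


def verify_join_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare it to the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

Because only the hash is kept, a lost token cannot be recovered from the control plane; under this model you would generate a new one.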

Optional flags:

temps join <url> <token> --private-address 10.0.0.5 \
  --name worker-eu-1 \
  --labels region=eu-west,gpu=true \
  --agent-address 0.0.0.0:3100

--name — A friendly name for this node (defaults to the hostname)
--labels — Key-value pairs for scheduling hints (e.g., region=us-east,tier=production)
--agent-address — The address the agent server listens on (default: 0.0.0.0:3100)
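
To make the --labels format concrete, here is a small sketch of how a comma-separated key=value string could map to the labels object that appears in the node API response. The parse_labels helper is hypothetical, and treating every value as a string is an assumption.

```python
def parse_labels(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' into a dict. Hypothetical helper for illustration."""
    labels = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"invalid label {pair!r}, expected key=value")
        labels[key.strip()] = value.strip()
    return labels


print(parse_labels("region=eu-west,gpu=true"))
# {'region': 'eu-west', 'gpu': 'true'}
```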

After joining, the node is registered and the agent config is saved to ~/.temps/agent.json. Start the agent separately:

temps agent

The agent reads its config from ~/.temps/agent.json and begins sending heartbeats to the control plane every 30 seconds.

Adding a worker node (WireGuard mode)

WireGuard mode creates an encrypted tunnel between the worker and the control plane. Use this when nodes are on different networks without direct connectivity.

temps join <control-plane-url> <join-token>

Without --private-address, the join command:

Generates a WireGuard keypair
Contacts the relay at api.temps.sh for key exchange
Sets up a WireGuard tunnel (wg0 interface) with a 10.100.0.x address
Registers with the control plane using the WireGuard address
Saves the agent config to ~/.temps/agent.json

Then start the agent:

temps agent

WireGuard mode requires the wireguard-tools package installed on both the control plane and the worker. On Ubuntu/Debian: apt install wireguard-tools. On macOS: brew install wireguard-tools.

Running the agent

After temps join completes, run the agent server:

temps agent

The agent loads its configuration from ~/.temps/agent.json (saved by temps join). CLI flags and environment variables override the saved config:

Variable / Flag	Default	Description
`TEMPS_AGENT_ADDRESS` / `--listen-address`	`0.0.0.0:3100`	Address the agent listens on
`TEMPS_AGENT_TOKEN` / `--token`	—	Bearer token for authenticating control plane requests
`TEMPS_NODE_NAME` / `--node-name`	—	Name for this node
`TEMPS_CONTROL_PLANE_URL` / `--control-plane-url`	—	URL of the control plane
`TEMPS_NODE_ID` / `--node-id`	—	Node ID assigned during registration

If no ~/.temps/agent.json exists and required fields are missing, the agent exits with an error suggesting you run temps join first.
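
The layering described above (saved config first, then overrides) might look like the sketch below. Several details are assumptions made for illustration: the JSON key names in agent.json, the set of required fields, and the choice that CLI flags win over environment variables. The source only states that both override the saved file.

```python
import json
import os
from pathlib import Path

# Environment variable names come from the table above. The JSON key names
# are hypothetical; the agent.json schema is not documented here.
ENV_TO_KEY = {
    "TEMPS_AGENT_ADDRESS": "listen_address",
    "TEMPS_AGENT_TOKEN": "token",
    "TEMPS_NODE_NAME": "node_name",
    "TEMPS_CONTROL_PLANE_URL": "control_plane_url",
    "TEMPS_NODE_ID": "node_id",
}
REQUIRED = ("token", "control_plane_url", "node_id")  # assumed required fields


def resolve_config(cli_flags: dict, env=None, path=None) -> dict:
    """Merge saved config, environment variables, and CLI flags (assumed order)."""
    env = os.environ if env is None else env
    path = Path.home() / ".temps" / "agent.json" if path is None else path
    config = {"listen_address": "0.0.0.0:3100"}  # documented default
    if path.exists():
        config.update(json.loads(path.read_text()))
    for var, key in ENV_TO_KEY.items():
        if env.get(var):
            config[key] = env[var]
    config.update({k: v for k, v in cli_flags.items() if v is not None})
    missing = [k for k in REQUIRED if not config.get(k)]
    if missing:
        raise SystemExit(f"missing {', '.join(missing)}; run `temps join` first")
    return config
```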

The agent sends a heartbeat to the control plane every 30 seconds. If heartbeats stop (e.g., the agent is stopped or the machine goes down), the control plane marks the node as offline after 90 seconds and excludes it from scheduling.

The agent exposes these endpoints (all authenticated with the bearer token):

Endpoint	Method	Description
`/agent/containers/deploy`	POST	Deploy a container
`/agent/containers/{id}/stop`	POST	Stop a container
`/agent/containers/{id}`	DELETE	Remove a container
`/agent/containers/{id}/logs`	GET	Stream container logs
`/agent/containers/{id}/info`	GET	Get container info
`/agent/images/{name}/exists`	GET	Check if an image exists
`/agent/health`	GET	Health check with system metrics
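
For debugging, you may want to call an agent endpoint by hand. The sketch below builds an authenticated request for /agent/health using only the Python standard library. It assumes the bearer token is sent in a standard Authorization: Bearer header, which the text implies but does not spell out; the base URL and token are placeholders, and the response format is not documented here.

```python
import urllib.request


def build_agent_request(base_url: str, path: str, token: str,
                        method: str = "GET") -> urllib.request.Request:
    """Build an authenticated request for a worker's agent API."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        method=method,
        # Assumed header scheme; the docs only say "bearer token".
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_agent_request("https://10.0.0.5:3100", "/agent/health", "example-token")
# To actually send it (requires a reachable agent):
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status, resp.read().decode())
```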

Verifying nodes

From the dashboard:

Go to Settings > Worker Nodes to see all registered nodes, their status, and last heartbeat time.

From the API:

List all nodes

curl https://your-temps-instance/api/internal/nodes

Response

{
  "nodes": [
    {
      "id": 1,
      "name": "worker-eu-1",
      "address": "https://10.0.0.5:3100",
      "private_address": "10.0.0.5",
      "role": "worker",
      "status": "active",
      "labels": { "region": "eu-west" },
      "last_heartbeat": "2026-03-04T12:00:00Z"
    }
  ],
  "total": 1
}

Get a specific node

curl https://your-temps-instance/api/internal/nodes/1

A node is considered active if it has sent a heartbeat within the last 90 seconds. Nodes that miss heartbeats are automatically marked offline and excluded from scheduling.
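
The 90-second rule can be checked against the last_heartbeat field from the list response. The following is a minimal sketch of that check, not the control plane's code; the is_active helper is hypothetical, and treating exactly 90 seconds as still active is an assumption.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_TIMEOUT = timedelta(seconds=90)  # offline threshold from the text


def is_active(node: dict, now: datetime) -> bool:
    """True if the node's last heartbeat is within the 90-second window."""
    raw = node.get("last_heartbeat")
    if not raw:
        return False
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    last = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return now - last <= HEARTBEAT_TIMEOUT


node = {"name": "worker-eu-1", "last_heartbeat": "2026-03-04T12:00:00Z"}
now = datetime(2026, 3, 4, 12, 1, 0, tzinfo=timezone.utc)
print(is_active(node, now))                          # 60 s elapsed -> True
print(is_active(node, now + timedelta(seconds=60)))  # 120 s elapsed -> False
```

With a 30-second heartbeat interval, a 90-second window tolerates two missed heartbeats before a node drops out of scheduling.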

How scheduling works

When you deploy an application with multiple replicas, the node scheduler distributes containers across available nodes using round-robin scheduling:

The scheduler queries all nodes with status = "active" and a heartbeat within the last 90 seconds
If no active worker nodes exist, all replicas run locally on the control plane
If active workers exist, replicas are distributed evenly across all nodes (including local) in round-robin order
Each container records which node it was assigned to in the database

The deployment job creates a RemoteNodeDeployer for each remote assignment, which communicates with the worker's agent API to deploy, stop, and manage containers.

The scheduler currently uses simple round-robin. Label-based scheduling (e.g., "only deploy to nodes with gpu=true") is planned but not yet implemented — labels are stored but not used for filtering.
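
The round-robin behavior described in this section can be sketched as below. This is an illustration under stated assumptions, not the scheduler's implementation: the assign_replicas function and the "local" name are hypothetical, and placing the control plane first in the rotation is a guess, since the source does not specify the order.

```python
from itertools import cycle

LOCAL = "local"  # hypothetical name for the control plane itself


def assign_replicas(replicas: int, active_workers: list[str]) -> list[str]:
    """Return the node assigned to each replica, in order."""
    if not active_workers:
        # No active workers: everything runs on the control plane.
        return [LOCAL] * replicas
    nodes = cycle([LOCAL, *active_workers])
    return [next(nodes) for _ in range(replicas)]


print(assign_replicas(5, ["worker-eu-1", "worker-us-1"]))
# ['local', 'worker-eu-1', 'worker-us-1', 'local', 'worker-eu-1']
print(assign_replicas(2, []))
# ['local', 'local']
```

Note that labels play no part in the assignment, which matches the current behavior where labels are stored but not used for filtering.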

Current limitations

Multi-node is functional but still evolving. Known limitations:

No label-based scheduling — labels are stored on nodes but the scheduler does not filter by them yet. All active nodes receive deployments equally.
No remote image transfer — the worker node must be able to pull the Docker image independently. If you are using a local registry on the control plane, the worker needs network access to it.
No log streaming from remote nodes — deployment logs from containers on worker nodes are not yet streamed back to the control plane dashboard in real-time.
No remote node removal from UI: nodes cannot be removed from the dashboard, and a DELETE endpoint is not yet exposed in the API. For now, take a node out of scheduling by stopping the agent and waiting for the heartbeat to expire.
Round-robin only — no resource-aware scheduling, no affinity/anti-affinity rules, no bin-packing.

What to explore next

Scaling strategies
Resource allocation
Docker container management
