How to Build a Multi-Node Docker Cluster Without Kubernetes

Written by Temps Team

Last updated March 12, 2026

Kubernetes is an engineering marvel. It's used by 84% of the organizations surveyed in the CNCF's 2024 Annual Survey. But here's the thing most infrastructure guides won't tell you: if you're running 2-10 servers and 5-50 containers, Kubernetes is a 747 for a grocery run.

You don't need a service mesh, custom resource definitions, or an etcd cluster. You need container scheduling, health monitoring, and a way to route traffic across a handful of nodes. That's it. And you can build a multi-node Docker cluster without Kubernetes in about 15 minutes.

This guide walks through the core concepts of multi-node orchestration, shows a DIY approach with SSH and Docker, covers Docker Swarm's middle ground, and demonstrates how purpose-built tools handle the hard parts for you.

[INTERNAL-LINK: self-hosted deployment platform -> /blog/introducing-temps-vercel-alternative]

TL;DR: You don't need Kubernetes to run containers across multiple servers. A multi-node Docker cluster needs five things: container scheduling, health checks, networking, load balancing, and rolling updates. For 2-10 nodes, tools like Temps let you add workers with a single command (temps join) while handling scheduling and failover automatically. 40% of Kubernetes users report it's too complex for their workloads (CNCF, 2024).


When Is Kubernetes Overkill for Your Infrastructure?

The CNCF's 2024 survey found that 40% of Kubernetes adopters cite complexity as their top challenge (CNCF Annual Survey, 2024). That complexity makes sense when you're orchestrating hundreds of pods across dozens of nodes. It doesn't make sense for a SaaS startup running three servers.

Citation capsule: 40% of organizations using Kubernetes report complexity as their primary challenge (CNCF, 2024). For teams running 2-10 servers with 5-50 containers, the operational overhead of etcd clusters, RBAC policies, and service meshes often exceeds the orchestration value they provide.

The Sweet Spot for Lightweight Orchestration

There's a range of infrastructure where Kubernetes costs more in operational complexity than it saves in automation. Here's what that range looks like:

Metric              Lightweight Cluster       Kubernetes Territory
------              -------------------       --------------------
Servers             2-10                      10-1,000+
Containers          5-50                      50-10,000+
Team size           1-5 engineers             Dedicated platform team
Deploy frequency    Daily to weekly           Continuous (multiple/hour)
Services            3-20                      20-500+

If you fall in the left column, you need orchestration, not Kubernetes. The distinction matters.

What Kubernetes Brings That You Probably Don't Need

Kubernetes includes dozens of abstractions designed for large-scale operations. For small clusters, most of them create overhead without benefit:

  • Service mesh (Istio, Linkerd) -- adds mTLS and observability between services. Useful at 50+ microservices. Overkill at 5.
  • Custom Resource Definitions -- extend the Kubernetes API for custom workloads. You won't need this unless you're building a platform.
  • RBAC policies -- fine-grained access control for teams of 20+. A 3-person team doesn't need namespace-level permissions.
  • etcd cluster -- a distributed key-value store that requires careful maintenance. It's the single most common source of Kubernetes outages.

But what about the alternatives? Docker Swarm's development has stalled. HashiCorp Nomad works well but still carries configuration overhead. And plain SSH-based orchestration breaks down the moment a node crashes at 3 AM.

[INTERNAL-LINK: Kubernetes alternatives comparison -> /blog/temps-vs-coolify-vs-netlify]


What Do You Actually Need from a Multi-Node Orchestrator?

Container orchestration, at its core, solves five problems. A Stack Overflow survey found that 72% of professional developers use Docker (Stack Overflow Developer Survey, 2024), but most never need more than these fundamentals to run containers across multiple machines.

Citation capsule: 72% of professional developers use Docker in their workflows (Stack Overflow, 2024). Multi-node orchestration requires just five core capabilities: container scheduling, health monitoring, inter-node networking, load balancing, and rolling updates -- far less than Kubernetes's 80+ resource types.

Container Scheduling

Container scheduling decides which container runs on which node. At its simplest, this means round-robin placement: container 1 goes to node A, container 2 goes to node B, and so on. More advanced scheduling considers CPU usage, memory availability, and affinity rules.

For a 2-10 node cluster, you rarely need sophisticated scheduling. You need containers to spread across nodes so a single failure doesn't take everything down.

Health Monitoring

Health monitoring answers one question: is this container still working? That means:

  • Liveness checks -- is the process running?
  • Readiness checks -- can it accept traffic?
  • Restart policies -- what happens when it crashes?

Without health monitoring, a crashed container on a remote node stays crashed until someone notices. That "someone" is usually your users, filing support tickets at 2 AM.
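As a sketch of the restart-policy half of this, here's the kind of tiny decision helper a DIY watchdog might use. `check_container` is a name invented for this example; in practice its second argument would come from `docker inspect --format '{{.State.Health.Status}}' <name>` on each node:

```shell
#!/bin/bash
# Watchdog helper (sketch): map a container's reported health state to an action.
check_container() {
  local name=$1 state=$2
  case "$state" in
    healthy)   echo "ok: $name" ;;                       # liveness + readiness passing
    unhealthy) echo "restart: $name" ;;                  # e.g. docker restart "$name"
    starting)  echo "wait: $name" ;;                     # still in its grace period
    *)         echo "alert: $name in state '$state'" ;;  # exited, missing, unknown
  esac
}

check_container web healthy    # -> ok: web
check_container web unhealthy  # -> restart: web
```

Run it from cron every minute against `docker inspect` output and you have crude but real health monitoring.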

Inter-Node Networking

Containers on different physical servers need to communicate. This is the hardest problem in multi-node orchestration. Kubernetes uses an overlay network (flannel, calico, cilium) that creates a virtual network spanning all nodes.

Simpler approaches exist. If your nodes share a private network (same VPC, same datacenter), containers can communicate via the host's private IP and exposed ports. No overlay required.
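Concretely, with two nodes in the same VPC it looks like this (the addresses, image name, and port are illustrative):

```shell
# On worker A (private IP 10.0.0.2): publish the container port
# on the private interface only, not on the public internet.
docker run -d --name api -p 10.0.0.2:3000:3000 my-api:latest

# On worker B: reach it directly over the shared private network.
curl http://10.0.0.2:3000/
```

No overlay, no VXLAN, nothing to debug beyond an ordinary TCP connection.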

Load Balancing

Traffic from users needs to reach the right container. A reverse proxy on your control plane node receives all incoming requests and forwards them to the correct container on the correct node. Nginx, Caddy, Traefik, and Pingora all do this job well.
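With Nginx, for example, the routing boils down to an upstream block listing where the containers live. This is a minimal sketch; the worker IPs, port, and upstream name are illustrative:

```nginx
# nginx.conf fragment -- fan traffic out to containers on two workers
upstream my_app {
    server 10.0.0.2:3000;   # container on worker 1
    server 10.0.0.3:3000;   # container on worker 2
}

server {
    listen 80;
    location / {
        proxy_pass http://my_app;
        proxy_set_header Host $host;
    }
}
```

When a container moves to another node, this file is the one thing that has to change -- which is exactly the bookkeeping an orchestrator automates.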

Rolling Updates

Deploying a new version shouldn't kill the old one until the new one is ready. The orchestrator starts the new container, waits for it to pass health checks, shifts traffic over, and then stops the old container. Zero dropped requests.
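Done by hand, that sequence looks something like this (container names, the host port, and the /health endpoint are illustrative, and the proxy-reload step depends on which reverse proxy you run):

```shell
# 1. Start the new version alongside the old one, on a different host port.
docker run -d --name my-app-v2 -p 3001:3000 my-app:v2

# 2. Wait until the new container answers its health check.
until curl -fsS http://localhost:3001/health >/dev/null; do
  sleep 1
done

# 3. Repoint the reverse proxy upstream at port 3001 and reload it.

# 4. Only now retire the old version.
docker stop my-app-v1 && docker rm my-app-v1
```

The ordering is the whole trick: traffic never points at a container that hasn't proven it's healthy.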

[INTERNAL-LINK: zero-downtime deployment guide -> /blog/zero-downtime-deployments-temps]


How Does the Control Plane and Worker Architecture Work?

Every multi-node orchestrator follows the same fundamental pattern. Even Kubernetes, with all its complexity, boils down to this: one node makes decisions, the rest execute them. Google's Borg paper, the predecessor to Kubernetes, established this architecture over 15 years ago (Google Research, 2015).

Citation capsule: The control-plane-plus-worker architecture originated in Google's Borg system, which managed billions of containers across its global fleet (Google Research, 2015). Modern orchestrators simplify this pattern: one node holds state and routes traffic, while worker nodes run containers and report health.

Here's the universal architecture:

                    +-----------------------+
                    |    CONTROL PLANE      |
                    |                       |
                    |  - Scheduler          |
                    |  - State database     |
                    |  - Reverse proxy      |
                    |  - Health aggregator  |
                    |  - API server         |
                    +-----------+-----------+
                                |
                +---------------+---------------+
                |               |               |
         +------+------+ +------+------+ +------+------+
         |  WORKER 01  | |  WORKER 02  | |  WORKER 03  |
         |             | |             | |             |
         | - Docker    | | - Docker    | | - Docker    |
         | - Agent     | | - Agent     | | - Agent     |
         | - Containers| | - Containers| | - Containers|
         +-------------+ +-------------+ +-------------+

What the Control Plane Does

The control plane is the brain of your cluster. It handles four responsibilities:

  • Scheduling -- decides which worker runs each container based on available resources
  • State management -- stores the desired state (what should be running) and actual state (what is running)
  • Traffic routing -- receives incoming HTTP/HTTPS requests and proxies them to the correct worker and container
  • Health aggregation -- collects health reports from all workers and triggers rescheduling when something fails

In Kubernetes, the control plane consists of the API server, etcd, the scheduler, and the controller manager -- four separate components. In simpler orchestrators, a single binary handles all of this.

What Workers Do

Workers are simpler. Each worker runs an agent that:

  1. Connects to the control plane on startup
  2. Receives instructions: "run container X with these environment variables"
  3. Pulls the image and starts the container using the local Docker daemon
  4. Reports health status back to the control plane at regular intervals
  5. Stops containers when told to

The worker doesn't make decisions. It executes. If the control plane goes down, workers keep running their existing containers -- they just don't receive new instructions until the control plane recovers.
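Those five steps can be sketched in a few dozen lines of shell. In this toy version, `to_docker_cmd` translates a control-plane instruction into a Docker invocation; the instruction format and the `/instructions` and `/health` endpoints are inventions of this example, not a real protocol:

```shell
#!/bin/bash
# toy-agent.sh -- a worker agent reduced to its essentials (sketch)

# Translate one control-plane instruction into a docker command.
# Assumed instruction format: "run <image> <name>" or "stop <name>".
to_docker_cmd() {
  case "$1" in
    run)  echo "docker run -d --name $3 --restart unless-stopped $2" ;;
    stop) echo "docker stop $2" ;;
    *)    echo "" ;;
  esac
}

# Main loop (illustrative): poll, execute, report health, repeat.
# while true; do
#   instr=$(curl -fsS "http://control-plane:8080/instructions?node=$HOSTNAME")
#   [ -n "$instr" ] && eval "$(to_docker_cmd $instr)"
#   curl -fsS -X POST "http://control-plane:8080/health?node=$HOSTNAME" \
#     --data "$(docker ps --format '{{.Names}} {{.Status}}')"
#   sleep 5
# done
```

Everything stateful -- what should run where, what to do on failure -- stays on the control plane; the agent only translates and executes.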

Communication Between Control and Workers

The communication channel between control plane and workers is where orchestrators diverge. There are three common approaches:

  • HTTP API -- workers poll the control plane, or the control plane pushes instructions. Simple, works through firewalls, but adds latency.
  • Persistent connection (WebSocket, gRPC stream) -- real-time bidirectional communication. Lower latency, but connection management gets complex.
  • Message queue (NATS, RabbitMQ) -- decouples control plane and workers. Adds another component to manage, but scales well.

For a 2-10 node cluster, an HTTP API or persistent connection works perfectly. You don't need the scalability of a message queue until you're well past 50 nodes.


Can You Build a Basic Orchestrator with SSH and Docker?

Yes, and it's a great way to understand what orchestrators actually do. The average Docker container starts in under 2 seconds (Docker benchmarks), making SSH-based management surprisingly responsive for small clusters. But this approach hits a wall fast.

Citation capsule: SSH-based Docker orchestration works for 2-3 nodes but breaks down at scale. Without automated health monitoring, a crashed container on a remote node stays down until manual intervention. Industry data shows the mean time to detect incidents without monitoring is 197 minutes (PagerDuty State of Digital Operations, 2023).

The SSH Approach: Step by Step

Here's how you'd build a bare-minimum orchestrator with shell scripts:

1. Deploy a container to a remote node:

#!/bin/bash
# deploy.sh — push a container to a specific node
set -u
NODE_IP=$1
IMAGE=$2
CONTAINER_NAME=$3

# Pull first; only replace the running container once the pull has succeeded.
ssh root@"$NODE_IP" "docker pull $IMAGE && \
  { docker stop $CONTAINER_NAME 2>/dev/null || true; } && \
  { docker rm $CONTAINER_NAME 2>/dev/null || true; } && \
  docker run -d --name $CONTAINER_NAME \
    --restart unless-stopped \
    -p 3000:3000 \
    $IMAGE"

2. Simple round-robin scheduling:

#!/bin/bash
# schedule.sh — distribute containers across nodes
NODES=("10.0.0.2" "10.0.0.3" "10.0.0.4")
NODE_INDEX=0

deploy_container() {
  local image=$1
  local name=$2
  local node=${NODES[$NODE_INDEX]}

  ./deploy.sh "$node" "$image" "$name"

  NODE_INDEX=$(( (NODE_INDEX + 1) % ${#NODES[@]} ))
}

3. Basic health checking:

#!/bin/bash
# healthcheck.sh — check containers across all nodes
NODES=("10.0.0.2" "10.0.0.3" "10.0.0.4")

for node in "${NODES[@]}"; do
  echo "--- $node ---"
  ssh root@$node "docker ps --format 'table {{.Names}}\t{{.Status}}'"
done

Where the SSH Approach Breaks Down

This works. For a weekend project with two servers, it's fine. But here's where it falls apart:

  • No automatic failover. If a node crashes, containers stay down until you manually redeploy them to another node.
  • No real-time health monitoring. Your health check runs when you run it. Between checks, you're blind.
  • SSH is slow at scale. Each command opens a new connection. Deploying to 10 nodes sequentially takes minutes.
  • No traffic routing. You still need to set up Nginx or Caddy on a front-facing node and manually update upstream configs every time a container moves.
  • No state management. There's no record of what's supposed to be running where. If you lose your terminal history, you've lost your cluster state.

So what's the middle ground between shell scripts and Kubernetes?


Is Docker Swarm Still a Viable Option?

Docker Swarm ships with every Docker installation and takes about 30 seconds to initialize. Gartner reported that Docker Swarm's market share among container orchestrators dropped below 5% by 2024 (Gartner Container Orchestration Market Guide, 2024), but that doesn't mean it's useless -- it means the industry overcorrected toward Kubernetes.

Citation capsule: Docker Swarm's market share fell below 5% by 2024 (Gartner, 2024), yet it remains the simplest path to multi-node Docker orchestration. Two commands (docker swarm init and docker swarm join) create a functional cluster with built-in service discovery and load balancing.

Setting Up a Swarm Cluster

It really is this easy:

# On the control plane node:
docker swarm init --advertise-addr 10.0.0.1

# Output includes a join token. On each worker:
docker swarm join --token SWMTKN-1-xxxx 10.0.0.1:2377

Now deploy a service:

docker service create \
  --name my-app \
  --replicas 3 \
  --publish 80:3000 \
  my-app:latest

Swarm distributes the three replicas across your nodes, sets up an internal load balancer, and handles rolling updates when you push a new image. It's genuinely elegant.
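Rolling updates are equally terse. To ship a new image, one `docker service update` does the whole dance, replacing replicas in batches (these flags are standard Swarm options):

```shell
# Replace one replica at a time, waiting 10 seconds between batches.
docker service update \
  --image my-app:v2 \
  --update-parallelism 1 \
  --update-delay 10s \
  my-app
```

Swarm starts each new replica, waits for it to become healthy, then moves on to the next.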

Why Teams Move Away from Swarm

Despite its simplicity, Docker Swarm has real limitations:

  • Development has slowed. Docker Inc. shifted focus to Docker Desktop and Docker Hub. Swarm receives maintenance patches, not new features.
  • Overlay networking performance. Swarm's VXLAN-based overlay network adds measurable latency. For most web apps, it's imperceptible. For latency-sensitive workloads, it's a problem.
  • Limited scheduling control. You can set placement constraints and preferences, but nothing approaching Kubernetes's affinity/anti-affinity rules.
  • Ecosystem shrinkage. Monitoring tools, CI/CD pipelines, and cloud providers increasingly assume Kubernetes. Swarm-specific tooling is rare.
  • No built-in observability. Swarm tells you if a container is running. It doesn't tell you why it's slow, what errors it's throwing, or how users are experiencing your app.

Swarm solves the orchestration problem. It doesn't solve the observability problem. And in 2026, you need both.

[INTERNAL-LINK: built-in analytics and monitoring -> /blog/ai-gateway-self-hosted-paas]


How Does Temps Handle Multi-Node Container Orchestration?

Temps takes the control-plane-plus-worker pattern and reduces it to two commands. According to Datadog's Container Report, 54% of organizations run containers across multiple hosts (Datadog Container Report, 2024) -- yet most still struggle with the operational complexity of doing so.

Citation capsule: 54% of organizations run containers on multiple hosts (Datadog, 2024). Temps reduces multi-node clustering to two commands: temps serve on the control plane and temps join on each worker. No overlay network, no YAML manifests, no etcd cluster to maintain.

Control Plane: One Binary, One Command

The Temps control plane starts with a single command:

temps serve

This single Rust binary runs the scheduler, the state database (backed by PostgreSQL), the reverse proxy (built on Cloudflare's Pingora), and the health aggregation system. There's no collection of services to configure, no Helm charts to deploy.

The control plane handles:

  • HTTPS termination with automatic Let's Encrypt certificates
  • Container scheduling across all connected workers
  • Health check aggregation with configurable alarm thresholds
  • Git-push deployments -- push to a branch, Temps builds and deploys the container
  • Built-in analytics, error tracking, and session replay -- no Sentry, no Plausible, no FullStory needed

Adding Workers: Direct Mode

If your nodes share a private network (same VPC, same datacenter, same Tailscale mesh), adding a worker takes one command:

temps join --direct --private-address 10.0.0.2

That's it. The worker connects to the control plane, registers itself, and starts accepting container assignments. No tokens to generate manually, no certificates to configure, no overlay network to debug.

Adding Workers: Relay Mode (Behind NAT)

What if your worker is behind a NAT or on a different network entirely? Temps uses WireGuard to create a secure tunnel:

temps join --relay

Relay mode establishes a WireGuard tunnel through api.temps.sh, creating a secure point-to-point connection between the worker and control plane. The worker gets a private IP on the WireGuard interface, and all orchestration traffic flows through the encrypted tunnel.

This means you can add a worker running on your home server, a Raspberry Pi, or a VPS in a completely different datacenter. No VPN setup, no firewall rules, no port forwarding.

No Overlay Network Required

Unlike Docker Swarm or Kubernetes, Temps doesn't create a virtual overlay network. Containers communicate through the host's network using exposed ports. The Pingora-based reverse proxy on the control plane handles routing.

Why does this matter? Overlay networks add latency, complicate debugging, and create failure modes that are hard to diagnose. When a request fails in an overlay network, the problem could be in the VXLAN encapsulation, the routing table, the virtual switch, or the application. With direct networking, you eliminate three of those four possibilities.

[INTERNAL-LINK: Temps architecture deep dive -> /blog/introducing-temps-vercel-alternative]


How Do You Manage Nodes in a Multi-Node Cluster?

Node management is the operational reality of running a cluster. The 2024 State of DevOps report found that teams spending less than 15% of their time on infrastructure maintenance ship features 2.5x faster (Puppet State of DevOps, 2024). Good node management tools minimize that maintenance time.

Citation capsule: Teams spending under 15% of their time on infrastructure maintenance ship features 2.5x faster (Puppet State of DevOps, 2024). Effective node management reduces that overhead through one-command operations: adding, draining, and monitoring nodes without manual container redistribution.

Adding a New Node

Scaling your cluster should be boring. Here's the complete process:

  1. Provision a new server (any cloud provider, bare metal, whatever)
  2. Install Docker and the Temps binary
  3. Run one command:
# Direct mode (private network):
temps join --direct --private-address 10.0.0.4

# Relay mode (different network):
temps join --relay

The control plane discovers the new node's available resources and begins scheduling containers to it. Existing containers don't move unless you explicitly rebalance -- new deployments just have more capacity.

Draining a Node for Maintenance

When you need to patch a server, upgrade hardware, or decommission a node, you drain it first. Draining stops new containers from being scheduled to the node and migrates existing containers to other healthy nodes.

The process is graceful: containers are started on a new node before being stopped on the draining node. Traffic shifts seamlessly. No user impact.

What Happens When a Node Goes Down?

This is the critical question. In the SSH approach, nothing happens -- your containers stay dead until you notice. In a proper orchestrator:

  1. The control plane detects missed health reports from the worker
  2. After a configurable timeout, the node is marked as unhealthy
  3. The scheduler identifies all containers that were running on the failed node
  4. Those containers are rescheduled onto healthy workers with available capacity
  5. The reverse proxy updates its routing table to point to the new container locations

The whole process takes seconds, not minutes. Your users might experience a brief interruption for the specific containers that were on the failed node, but everything recovers automatically.

Monitoring Cluster Health

Visibility into your cluster matters. For each node, you want to know:

  • CPU and memory utilization -- is this node overloaded or underutilized?
  • Container count and status -- how many containers are running, pending, or failed?
  • Network connectivity -- can the worker still reach the control plane?
  • Disk usage -- Docker images and logs eat disk space fast

Temps surfaces all of this through its built-in dashboard. No Grafana stack to deploy, no Prometheus to configure. But if you want to plug in external monitoring, the data is available through the API.
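For comparison, a crude SSH-based equivalent of that dashboard might look like this (node IPs are illustrative):

```shell
#!/bin/bash
# clusterstat.sh -- point-in-time health snapshot for a small DIY cluster
NODES=("10.0.0.2" "10.0.0.3" "10.0.0.4")

for node in "${NODES[@]}"; do
  echo "=== $node ==="
  # load average, memory, container status, and Docker disk usage per node
  ssh root@"$node" 'uptime; free -h | head -n 2; \
    docker ps --format "{{.Names}}: {{.Status}}"; docker system df'
done
```

It answers the four questions above, but only when you remember to run it -- which is the gap a built-in dashboard closes.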


Frequently Asked Questions

How Many Nodes Can You Run Without Kubernetes?

There's no hard technical limit. The constraint is operational, not architectural. For most lightweight orchestrators, 2-10 nodes is the sweet spot where you get meaningful redundancy without needing a dedicated platform team. Some teams run 20-30 nodes successfully on tools like Nomad or Temps. Beyond 50 nodes, the scheduling and networking complexity starts approaching what Kubernetes was designed to handle. The CNCF recommends Kubernetes for organizations running 100+ nodes in production (CNCF, 2024).

What Happens When a Worker Node Goes Offline?

The control plane detects missed health check reports and marks the node as unhealthy after a configurable timeout, typically 30-60 seconds. All containers from the failed node are then rescheduled onto remaining healthy workers. The reverse proxy updates its routing automatically. In Temps, this process completes in under a minute -- far faster than manual SSH-based recovery, which averages 197 minutes to detect without monitoring (PagerDuty, 2023).

Can You Mix Different Server Sizes in a Cluster?

Absolutely. This is actually one of the advantages of simpler orchestrators over Kubernetes. A 2-vCPU worker can run lightweight services while an 8-vCPU worker handles compute-heavy containers. The scheduler accounts for available resources when placing containers. In Temps, each worker reports its capacity to the control plane, and the scheduler distributes workloads based on actual available CPU and memory.

How Does Networking Work Between Nodes in a Cluster?

There are three approaches. Overlay networks (used by Kubernetes and Docker Swarm) create a virtual network layer. Direct networking uses the host's private IP and exposed ports. WireGuard tunnels create encrypted point-to-point connections for nodes on different networks. Temps uses direct networking for nodes in the same datacenter and WireGuard relay for remote nodes -- avoiding the latency and complexity of overlay networks entirely.

[INTERNAL-LINK: VPS networking and security -> /blog/secure-vps-with-tailscale]


Build Your Cluster in 15 Minutes

Running containers across multiple servers doesn't require a PhD in distributed systems. The core pattern is straightforward: one control plane makes scheduling decisions, workers execute them, and a reverse proxy routes traffic. Kubernetes handles this at massive scale with massive complexity. For 2-10 nodes, you need something simpler.

We've covered four approaches, from DIY shell scripts to purpose-built tools. The right choice depends on your scale and operational tolerance. If you want the fastest path from "I have servers" to "my app runs across all of them," try Temps:

# Install on your control plane:
curl -fsSL temps.sh/install.sh | bash

# Set up the control plane:
temps setup
temps serve

# Add workers (one command each):
temps join --direct --private-address <WORKER_IP>

Your multi-node Docker cluster is running. No Kubernetes, no YAML manifests, no etcd cluster. Just containers, scheduled across your servers, with health monitoring and automatic failover built in.

[INTERNAL-LINK: getting started with Temps -> /blog/deploy-nextjs-with-temps]

#kubernetes #docker #multi-node #orchestration #cluster #docker-swarm #devops #multi-node-docker-cluster-without-kubernetes