March 12, 2026
Written by Temps Team
Last updated March 12, 2026
To add zero-downtime deployments to Docker in 2026, you need three things: a health check endpoint that gates traffic, connection draining that lets in-flight requests finish, and an atomic route switch from old container to new. Docker itself provides none of these — you have to layer them on top, either manually or with a platform that handles them for you.
This guide covers the manual approach with Docker Compose and Nginx, then explains how Temps handles all three automatically on every git push.
TL;DR: Docker's default stop-start cycle creates a 5–30 second window where requests fail. Eliminate it with health check gating, connection draining, and an atomic route switch. Kubernetes gives you this out of the box but costs $70–150/month in cluster overhead. Kamal gives you blue-green on bare metal. Temps gives you health-check-gated deployments with one-command rollback on a single Rust binary — self-hostable free under Apache 2.0, or ~$6/month on Temps Cloud.
Temps uses a health-check-gated deployment pipeline backed by a Pingora reverse proxy (Cloudflare's open-source proxy, written in Rust). On every git push, the new container must pass health checks before it receives traffic.
Temps sends an HTTP health check request (GET /) every 5 seconds. It requires 2 consecutive successful responses (HTTP 2xx, 3xx, 404, or 405 all count as healthy). If the container crashes or returns 5xx for 60 seconds straight, the deployment fails.
If health checks fail, Temps calls rollback_to_deployment() automatically: it creates a new deployment record pointing to the previous verified image and runs it through the same health-check pipeline before switching traffic. The broken version never reaches production.
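The gating rule above can be sketched in a few lines of JavaScript. This illustrates the described behavior only and is not Temps's actual implementation (Temps is written in Rust). The function names, the use of null for a failed connection, and the treatment of other 4xx codes as unhealthy are all assumptions.

```javascript
// Status codes counted as healthy per the description: any 2xx or 3xx,
// plus 404 and 405. Everything else (including null, used here for
// "connection failed") is treated as unhealthy. That last part is an assumption.
function isHealthyStatus(status) {
  return (status >= 200 && status < 400) || status === 404 || status === 405;
}

// Walks the poll results in order and returns:
//   'promote' once `required` consecutive healthy responses are seen,
//   'fail'    once unhealthy responses span `failAfterSeconds`,
//   'pending' if neither has happened yet.
function evaluateGate(statuses, { required = 2, intervalSeconds = 5, failAfterSeconds = 60 } = {}) {
  let healthyStreak = 0;
  let unhealthyStreak = 0;
  for (const status of statuses) {
    if (isHealthyStatus(status)) {
      healthyStreak += 1;
      unhealthyStreak = 0;
      if (healthyStreak >= required) return 'promote';
    } else {
      healthyStreak = 0;
      unhealthyStreak += 1;
      if (unhealthyStreak * intervalSeconds >= failAfterSeconds) return 'fail';
    }
  }
  return 'pending';
}
```

For example, evaluateGate([503, 200, 200]) promotes, while a single healthy response followed by a failure resets the streak and stays pending.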
You configure the health check path per-project in .temps.yaml:
# .temps.yaml
healthcheck:
  path: /api/health
  timeout: 300 # seconds to wait for checks to pass
When path is omitted, Temps skips HTTP health checks entirely and promotes as soon as the container is running — useful for workers or services without an HTTP endpoint.
| Capability | Temps | Kubernetes | Kamal |
|---|---|---|---|
| Zero-downtime strategy | Health-check-gated stop-replace | Rolling update (readiness probes) | Blue-green container swap |
| Health checks | HTTP polling, configurable path, 2-success gate | Readiness + liveness probes, fully configurable | Docker HEALTHCHECK via deploy config |
| Auto-rollback | Yes: one-command or automatic on failure | Not automatic: a failed rollout stalls while the old Pods keep serving; kubectl rollout undo reverts to the previous ReplicaSet | Manual (kamal rollback) |
| Self-hostable | Yes — free | Yes — significant ops overhead | Yes — runs on any server with Docker |
| Managed option | Temps Cloud (~$6/mo, Hetzner + 30%) | EKS/GKE/AKS ($70–150/mo control plane) | None (self-managed only) |
| Ease of setup | Single binary, git push | Cluster + manifests + ingress + cert-manager | Kamal config file + SSH access to server |
| Included observability | Analytics, session replay, error tracking, uptime | None built-in (add Prometheus, Grafana, etc.) | None built-in |
| License | Apache 2.0 | Apache 2.0 | MIT |
Kubernetes is the right choice when you're orchestrating dozens of services across multiple nodes. For one to five Docker apps on a single server, the control plane overhead ($70–150/month for managed Kubernetes) plus the operational complexity (manifests, ingress, cert-manager, RBAC) is rarely justified.
Kamal is purpose-built for bare-metal Docker deployments and does blue-green well. Its limitation is that it's deploy-focused only — no built-in observability, no analytics, no error tracking.
Temps ships one binary that replaces: Vercel (deployments), PostHog or Plausible (analytics), FullStory (session replay), Sentry (error tracking), Pingdom (uptime monitoring), managed databases, and transactional email. Self-hosted free, or ~$6/month on Temps Cloud (Hetzner VPS cost plus 30% margin, no per-seat fees, no bandwidth bills).
Docker's default lifecycle creates an unavoidable gap between stopping the old container and starting the new one.
When you run docker compose up -d --build, Compose builds the new image, then stops the old container, removes it, and starts a fresh one. Requests fail from the moment the old container stops until the new application has booted and is accepting connections.
During this gap, any request hitting your server gets a 502 Bad Gateway or connection refused.
Setting restart: always in your docker-compose.yml doesn't help. It tells Docker to restart the same container when it crashes. It doesn't spin up a new version alongside the old one.
# This does NOT give you zero-downtime deployment
services:
  web:
    image: myapp:latest
    restart: always # Only restarts the SAME container on crash
What you need is two containers running simultaneously — the old version serving traffic while the new version boots and passes health checks.
Every zero-downtime strategy relies on three mechanisms: health check gating, connection draining, and atomic routing.
The new container should never receive traffic until it's genuinely ready. A health check endpoint verifies that your application booted, connected to its database, and can serve requests.
// Express.js health check that verifies real dependencies
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1');
    await redis.ping();
    res.status(200).json({ status: 'healthy' });
  } catch (err) {
    res.status(503).json({ status: 'unhealthy', error: err.message });
  }
});
A health endpoint that blindly returns 200 defeats the purpose. If your app returns "healthy" before the database connection pool is established, the load balancer routes traffic to a container that immediately throws 500 errors.
Docker's built-in HEALTHCHECK instruction helps with container status, but it doesn't control your load balancer. You need your deployment tooling to check health before switching traffic.
# Dockerfile: HEALTHCHECK sets container status, not traffic routing
HEALTHCHECK --interval=5s --timeout=3s --start-period=10s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1
When you remove the old container from the load balancer, don't kill it immediately. In-flight requests — a user mid-checkout, a file upload at 90%, a long-polling connection — need time to complete.
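Connection draining means the old container stops receiving new requests, finishes the requests already in flight, and only then shuts down.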
Without draining, you get sporadic 502 errors during every deploy — hard to reproduce and harder to diagnose.
The load balancer needs to flip from old to new in one step. For Nginx, this is a config reload:
nginx -s reload
Nginx's reload starts new worker processes with the updated config. Old worker processes finish their current requests before exiting. That's atomic routing and connection draining in one operation — but only for the Nginx layer. Your application containers still need their own SIGTERM handling.
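On the application side, SIGTERM handling for a Node.js HTTP server might look like the sketch below. installGracefulShutdown and its options are hypothetical names, and the 10-second default timeout is an assumption you would tune to your own drain window.

```javascript
// Sketch: stop accepting new connections on SIGTERM, let in-flight
// requests finish, and force-exit if draining takes too long.
// `installGracefulShutdown`, `timeoutMs`, and `exit` are illustrative names.
function installGracefulShutdown(
  server,
  { timeoutMs = 10_000, exit = (code) => process.exit(code) } = {}
) {
  let shuttingDown = false;

  function shutdown() {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;

    // Fallback: exit with an error code if connections never drain.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref(); // this timer alone should not keep the process alive

    // close() stops accepting new connections; the callback runs once
    // the existing connections have ended.
    server.close(() => {
      clearTimeout(timer);
      exit(0);
    });
  }

  process.on('SIGTERM', shutdown);
  return shutdown;
}
```

With Express, app.listen(3000) returns the underlying HTTP server, so passing that return value to installGracefulShutdown would wire it up. Idle keep-alive connections can delay the close callback, which is why the timeout fallback exists.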
Here's a working blue-green deployment pipeline for a single server using Docker Compose, Nginx, and a bash script.
Define two services — web-blue and web-green — so both can run simultaneously during the transition.
# docker-compose.yml
services:
  web-blue:
    build: .
    container_name: app-blue
    ports:
      - "8001:3000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 10s
    restart: unless-stopped
  web-green:
    build: .
    container_name: app-green
    ports:
      - "8002:3000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 10s
    restart: unless-stopped
Port 8001 maps to the blue container. Port 8002 maps to green. Nginx sits in front and routes to whichever is currently active.
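# /etc/nginx/conf.d/app.conf
upstream app_backend {
    # Caution: if nginx.conf already includes conf.d/*.conf at the http level,
    # move this file outside conf.d (and update the paths), because a bare
    # "server host:port;" line is only valid inside an upstream block.
    include /etc/nginx/conf.d/active-upstream.conf;
}
server {
    listen 80;
    server_name myapp.com;
    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }
}
# /etc/nginx/conf.d/active-upstream.conf
server 127.0.0.1:8001; # Points to blue by default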
#!/bin/bash
set -euo pipefail

HEALTH_ENDPOINT="http://localhost:PORT/health"
MAX_RETRIES=30
RETRY_INTERVAL=2
DRAIN_WAIT=10
UPSTREAM_CONF="/etc/nginx/conf.d/active-upstream.conf"

# Determine active slot from the port in the upstream file
CURRENT=$(grep -oP ':\K[0-9]+' "$UPSTREAM_CONF")
if [ "$CURRENT" = "8001" ]; then
  ACTIVE="blue"; TARGET="green"; TARGET_PORT="8002"
else
  ACTIVE="green"; TARGET="blue"; TARGET_PORT="8001"
fi
echo "Active: $ACTIVE | Deploying to: $TARGET (port $TARGET_PORT)"

# Build and start the target container
docker compose up -d --build "web-$TARGET"

# Wait for health check
HEALTH_URL="${HEALTH_ENDPOINT/PORT/$TARGET_PORT}"
RETRIES=0
until curl -sf "$HEALTH_URL" > /dev/null 2>&1; do
  RETRIES=$((RETRIES + 1))
  if [ "$RETRIES" -ge "$MAX_RETRIES" ]; then
    echo "ERROR: Health check failed after $MAX_RETRIES attempts"
    docker compose stop "web-$TARGET"
    exit 1
  fi
  echo " Attempt $RETRIES/$MAX_RETRIES..."
  sleep "$RETRY_INTERVAL"
done

# Switch Nginx to the new container
echo "server 127.0.0.1:$TARGET_PORT;" > "$UPSTREAM_CONF"
nginx -s reload

# Drain old container
sleep "$DRAIN_WAIT"

# Stop the old container
docker compose stop "web-$ACTIVE"
echo "Deploy complete! Active slot: $TARGET"
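One detail worth tightening in a script like this is the upstream file write itself. Redirecting echo into the file truncates it first, so an interrupted deploy can leave a partial file behind. A common remedy is to write a temporary file and rename it over the target. The sketch below shows that pattern in Node.js. switchUpstream and the temporary file name are hypothetical, and the same idea works in bash with mv.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Sketch: replace the upstream include file in one step.
// The temp file lives in the same directory so the rename stays on one
// filesystem, where rename replaces the target atomically on POSIX systems.
function switchUpstream(confPath, port) {
  const tmpPath = path.join(
    path.dirname(confPath),
    `.${path.basename(confPath)}.tmp` // hypothetical temp name; not matched by *.conf globs
  );
  fs.writeFileSync(tmpPath, `server 127.0.0.1:${port};\n`);
  fs.renameSync(tmpPath, confPath);
}
```

Nginx only reads the file on reload, so this does not change when traffic moves. It only guarantees that the reload never sees a half-written file.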
A few sharp edges remain:
- Nginx DNS caching: use the resolver directive and variables in proxy_pass to force re-resolution.
- Graceful shutdown: Node.js needs process.on('SIGTERM'). Python needs signal trapping.
- Image cleanup: add docker image prune -f to your deploy script.
Each of these burns you exactly once in production. Then the script grows to 150 lines and you have a deployment system to maintain indefinitely.
Database migrations are the trickiest part. If your new code expects a column that doesn't exist yet — or your old code breaks when a column disappears — your deployment fails mid-rollout.
Split every breaking schema change into three deploys:
Phase 1: Expand (backward-compatible)
-- Add the new column without removing the old one
ALTER TABLE users ADD COLUMN full_name TEXT;
UPDATE users SET full_name = first_name || ' ' || last_name;
Deploy code that writes to both columns but reads from the new one.
Phase 2: Migrate
Deploy code that only uses the new column. Both old and new application versions coexist safely because the old column still exists.
Phase 3: Contract (cleanup)
-- Safe to remove after all instances run the new version
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;
Never drop a column in the same deploy that stops using it. During a blue-green or rolling deployment, both v1 and v2 run simultaneously. The expand-and-contract pattern ensures both versions work with the same schema at every step.
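The Phase 1 application code could look like the sketch below. The column names come from the SQL above. The function names and the fallback for rows that have not been backfilled yet are assumptions added for illustration.

```javascript
// Phase 1 (expand): write the old and the new columns together, so a
// version that still reads first_name/last_name and a version that
// reads full_name can run side by side.
function toUserRow({ firstName, lastName }) {
  return {
    first_name: firstName,
    last_name: lastName,
    full_name: `${firstName} ${lastName}`,
  };
}

// Read from the new column. The fallback covers rows written before the
// backfill ran (an assumption; drop it once every row has full_name).
function displayName(row) {
  return row.full_name ?? `${row.first_name} ${row.last_name}`;
}
```

Once Phase 2 ships, toUserRow would stop writing first_name and last_name, and only then is the Phase 3 DROP COLUMN safe.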
Kubernetes handles zero-downtime deployment well — rolling updates, readiness probes, and preStop hooks are built in. But for a single Docker app on one server, the overhead is significant.
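What Kubernetes requires: a cluster with its control plane, deployment manifests, an ingress controller, cert-manager for TLS, and RBAC configuration.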
For one to five Docker apps on a single server, this is like hiring a crane to hang a picture frame. Kubernetes shines when you're running dozens of services across multiple nodes with dedicated infrastructure engineers.
Temps runs a Pingora-based reverse proxy (Cloudflare's open-source proxy) that handles health checks, connection draining, and atomic traffic switching out of the box.
Every git push triggers this pipeline automatically:
# That's the entire deploy workflow
git push temps main
If the new container fails health checks, Temps rolls back automatically: it deploys the previously verified image through the same health-check pipeline. The broken version never receives production traffic.
The sharp edges from the DIY section:
Temps is a single Rust binary. Self-host free under Apache 2.0, or run on Temps Cloud at ~$6/month (Hetzner VPS cost plus 30%, no per-seat fees, no bandwidth bills). That same binary also handles analytics, session replay, error tracking, uptime monitoring, managed databases, and transactional email — no SaaS subscriptions required.
Watch these for 15 minutes after each deployment:
A deploy that passes health checks can still have subtle issues: a slow query, a visual regression, an edge case in a new feature.
A health endpoint that always returns 200 defeats health check gating. Always verify real dependencies:
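// Bad: always returns healthy
app.get('/health', (req, res) => res.json({ status: 'ok' }));
// Good: verifies actual readiness and reports failure as 503
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1');
    await cache.ping();
    res.json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'unhealthy' });
  }
});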
Never drop a column in the same deploy that stops using it. Always use the expand-and-contract pattern across multiple deploys.
Small, frequent deploys are easier to roll back. The 2024 DORA report shows elite teams deploy multiple times per day with a 5% change failure rate, while low performers deploy monthly with a 64% failure rate.
A deploy that passes health checks can still have subtle issues: a visual regression, a slow database query, an edge case in a new feature.
If you can't roll back in seconds, you don't have zero-downtime deployment — you have zero-downtime deployment with a single point of failure. Always keep the previous image cached and test your rollback process regularly.
Don't assume your zero-downtime setup works — prove it with a load test.
hey is a lightweight HTTP load generator:
# Install hey
go install github.com/rakyll/hey@latest
# Terminal 1: start continuous load test
hey -z 120s -c 10 -q 50 http://myapp.com/health
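With -c 10 and -q 50, hey runs 10 concurrent workers, each rate-limited to 50 requests per second, for about 500 requests per second in total over 120 seconds (roughly 60,000 requests). In another terminal, trigger your deploy: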
# Terminal 2: deploy while load test runs
./deploy.sh # DIY approach
# or
git push temps main # Temps approach
Status code distribution:
[200] 59982 responses
# Zero-downtime: CONFIRMED
# Any non-200 response means requests were dropped.
If you see 502s at the switch point, increase DRAIN_WAIT. If you see 503s early in the deploy, your health endpoint passes too eagerly — check real dependencies. If you see timeout errors, the drain timeout is too short.
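To make that check repeatable, for example in CI, the status code distribution can be parsed and turned into a pass or fail result. The sketch below assumes the output lines look like the sample above ("[200] 59982 responses"). The function names are hypothetical.

```javascript
// Extracts { statusCode: count } from a status code distribution block
// whose lines look like "[200] 59982 responses".
function parseStatusDistribution(output) {
  const counts = {};
  for (const match of output.matchAll(/\[(\d{3})\]\s+(\d+)\s+responses/g)) {
    counts[match[1]] = Number(match[2]);
  }
  return counts;
}

// Treats the run as zero-downtime only if every counted response was a 200.
function summarize(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const ok = counts['200'] ?? 0;
  return { total, failed: total - ok, zeroDowntime: total > 0 && ok === total };
}
```

This only counts responses that carried a status code. Connection-level failures such as refused connections or timeouts appear elsewhere in the load tester's output, so they need a separate check.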
Deployment time depends primarily on Docker image build time and application boot time. A typical Node.js or Python app builds in 30–90 seconds with layer caching. Health check verification adds 10–20 seconds. The traffic switch itself is instantaneous. Total time from push to live: under two minutes for most Docker applications.
The old container keeps running and serving all traffic — users see nothing unusual. In the DIY approach, the deploy script stops the new container and exits with an error. With Temps, the platform rolls back automatically after health checks fail and sends a notification with container logs. Either way, the broken version never receives production traffic.
Yes. Blue-green deployment works on a single server by running two containers mapped to different ports, with Nginx routing to the active one. You need enough RAM and CPU for two instances during the brief overlap period — typically an extra 256–512 MB of RAM for 30–60 seconds. No cluster required.
Zero-downtime is the goal. Blue-green is one strategy for achieving it. Rolling deployment and canary deployment are alternatives. Blue-green maintains two full environments and switches traffic atomically. Rolling replaces instances one by one. Both achieve zero downtime with different resource and complexity trade-offs.
Yes, but you must use the expand-and-contract pattern. Never make breaking schema changes in a single deploy. Add new columns first, migrate code, then remove old columns in a separate deploy. This ensures both old and new application versions work with the same schema during the deployment window.
The only requirement is a health check endpoint — an HTTP route that returns 200 when your app is ready to serve traffic. Most web frameworks make this trivial. Your app should also handle SIGTERM for graceful shutdown. No other changes to your Dockerfile or application code are needed.
Zero-downtime deployment comes down to three principles: don't send traffic to unready containers, let in-flight requests finish, and switch routes atomically. You can implement these yourself with Docker Compose, Nginx, and a bash script. You'll spend a day building it and ongoing time maintaining it as edge cases surface.
Or skip the plumbing entirely. Temps wraps all three principles into a single binary — Pingora proxy, health check gating, connection draining, automatic rollback — so every git push produces a zero-downtime deploy without scripts to maintain.
The DIY approach teaches you exactly what's happening. The platform approach lets you stop thinking about it. The worst option is accepting 502 errors during deploys as normal.
For a comparison of Temps against Vercel, Fly.io, Render, Railway, Coolify, and Kamal on zero-downtime mechanism quality, see Best Platforms for Zero-Downtime Deployments in 2026.
# Install Temps and get zero-downtime deploys by default
curl -fsSL temps.sh/install.sh | bash