Scale Your Application
Temps runs your application in Docker containers on your own server. Scaling means giving those containers more resources or running more of them. This guide covers when and how to scale, and what to optimize before adding hardware.
Scaling options
Vertical scaling (bigger server)
- More CPU cores and RAM
- Simplest change — upgrade your VPS
- No application changes needed
- Best for database-heavy workloads
Horizontal scaling (more replicas)
- Multiple container instances per environment
- Built into Temps via replica configuration
- Requires stateless application design
- Best for CPU-bound web traffic
Before scaling, check whether optimization can solve the problem for free. A slow database query or missing index can make an application feel overloaded when the server has plenty of capacity.
When to scale
Signs your application needs more resources:
| Symptom | Likely bottleneck | First action |
|---|---|---|
| Response times increasing gradually | CPU or memory pressure | Check resource usage in dashboard |
| Occasional timeouts under load | Too few connections or CPU saturation | Add replicas or optimize queries |
| Out-of-memory errors | Container memory limit | Increase memory limit or optimize memory usage |
| Build times increasing | Build-time CPU/memory | Upgrade server for faster builds |
| Database queries slow | Database, not app server | Optimize queries, add indexes, connection pooling |
Replicas
Temps supports running multiple container instances (replicas) for each environment. Traffic is distributed across all healthy replicas.
Configuring replicas
Set the replica count at the project or environment level:
Project-level (applies to all environments by default):
- Go to your project > Settings > Deployment Config
- Set Replicas to the desired number
Environment-level (overrides project default):
- Go to your project > Environments > select environment > Settings
- Set Replicas to the desired number
The environment setting takes priority over the project setting.
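The precedence rule can be sketched as a tiny resolver (a hypothetical helper for illustration, not a Temps API):

```javascript
// Hypothetical helper illustrating the precedence rule; not a Temps API.
function effectiveReplicas(projectReplicas, envReplicas) {
  // The environment-level value wins whenever it is set;
  // ?? falls back only on null/undefined, so an explicit 0 is respected
  return envReplicas ?? projectReplicas;
}

console.log(effectiveReplicas(2, 5)); // environment override → 5
console.log(effectiveReplicas(2, undefined)); // falls back to project default → 2
```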
How replicas work
When you deploy with multiple replicas:
- Temps creates N containers (e.g. `myapp-1`, `myapp-2`, `myapp-3`)
- Each container gets its own host port
- All containers run the same image with the same environment variables
- The proxy distributes incoming requests across all healthy replicas
- Health checks run independently per replica
- If any replica fails during deployment, all replicas are cleaned up and the deployment fails
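The distribution step above can be sketched as round-robin selection over healthy replicas. Round-robin is an assumption for illustration; the source does not document the exact algorithm Temps uses:

```javascript
// Hedged sketch: round-robin selection over healthy replicas.
// The algorithm is an assumption for illustration, not documented Temps behavior.
function makeBalancer(replicas) {
  let next = 0;
  return function pick() {
    const healthy = replicas.filter((r) => r.healthy);
    if (healthy.length === 0) throw new Error('no healthy replicas');
    const target = healthy[next % healthy.length];
    next += 1; // advance so the following request goes to the next healthy replica
    return target.name;
  };
}

const pick = makeBalancer([
  { name: 'myapp-1', healthy: true },
  { name: 'myapp-2', healthy: false }, // failing its health check, so it is skipped
  { name: 'myapp-3', healthy: true },
]);
console.log(pick(), pick(), pick()); // → myapp-1 myapp-3 myapp-1
```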
Requirements for replicas
Your application must be stateless — it cannot rely on local files, in-memory sessions, or container-local state that is not shared:
- Sessions: Store in Redis or a database, not in memory
- File uploads: Store in S3 or blob storage, not the local filesystem
- Cache: Use Redis or an external cache, not in-process memory
- WebSocket connections: Each client connects to one replica — use a pub/sub system (Redis) if clients need to communicate across replicas
Stateless session example
```javascript
import express from 'express';
import session from 'express-session';
import RedisStore from 'connect-redis';
import { createClient } from 'redis';

const app = express();

// Shared session store: any replica can read a session created by another
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false,
}));
```
Resource limits
Each container runs with configurable CPU and memory limits:
| Setting | Default | Description |
|---|---|---|
| CPU request | 100m | Minimum guaranteed CPU (0.1 cores) |
| CPU limit | 1000m | Maximum CPU (1 core) |
| Memory request | 128Mi | Minimum guaranteed memory |
| Memory limit | 512Mi | Maximum memory |
Configure these per environment in Environment Settings:
| Name | Type | Description |
|---|---|---|
| `cpu_limit` | string | Maximum CPU allocation. `1000m` = 1 full CPU core; `500m` = half a core. |
| `memory_limit` | string | Maximum memory allocation. `512Mi` = 512 MB; `1024Mi` = 1 GB. |
| `cpu_request` | string | Minimum guaranteed CPU. Used for scheduling decisions. |
| `memory_request` | string | Minimum guaranteed memory. |
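The suffix notation can be converted into plain numbers like this. The `m` (millicores) and `Mi` (mebibytes) interpretations follow the descriptions above; treat the helpers as illustrative, not a Temps API:

```javascript
// Sketch: converting the limit notation into plain numbers.
// "m" = millicores, "Mi" = mebibytes (assumptions based on the settings above).
function parseCpuCores(value) {
  // "1000m" -> 1, "500m" -> 0.5, "2" -> 2
  return value.endsWith('m') ? parseInt(value, 10) / 1000 : parseFloat(value);
}

function parseMemoryMiB(value) {
  // "512Mi" -> 512, "1024Mi" -> 1024
  return parseInt(value, 10);
}

console.log(parseCpuCores('1000m')); // → 1
console.log(parseCpuCores('500m')); // → 0.5
console.log(parseMemoryMiB('512Mi')); // → 512
```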
If your container exceeds its memory limit, Docker kills it and Temps restarts it (the restart policy is `always`). If this happens repeatedly, increase the memory limit or investigate memory leaks.
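One way to catch a leak before Docker OOM-kills the process is to watch resident memory from inside the app. This is a minimal sketch; the 512Mi constant mirrors the default limit above and should match your configured value:

```javascript
// Minimal sketch: warn when resident memory nears the container limit.
// 512Mi mirrors the default limit; adjust to your configured value.
const MEMORY_LIMIT_BYTES = 512 * 1024 * 1024;

function memoryPressure() {
  const { rss } = process.memoryUsage();
  return rss / MEMORY_LIMIT_BYTES; // fraction of the limit currently in use
}

setInterval(() => {
  if (memoryPressure() > 0.9) {
    console.warn('RSS above 90% of the memory limit; investigate a possible leak');
  }
}, 30_000).unref(); // unref so the timer does not keep the process alive
```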
Vertical scaling
Upgrade your VPS to get more CPU and RAM. This gives more resources to all containers.
Steps
- Check current usage — Look at CPU and memory utilization in the Temps dashboard or via `htop` on the server
- Upgrade the VPS — Use your provider's upgrade option (DigitalOcean, Hetzner, Linode, AWS, etc.)
- Temps continues running — Most VPS upgrades preserve the disk. Temps and your containers restart automatically after the server reboots
Recommended server sizes
| Traffic level | Server spec | Monthly cost (approx) |
|---|---|---|
| Hobby / side project | 2 CPU, 4 GB RAM | $5-10 |
| Small production app | 4 CPU, 8 GB RAM | $20-40 |
| Medium traffic | 8 CPU, 16 GB RAM | $40-80 |
| High traffic | 16+ CPU, 32+ GB RAM | $80-160+ |
These are rough guidelines. Actual needs depend on your application — a database-heavy app needs more RAM; a compute-heavy app needs more CPU.
Optimize before scaling
Optimization is free and often more effective than adding resources:
Database queries
The most common performance bottleneck. Check for:
- N+1 queries — Loading related data in a loop instead of a JOIN
- Missing indexes — Add indexes on columns used in WHERE, JOIN, and ORDER BY clauses
- Expensive queries — Use `EXPLAIN ANALYZE` to find slow queries
- Connection pooling — Set appropriate pool sizes (20-50 connections per container)
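The N+1 pattern and its JOIN replacement can be sketched like this. `db.query` is a stand-in for your database client, and the table and column names are hypothetical:

```javascript
// N+1: one query for posts, then one additional query per post for its author.
// `db` is a stand-in for your database client (assumption, not a Temps API).
async function getPostsNPlusOne(db) {
  const posts = await db.query('SELECT * FROM posts');
  for (const post of posts) {
    // One extra round trip per row: 1 + N queries in total
    post.author = await db.query('SELECT * FROM users WHERE id = $1', [post.authorId]);
  }
  return posts;
}

// Better: a single JOIN fetches everything in one round trip.
async function getPostsJoined(db) {
  return db.query(
    'SELECT p.*, u.name AS author_name FROM posts p JOIN users u ON u.id = p.author_id'
  );
}
```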
Caching
Add caching for data that does not change on every request:
```javascript
import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function getUsers() {
  const cached = await redis.get('users');
  if (cached) return JSON.parse(cached);
  const users = await db.query('SELECT * FROM users');
  await redis.set('users', JSON.stringify(users), { EX: 300 }); // 5 min cache
  return users;
}
```
Frontend performance
For static sites and SPAs deployed on Temps:
- Temps already applies gzip compression, cache headers, and ETags automatically
- Use code splitting to reduce initial bundle size
- Lazy load routes and heavy components
- Optimize images (use modern formats like WebP/AVIF)
Application profiling
Profile your application to find hotspots before guessing:
- Node.js: `node --inspect` or Clinic.js
- Python: `cProfile` or `py-spy`
- Go: `pprof`
- Rust: `flamegraph`
Monitoring
Track these metrics to know when scaling is needed:
- Response time — Increasing trend means the server is under pressure
- CPU usage — Sustained >80% means CPU-bound; add replicas or upgrade
- Memory usage — Approaching the limit means containers may get killed
- Error rate — Spikes correlate with resource exhaustion
Temps includes built-in analytics and monitoring. For server-level metrics, use standard tools:
```bash
# Real-time resource usage
htop

# Container-level stats
docker stats

# Disk usage
df -h
```
For application-level monitoring, add OpenTelemetry tracing — Temps injects OTEL_* environment variables automatically. See the observability tutorial for setup instructions.
Troubleshooting
Container keeps restarting
The container is hitting its memory limit and being killed by Docker. Check the container logs for OOM (out of memory) errors. Increase the memory limit or fix memory leaks in your application.
Adding replicas does not help
Not all performance problems are solved by more replicas:
- Database bottleneck — All replicas share the same database. More replicas means more database connections but the same query performance.
- External API rate limits — More replicas make more requests to external APIs. You may hit rate limits faster.
- Single-threaded applications — Node.js runs JavaScript on a single thread, so each replica can use at most one CPU core. If your server has 4 cores, 4 replicas fully utilize the CPU; beyond that, you need a bigger server.
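Following the one-replica-per-core rule of thumb above, a server's core count gives a sensible upper bound on useful replicas. This is a hypothetical sizing helper, not a Temps feature:

```javascript
import os from 'node:os';

// Rule of thumb: one Node.js replica per CPU core.
// Hypothetical sizing helper for illustration, not a Temps feature.
function suggestedReplicas() {
  return os.cpus().length;
}

console.log(`Replicas that can be fully utilized on this server: ${suggestedReplicas()}`);
```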
Deployment is slow
Build time is separate from runtime performance. If builds are slow:
- Use multi-stage Docker builds to cache dependency installation
- Use `.dockerignore` to reduce build context size
- Upgrade the server for faster builds (builds use half of available CPU and memory)
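A multi-stage build that caches dependency installation might look like the sketch below, assuming a Node.js app. The base image, file names, and entry point are assumptions; adapt them to your project:

```dockerfile
# Stage 1: install dependencies (this layer is cached unless package files change)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# Stage 2: copy source on top of the cached dependency layer
FROM node:20-alpine
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
CMD ["node", "server.js"]
```

Because source edits only invalidate the second stage, `npm ci` is skipped on most rebuilds.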