HA Databases & Internal DNS (*.temps.local)

Temps gives every highly available (HA) database cluster a stable set of DNS names in an internal *.temps.local zone. Your application connects to a name like mydb.temps.local (or primary.mydb.temps.local), and the platform keeps that name pointed at the right container as members come and go. After a Postgres failover, the cluster names are repointed at the new primary on the next reconciler tick, so clients reconnect to it with no app restart and no connection-string change.

Use this when you run a clustered managed service (for example a Postgres HA cluster) and want your apps to reach it by a single stable hostname instead of hard-coding container IPs that change on every restart.


How it works

The internal DNS plane has two halves: a control-plane registry that decides what each name should resolve to, and a per-node resolver that serves those answers to containers.

The per-node resolver

The temps-dns-resolver crate runs a Hickory-based DNS server embedded in the Temps agent (temps-agent) on every node. It:

  • Serves authoritative A/AAAA/CNAME answers for the *.temps.local zone over both UDP and TCP.
  • Forwards out-of-zone queries (package mirrors, third-party APIs) to upstream public resolvers, so containers that point at it as their only nameserver can still reach the public internet.
  • Listens by default on the bridge gateway IP (port 53) so every container on the node sees it as a nameserver, plus on 127.0.0.53:53 for host-local debugging (dig @127.0.0.53 mydb.temps.local).
  • Persists every applied zone generation to a local zone.json snapshot, so it can serve stale-but-correct records across resolver restarts and control-plane outages.
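
The split between authoritative answers and forwarding can be pictured as a small routing decision. The following is a minimal sketch, not the resolver's actual code: the function names are hypothetical, and the "refuse" outcome for out-of-zone queries with no upstreams is an assumption about what strict authoritative-only mode does.

```python
ZONE = "temps.local"


def in_zone(qname: str, zone: str = ZONE) -> bool:
    """True if qname is the zone apex or a name underneath it."""
    name = qname.rstrip(".").lower()  # tolerate a trailing root dot and mixed case
    return name == zone or name.endswith("." + zone)


def route_query(qname: str, upstreams: list) -> str:
    """Decide how a query is handled (hypothetical helper, for illustration)."""
    if in_zone(qname):
        return "authoritative"  # answered from the local zone data
    if upstreams:
        return "forward"        # handed to an upstream public resolver
    return "refuse"             # assumption: no upstreams means no out-of-zone answers


print(route_query("primary.mydb.temps.local.", ["1.1.1.1"]))
print(route_query("pypi.org", ["1.1.1.1"]))
```

Matching on ".temps.local" rather than on a bare suffix keeps look-alike names such as nottemps.local out of the zone.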

The resolver never reads the database directly. It stays in sync by long-polling the control plane:

GET  {control_plane_url}/api/internal/nodes/{node_id}/dns/changes?since={generation}
POST {control_plane_url}/api/internal/nodes/{node_id}/dns/ack

It sends since=N (its last-applied generation), receives a diff or a full snapshot, applies it, then ACKs the generation it applied.
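
As an illustration of the apply step, here is a sketch of how a node might fold one changes response into its local zone. The response shape (the kind, generation, records, upserted, and removed fields) is hypothetical; the source only states that the response is either a diff or a full snapshot.

```python
def apply_changes(zone: dict, generation: int, response: dict):
    """Return (new_zone, new_generation) after one /dns/changes response.

    zone maps an FQDN to its list of addresses. The response fields used
    here are assumptions made for this sketch, not the real wire format.
    """
    if response["generation"] <= generation:
        return zone, generation  # nothing newer than what is already applied

    if response["kind"] == "snapshot":
        # A full snapshot replaces the zone wholesale.
        new_zone = {fqdn: list(ips) for fqdn, ips in response["records"].items()}
    else:
        # A diff is applied on top of a copy of the current zone.
        new_zone = {fqdn: list(ips) for fqdn, ips in zone.items()}
        for fqdn in response.get("removed", []):
            new_zone.pop(fqdn, None)
        for fqdn, ips in response.get("upserted", {}).items():
            new_zone[fqdn] = list(ips)

    # The caller would ACK this generation only after the zone is swapped in.
    return new_zone, response["generation"]
```

Building a new zone instead of mutating the old one means lookups never observe a half-applied generation.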

The control-plane registry

On the control plane, the DnsRegistry service (in the temps-dns crate) is the sole writer to the service_endpoints table. Every mutation bumps a cluster-wide monotonic generation counter (computed at write time as MAX(generation) + 1 inside a transaction), which is what makes the since=N long-poll diff well-defined.

Because Docker assigns a fresh IP to a container on every create, the registry rewrites a cluster member's records on every container start, not just first creation — using an atomic delete-and-insert so consumers never see a window with both the old and new IP present.
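
The generation bump and the delete-and-insert can be sketched against an in-memory SQLite database. This is an illustration only: the column names are hypothetical, and the real registry lives in the temps-dns crate against the platform's own database.

```python
import sqlite3


def open_registry() -> sqlite3.Connection:
    # isolation_level=None gives autocommit, so transactions are explicit below.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        "CREATE TABLE service_endpoints ("
        " fqdn TEXT NOT NULL, ip TEXT NOT NULL, generation INTEGER NOT NULL)"
    )  # hypothetical columns; the real schema is not shown in this page
    return db


def rewrite_member(db: sqlite3.Connection, fqdn: str, new_ip: str) -> int:
    """Replace a member's record and return the generation it was written at."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX()
    try:
        (gen,) = db.execute(
            "SELECT COALESCE(MAX(generation), 0) + 1 FROM service_endpoints"
        ).fetchone()
        db.execute("DELETE FROM service_endpoints WHERE fqdn = ?", (fqdn,))
        db.execute(
            "INSERT INTO service_endpoints (fqdn, ip, generation) VALUES (?, ?, ?)",
            (fqdn, new_ip, gen),
        )
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return gen
```

Because the delete and the insert commit together, a reader sees either the old address or the new one, never both and never neither.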

Record tiers

Records come in tiers:

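| Tier   | Record        | FQDN                                | TTL | What it points at                                     |
|--------|---------------|-------------------------------------|-----|-------------------------------------------------------|
| Tier 2 | Per-member A  | <name>-<ordinal>.<name>.temps.local | 30s | One specific cluster member.                          |
| Tier 3 | Primary alias | primary.<svc>.temps.local           | 5s  | The current Postgres primary.                         |
| Tier 3 | Replica alias | replica.<svc>.temps.local           | 30s | Multi-A: every healthy secondary.                     |
| Tier 3 | Cluster VIP   | <svc>.temps.local                   | 30s | Multi-A round-robin across all healthy data members.  |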

The cluster VIP (<svc>.temps.local) returns one A record per healthy data member — the resolver returns all matching records, so a single lookup gets the whole set. Combined with libpq's target_session_attrs=read-write, writes land on the primary while plain read connections fan out across the set.

When a member has no overlay IP — for example the pg_auto_failover monitor running on the control plane, or a single-host setup — the registry falls back to the node's underlay address plus the published host port, so the FQDN still resolves through Docker's port forward.
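
The fallback rule can be written as a small selection function. This is a sketch under assumptions: the Member type and its field names are hypothetical, and the addresses are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Member:
    # Hypothetical shape, for illustration only.
    overlay_ip: Optional[str]  # None when the member is not on the overlay
    underlay_ip: str           # address of the node hosting the container
    container_port: int        # port the service listens on inside the container
    host_port: Optional[int]   # published host port, if any


def endpoint_for(member: Member) -> Tuple[str, int]:
    """Pick the (address, port) a client should use for this member."""
    if member.overlay_ip:
        return member.overlay_ip, member.container_port
    if member.host_port is None:
        raise ValueError("member has neither an overlay IP nor a published host port")
    # Underlay fallback: traffic goes through Docker's port forward on the node.
    return member.underlay_ip, member.host_port


data_member = Member("172.20.5.10", "192.0.2.10", 5432, None)
monitor = Member(None, "192.0.2.10", 5432, 15432)
print(endpoint_for(data_member))
print(endpoint_for(monitor))
```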

Failover behavior

A per-cluster role reconciler runs on the control plane on a fixed tick (TICK_INTERVAL = 5s). On each tick it queries the pg_auto_failover monitor and atomically rewrites the Tier 3 role/VIP records. Reconcilers are re-spawned at plugin startup for every running cluster, so they survive control-plane restarts.

After a failover/promotion, the reconciler refreshes the role-aliased records on its next tick. Application connections to <svc>.temps.local (or primary.<svc>.temps.local) then follow the new primary without a restart. End-to-end, the promotion-to-DNS-flip is bounded at roughly 6 seconds: the 5s reconciler tick plus the ~1s agent sync long-poll.
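
To make the tick concrete, the sketch below derives the three Tier 3 records from a simplified view of the monitor's state and adds up the propagation bound. The member dictionaries (ip, role, healthy) are a hypothetical simplification of what the pg_auto_failover monitor reports.

```python
TICK_INTERVAL_S = 5   # reconciler tick
POLL_INTERVAL_S = 1   # agent sync long-poll


def role_records(svc: str, members: list) -> dict:
    """Build the Tier 3 records for one cluster from simplified member state."""
    healthy = [m for m in members if m["healthy"]]
    return {
        f"primary.{svc}.temps.local": {
            "ttl": 5,
            "ips": [m["ip"] for m in healthy if m["role"] == "primary"],
        },
        f"replica.{svc}.temps.local": {
            "ttl": 30,
            "ips": [m["ip"] for m in healthy if m["role"] == "secondary"],
        },
        f"{svc}.temps.local": {"ttl": 30, "ips": [m["ip"] for m in healthy]},
    }


# State after a promotion: the old primary is down, the former secondary leads.
after = [
    {"ip": "172.20.5.10", "role": "primary", "healthy": False},
    {"ip": "172.20.5.11", "role": "primary", "healthy": True},
]
records = role_records("mydb", after)
print(records["primary.mydb.temps.local"]["ips"])

# Worst case from promotion to the resolver serving the new address.
print(TICK_INTERVAL_S + POLL_INTERVAL_S, "seconds")
```

Recomputing all three records from the same monitor reading is what lets them be rewritten together in one atomic step.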

Container wiring

Cluster member containers are wired to the resolver automatically. When the agent creates a cluster member container, it sets Docker's HostConfig.dns to the gateway IP of the temps0 overlay network, so *.temps.local resolves natively from inside the container.

This fails open: on single-host setups where the temps0 overlay isn't bootstrapped, DNS injection is skipped and Docker's default resolver is used.
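
The fail-open rule amounts to leaving the DNS field out of the container's host configuration when no gateway is known. A minimal sketch, assuming the Docker Engine API's JSON field name Dns for what this page calls HostConfig.dns; the function name and the gateway address are hypothetical.

```python
from typing import Optional


def host_config(overlay_gateway_ip: Optional[str]) -> dict:
    """Build the DNS part of a container's HostConfig (illustrative only)."""
    config = {}
    if overlay_gateway_ip:
        # Overlay is up: point the container at the per-node resolver.
        config["Dns"] = [overlay_gateway_ip]
    # Overlay missing: leave "Dns" unset so Docker's default resolver applies.
    return config


print(host_config("172.20.0.1"))  # placeholder gateway address
print(host_config(None))
```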


Configuration

The resolver is configured by the agent at startup — there are no user-facing flags for it. These are the defaults baked into ResolverConfig:

| Setting            | Default                                   | Notes                                                                               |
|--------------------|-------------------------------------------|-------------------------------------------------------------------------------------|
| Listen addresses   | <bridge-gateway>:53 and 127.0.0.53:53     | Both UDP and TCP. The gateway address is what containers see.                       |
| Poll interval      | 1s                                        | Effective propagation cadence for failover.                                         |
| Initial backoff    | 1s                                        | After a sync error; doubles up to the max.                                          |
| Max backoff        | 30s                                       | Cap between failed syncs.                                                           |
| HTTP timeout       | 10s                                       | Per sync call.                                                                      |
| Snapshot file      | <snapshot_dir>/zone.json                  | Default snapshot_dir in the agent is /var/lib/temps/dns.                            |
| Upstream resolvers | 1.1.1.1, 1.0.0.1, 8.8.8.8 (all port 53)   | Cloudflare + Google. An empty list disables forwarding (strict authoritative-only). |

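
The Initial backoff and Max backoff defaults above describe a doubling schedule with a cap. As a quick illustration (the function name is hypothetical, and any jitter the real resolver may add is not modeled):

```python
def backoff_delays(failures: int, initial: float = 1.0, maximum: float = 30.0) -> list:
    """Delays (in seconds) waited after each of `failures` consecutive sync errors."""
    delays = []
    delay = initial
    for _ in range(failures):
        delays.append(delay)
        delay = min(delay * 2, maximum)  # double, but never exceed the cap
    return delays


print(backoff_delays(7))
# → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With these defaults the cap is reached on the sixth consecutive failure, after which the resolver would retry every 30 seconds while continuing to serve its last snapshot.
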
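
| Reconciler setting | Value | Source                                     |
|--------------------|-------|--------------------------------------------|
| Tick interval      | 5s    | TICK_INTERVAL                              |
| primary.<svc> TTL  | 5s    | Short so apps recover fast after promotion. |
| replica.<svc> TTL  | 30s   | Members change less often.                 |
| <svc> VIP TTL      | 30s   | Members change less often.                 |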

Worked example

Once you have a Postgres HA cluster running, an app deployed on the same Temps instance can connect through the cluster names. The exact members and zone are created for you; you only need the names.

Connect to the cluster VIP

Point your connection string at <svc>.temps.local and let libpq route writes to the primary:

postgresql://appuser:password@mydb.temps.local:5432/appdb?target_session_attrs=read-write

  • mydb.temps.local resolves to all healthy data members (multi-A round-robin).
  • target_session_attrs=read-write makes the client connect to whichever member accepts writes (the primary).
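
The way these two pieces combine can be modeled in a few lines. This is a toy model of the client-side selection, not libpq's implementation: the function and the accepts_writes probe are hypothetical stand-ins for libpq connecting to each resolved address and checking whether the session is read-write.

```python
def pick_writable(addresses: list, accepts_writes) -> str:
    """Return the first resolved address whose session would accept writes."""
    for address in addresses:
        if accepts_writes(address):
            return address
        # A read-only member (a secondary) is skipped and the next address tried.
    raise ConnectionError("no resolved address accepted a read-write session")


# The multi-A answer may arrive in any order; the primary is found either way.
resolved = ["172.20.5.11", "172.20.5.10"]
print(pick_writable(resolved, lambda ip: ip == "172.20.5.10"))
# → 172.20.5.10
```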

Pin reads to replicas, writes to the primary

If you want to send read traffic explicitly to replicas:

# Writes — always the current primary
postgresql://appuser:password@primary.mydb.temps.local:5432/appdb

# Reads — fan out across healthy secondaries
postgresql://appuser:password@replica.mydb.temps.local:5432/appdb

Verify resolution from inside a container

From any container on the node (or from the host using the debug listener):

# From inside a cluster member container
nslookup mydb.temps.local
nslookup primary.mydb.temps.local

# From the host, against the debug listener
dig @127.0.0.53 mydb.temps.local

After a failover, re-run the primary.mydb.temps.local lookup. Within roughly 6 seconds of the promotion it returns the new primary's address, and existing connection strings keep working unchanged.

What happens after a failover

The full sequence, from the moment the primary goes down to clients reconnecting, is:

  1. Primary is detected down. pg_auto_failover's monitor notices the current primary has stopped reporting healthy.
  2. A replica is promoted. pg_auto_failover promotes a healthy secondary to primary; the monitor's pgautofailover.node view now reports the new primary.
  3. The A record is rewritten. On its next 5s reconciler tick, the role reconciler queries the monitor, sees the new primary, and atomically rewrites the primary.<svc>.temps.local (and VIP) records to the new address.
  4. Existing connections drop. primary.<svc>.temps.local is a plain A record, not a connection proxy, so live TCP sessions to the old primary are not migrated — they fail when the old primary goes away.
  5. Clients re-resolve and reconnect. A client whose connection dropped re-resolves the name (the short 5s TTL means it gets the new address quickly) and reconnects to the new primary automatically — no connection-string change.
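
Steps 4 and 5 are the part your application owns. The sketch below shows one way a client could retry with a fresh lookup on every attempt. It is a pattern illustration under assumptions: resolve and connect are caller-supplied callables (hypothetical names), and many drivers and connection pools already do this for you.

```python
import time


def connect_with_reresolve(resolve, connect, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Retry a connection, looking the name up again before every attempt."""
    last_error = None
    for attempt in range(attempts):
        address = resolve()  # fresh lookup each time; never reuse a cached IP
        try:
            return connect(address)
        except OSError as exc:  # covers refused, reset, and timed-out connections
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_s)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The important detail is that the lookup sits inside the loop: a client that resolves once at startup and caches the address would keep dialing the old primary after the record flips.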

Concretely, watching the record flip:

# Before failover
nslookup primary.mydb.temps.local
# → 172.20.5.10  (old primary)

# Primary fails; pg_auto_failover promotes a replica.
# On the reconciler's next 5s tick, it rewrites the primary A record
# (TTL 5s), and the agent applies it on its next ~1s sync poll.

# After failover — same command, new address
nslookup primary.mydb.temps.local
# → 172.20.5.11  (new primary)

From promotion to DNS flip, this is bounded at roughly 6 seconds (the 5s reconciler tick plus the ~1s agent sync). That bound starts at the promotion, so it does not include the time pg_auto_failover needs to detect the failure and promote a replica. A client that cached the old answer may also keep it for up to the 5s TTL before it re-resolves.


Notes & gotchas

  • The /api prefix is load-bearing. The DNS sync handlers are declared without it, but the plugin runtime mounts plugin routes under /api. The served paths are /api/internal/nodes/{node_id}/dns/changes and /api/internal/nodes/{node_id}/dns/ack.
  • Stale-but-correct over down. Every resolver failure mode is "keep serving the last snapshot." If the control plane is unreachable, the resolver serves the records from its zone.json snapshot rather than failing lookups.
  • DNS injection fails open. On single-host setups without the temps0 overlay, the agent does not set HostConfig.dns, and containers fall back to Docker's default resolver. Cluster FQDNs only resolve natively when the overlay is up.
  • Underlay fallback. Members without an overlay IP (such as the monitor on the control plane) resolve to the node's underlay address plus the published host port, so the FQDN still works through Docker's port forward.
  • <svc>.temps.local is not a single VIP. It is a multi-A answer across all healthy data members. Routing writes to the primary relies on target_session_attrs=read-write, not on the DNS layer choosing one address.
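
The stale-but-correct behavior in the notes above depends on the snapshot being written safely and read back on startup. A minimal sketch of that pattern, assuming a hypothetical zone.json layout with generation and records keys (the real file format is not documented here):

```python
import json
import os
import tempfile


def save_snapshot(path: str, generation: int, records: dict) -> None:
    """Write the snapshot atomically so a crash never leaves a torn file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"generation": generation, "records": records}, handle)
    os.replace(tmp_path, path)  # atomic rename over the previous snapshot


def load_snapshot(path: str):
    """Return the last saved snapshot, or None if none has been written yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

On restart, a resolver following this pattern would load the snapshot first, start answering from it, and resume long-polling with since set to the snapshot's generation.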
