March 11, 2026
Written by Temps Team
Last updated March 11, 2026
A self-hosted AI gateway is a proxy that sits between your application and multiple AI provider APIs (OpenAI, Anthropic, Gemini, xAI), letting you route all LLM requests through a single endpoint with unified authentication, cost tracking, and observability — running on your own infrastructure instead of a third-party SaaS.
Temps ships an AI Gateway as a built-in feature — no separate service to deploy, no additional bill. It runs inside the same Rust binary that handles your deployments, analytics, and error tracking.
TL;DR: The Temps AI Gateway is an OpenAI-compatible proxy built into the Temps self-hosted PaaS. It routes requests to Anthropic (Claude), OpenAI (GPT/o-series), Google Gemini, and xAI (Grok) through one endpoint, with per-request cost tracking stored at 1/10,000th-cent precision. Provider API keys are encrypted with AES-256-GCM. Temps is Apache 2.0 and free to self-host; Temps Cloud costs ~$6/mo (Hetzner cost + 30% margin, no per-seat fees).
According to Kong, 72% of enterprises plan to increase GenAI spending in 2025. The average monthly AI spend already hits $85,521 according to CloudZero. And most teams can't answer "which model is burning through our budget?"
Here's what happens without a gateway:
| | Temps AI Gateway | Portkey / Helicone | LiteLLM |
|---|---|---|---|
| Deployment | Built into your existing Temps binary — zero extra setup | Separate SaaS (Portkey/Helicone) or self-hosted service (LiteLLM) | Self-hosted service, separate process to manage |
| Cost | Free (included in Temps self-host); ~$6/mo on Temps Cloud | See pricing page | Free open source; commercial tier: see pricing page |
| Data residency | Your server, always | SaaS: their servers; self-hosted: yours | Your server |
| Integration | Same auth, dashboard, and observability as your deployments | Separate auth system and dashboard | Separate dashboard |
| OTel traces | Built-in — conversation threads, tool calls, thinking blocks | Add-on or SaaS-only | Partial |
| Provider key encryption | AES-256-GCM, same vault as deployment env vars | Varies | Varies |
| License | Apache 2.0 | Proprietary SaaS / BSL | MIT |
The core difference: Temps embeds the gateway in the same process as deployments, analytics, error tracking, and uptime monitoring. You don't run a separate service, don't pay a separate bill, and don't log in to a separate dashboard.
The gateway exposes three OpenAI-compatible endpoints:
POST /api/ai/v1/chat/completions → Chat (Anthropic, OpenAI, Gemini, xAI)
POST /api/ai/v1/embeddings → Embeddings (OpenAI)
GET /api/ai/v1/models → List available models
Every request flows through this pipeline:
Requests are routed by model name prefix (claude-* → Anthropic, gpt-*/o1/o3/o4 → OpenAI, gemini-* → Google, grok-* → xAI).
The gateway overhead is under 10ms. The bottleneck is always the upstream provider, never the proxy.
| Provider | Models | Streaming |
|---|---|---|
| Anthropic | Any claude-* model | Yes |
| OpenAI | Any gpt-*, o1, o3, o4, text-embedding-*, dall-e-* | Yes |
| Google | Any gemini-* model | Yes |
| xAI | Any grok-* model | Yes |
| Custom | Any OpenAI-compatible endpoint | Yes |
Routing is by model name prefix — case-insensitive. Unknown prefixes return a ModelNotFound error.
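To make the routing rule concrete, here is a minimal Python sketch of prefix-based routing. It is an illustration of the behavior described above, not the gateway's actual Rust implementation: the `PREFIX_ROUTES` table, the `resolve_provider` function, and the provider labels are hypothetical names, and the `ModelNotFound` exception only mirrors the error name mentioned in the text.

```python
# Hypothetical prefix table mirroring the routing rules described above.
PREFIX_ROUTES = [
    ("claude-", "anthropic"),
    ("gpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("text-embedding-", "openai"),
    ("dall-e-", "openai"),
    ("gemini-", "google"),
    ("grok-", "xai"),
]


class ModelNotFound(Exception):
    """Raised when no provider prefix matches the model name."""


def resolve_provider(model: str) -> str:
    """Return the provider label for a model name, matching case-insensitively."""
    name = model.strip().lower()
    for prefix, provider in PREFIX_ROUTES:
        if name.startswith(prefix):
            return provider
    raise ModelNotFound(model)
```

With this table, `resolve_provider("Claude-Sonnet-4-6")` resolves to the Anthropic entry despite the capital letters, while a name such as `mistral-large` raises `ModelNotFound` because no prefix matches.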
If your code already calls OpenAI, switching to the Temps gateway takes one line:
import openai
client = openai.OpenAI(
    api_key="tk_your_temps_api_key",
    base_url="https://your-temps.example.com/api/ai/v1",
)
# Routes to Anthropic automatically based on model prefix
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
)
Same code, same SDK, same types. Works with any OpenAI-compatible SDK — Python, TypeScript, Go, Rust, curl.
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "tk_abc123",
  baseURL: "https://your-temps.example.com/api/ai/v1",
});
// Four providers, one endpoint, one API key
const anthropic = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello" }],
});
const openaiModel = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
const gemini = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello" }],
});
Don't want to store provider keys in Temps? Pass them per-request via the X-Provider-Api-Key header:
curl https://your-temps.example.com/api/ai/v1/chat/completions \
  -H "Authorization: Bearer tk_your_temps_key" \
  -H "Content-Type: application/json" \
  -H "X-Provider-Api-Key: sk-ant-your-anthropic-key" \
  -d '{"model": "claude-sonnet-4-6", "messages": [...]}'
BYOK keys are ephemeral — used for the request and never stored. System-configured keys are encrypted at rest with AES-256-GCM (via the aes-gcm Rust crate, same encryption layer used for deployment environment variables).
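The same BYOK request can be sketched in Python using only the standard library. This builds the request without sending it, so you can inspect exactly which headers leave your application. The helper name `build_chat_request` is hypothetical, the keys and host are the placeholders used above, and the `Content-Type` header is an assumption about what a JSON endpoint expects.

```python
import json
import urllib.request

# Placeholder host from the examples above; replace with your own instance.
BASE_URL = "https://your-temps.example.com/api/ai/v1"


def build_chat_request(temps_key, model, messages, provider_key=None):
    """Build (but do not send) a chat completion request for the gateway."""
    headers = {
        "Authorization": f"Bearer {temps_key}",
        "Content-Type": "application/json",  # assumed, not stated in the text
    }
    if provider_key is not None:
        # BYOK: the provider key travels with this one request only.
        headers["X-Provider-Api-Key"] = provider_key
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )


req = build_chat_request(
    "tk_your_temps_key",
    "claude-sonnet-4-6",
    [{"role": "user", "content": "Hello"}],
    provider_key="sk-ant-your-anthropic-key",
)
# Send with urllib.request.urlopen(req) once the placeholders are real values.
```

Leaving `provider_key` unset omits the header entirely, in which case the gateway would fall back to whatever key is configured in the dashboard.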
Every request through the gateway gets logged with full metadata:
Logged fields include the request cost (total_cost_microcents) and tags such as team:platform, feature:chatbot, env:staging.
The dashboard provides:
Pass tags via the x-tags header to attribute costs to teams, features, or environments:
curl ... \
-H "x-tags: team:ml-ops, feature:code-review, env:production" \
-d '{"model": "claude-sonnet-4-6", ...}'
Then filter your analytics by any tag combination. Finally answer: "How much does the code review feature cost us per month?"
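If you set tags from application code, a small helper keeps the header value consistent. The sketch below assumes the header format is exactly the comma-separated `key:value` list shown in the curl example. `format_tags` is a hypothetical helper, and the separator check is a defensive assumption rather than a documented gateway rule.

```python
def format_tags(tags: dict[str, str]) -> str:
    """Serialize tags into the 'key:value, key:value' form shown above."""
    parts = []
    for key, value in tags.items():
        # Assumed guard: separators inside a tag would make the header ambiguous.
        if "," in key or ":" in key or "," in value:
            raise ValueError(f"tag {key!r} contains a separator character")
        parts.append(f"{key}:{value}")
    return ", ".join(parts)


header_value = format_tags(
    {"team": "ml-ops", "feature": "code-review", "env": "production"}
)
```

With the OpenAI Python SDK, the result can be attached to a single call through the `extra_headers` argument, for example `extra_headers={"x-tags": header_value}`.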
Temps includes a built-in OpenTelemetry collector. Combine the AI Gateway with OTel GenAI semantic conventions for end-to-end traces of every AI interaction:
The AI Activity tab in your project dashboard shows these traces as a conversation view — system prompt, user messages, assistant responses, tool calls, and reasoning/thinking blocks rendered inline.
from opentelemetry import trace
tracer = trace.get_tracer("my-ai-app")
# `client` is the OpenAI client pointed at the gateway, as in the earlier examples
with tracer.start_as_current_span("chat claude-sonnet-4-6"):
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "..."}],
    )
In the Temps dashboard, go to AI Gateway and add your provider API keys. Each key is encrypted with AES-256-GCM before being written to the database. The dashboard validates the key format and tests connectivity before saving.
No CLI command required for this step.
bunx @temps-sdk/cli api-key create --name "ai-gateway" --role admin
# → tk_abc123...
This is the same API key you use for deployments and the CLI — no new credentials to manage.
import openai
client = openai.OpenAI(
    api_key="tk_abc123",
    base_url="https://your-temps.example.com/api/ai/v1",
)
That's it. Four providers, one endpoint, one API key.
AI-powered support bots — Route customer queries to the cheapest model that meets quality thresholds. Track cost per conversation.
Code review pipelines — Use Claude for code analysis, GPT for documentation generation, Gemini for test generation. See which step costs the most.
RAG applications — Embeddings through OpenAI, retrieval-augmented generation through Anthropic. One endpoint, unified cost tracking.
Multi-model comparison — A/B test models by sending identical requests to different providers and comparing quality vs cost vs latency.
Internal AI tools — Give every team access to AI without sharing raw API keys. Track usage per team via tags.
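For the multi-model comparison use case, a rough harness might look like the following. It is a sketch under stated assumptions: `compare_models` and the `send` callback are hypothetical names, and `send` stands in for whatever wrapper you write around `client.chat.completions.create`. Only wall-clock latency is measured here; cost would come from the gateway's own analytics.

```python
import time


def compare_models(models, messages, send):
    """Send the same messages to each model and record wall-clock latency.

    `send` is a caller-supplied function (model, messages) -> reply text,
    for example a thin wrapper around client.chat.completions.create.
    """
    results = []
    for model in models:
        start = time.perf_counter()
        reply = send(model, messages)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append({"model": model, "latency_ms": elapsed_ms, "reply": reply})
    return results
```

Passing the sender in as a function keeps the harness independent of any SDK, so the same loop can be exercised with a stub in tests and with the real gateway client in production.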
The Temps AI Gateway adds under 10ms overhead — including authentication, request translation, and response normalization. The bottleneck is always the upstream provider (typically 200–2000ms), never the proxy.
Yes. Use the X-Provider-Api-Key header to pass your key per-request (BYOK mode). It's used for that request only and never stored. System-configured keys are encrypted with AES-256-GCM.
Yes. All four providers support streaming via "stream": true. The gateway translates SSE chunks between provider formats — Anthropic's message_delta events are normalized to OpenAI's delta format.
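Because every provider's stream is normalized to OpenAI's delta format, client code only needs one parser. The sketch below assumes the standard OpenAI streaming shape: `data:` lines carrying JSON chunks with `choices[].delta.content`, terminated by `data: [DONE]`. The function name `collect_stream_text` and the sample lines are illustrative, not captured gateway output.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Illustrative sample, hand-written to match the OpenAI chunk shape.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Hello
```

In practice the OpenAI SDKs do this parsing for you when `stream=True` is set; the sketch is mainly useful if you consume the endpoint with a plain HTTP client.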
The AI Gateway is included in Temps at no extra charge. You pay your provider API costs directly (OpenAI, Anthropic, etc.) — Temps adds zero markup. On Temps Cloud, it's included in the ~$6/mo plan. Self-hosting is free under the Apache 2.0 license.
Yes. Instrument your app with OpenTelemetry GenAI semantic conventions and the AI Activity tab shows full conversation threads — system prompts, user messages, assistant responses, tool calls, and thinking blocks.
Temps routes by model name prefix: claude-* → Anthropic, gpt-*/o1/o3/o4/text-embedding-* → OpenAI, gemini-* → Google, grok-* → xAI. Custom OpenAI-compatible endpoints are also supported.
The AI Gateway ships with the latest Temps release. If you're already running Temps, update to the latest version and configure your provider keys in the dashboard.
If you're new to Temps, get started in under 5 minutes:
curl -fsSL https://get.temps.sh | sh
One binary. Deployments, analytics, error tracking, monitoring, and a built-in AI gateway. No SaaS sprawl required. Apache 2.0, free to self-host.