March 11, 2026
Written by Temps Team
Your team uses Claude for code review, GPT-4.1 for customer support, Gemini for document processing, and Grok for internal chat. That's four API keys, four billing dashboards, four sets of rate limits, and zero unified view of what you're spending.
According to Kong, 72% of enterprises plan to increase GenAI spending in 2025. The average monthly AI spend already hits $85,521 according to CloudZero. And most teams can't even tell you which model is burning through their budget.
Today we're releasing the Temps AI Gateway — a built-in, OpenAI-compatible proxy that routes requests to Anthropic, OpenAI, Google Gemini, and xAI through a single endpoint. With usage tracking, cost analytics, and per-request attribution baked in.
No extra service to deploy. No new bill. It's part of the same binary that handles your deployments.
TL;DR: Temps now includes a built-in AI Gateway that proxies OpenAI, Anthropic, Gemini, and Grok through one OpenAI-compatible endpoint. Track costs per model, per user, per conversation — with zero additional infrastructure. According to Kong, 72% of enterprises are increasing AI spend in 2025, and most have no idea where the money goes.
According to the Stanford HAI AI Index, the cost of querying a GPT-3.5-level model dropped from $20 to $0.07 per million tokens in 18 months — a 280-fold reduction. Sounds cheap, right? But token costs aren't the problem anymore. Operational sprawl is.
Here's what happens without a gateway: every provider means another API key to rotate, another billing dashboard to check, another set of rate limits to work around, and no unified view of what any of it costs.
A gateway gives you one endpoint, one set of credentials, and one dashboard. Point your code at https://your-temps.example.com/api/ai/v1/chat/completions and it routes to the right provider based on the model name.
The gateway exposes three OpenAI-compatible endpoints:
POST /api/ai/v1/chat/completions → Chat (all providers)
POST /api/ai/v1/embeddings → Embeddings (OpenAI)
GET /api/ai/v1/models → List available models
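For a quick smoke test, hit the models endpoint with any OpenAI-compatible client. A minimal sketch in Python, assuming a hypothetical host and a Temps API key (the client setup matches the drop-in example below):
import openai

# Hypothetical host; substitute your own Temps deployment URL
client = openai.OpenAI(
    api_key="tk_your_temps_api_key",
    base_url="https://your-temps.example.com/api/ai/v1",
)

# GET /api/ai/v1/models: one list covering every configured provider
for model in client.models.list():
    print(model.id)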
Every request flows through the same pipeline: authenticate the Temps API key, route by model name, translate the request into the provider's native format, call upstream, normalize the response back to OpenAI's schema, and log usage. claude-sonnet-4-6 goes to Anthropic, gpt-4.1-nano goes to OpenAI, gemini-2.5-flash goes to Google.
The total gateway overhead? Under 10ms. The bottleneck is always the upstream provider, never the proxy.
| Provider | Models | Streaming |
|---|---|---|
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Yes |
| OpenAI | GPT-5.4, GPT-4.1, o3, o4-mini, GPT-4o, embeddings | Yes |
| Google | Gemini 3.1 Pro, 2.5 Flash/Pro, 2.0 Flash | Yes |
| xAI | Grok 4.1 Fast, Grok 3, Grok 3 Mini | Yes |
All models, including streaming responses, work through the same /chat/completions endpoint. The gateway handles SSE chunk translation transparently — Anthropic's message_delta events get normalized to OpenAI's delta format.
If your code already calls OpenAI, switching to the Temps gateway takes one line:
import openai
client = openai.OpenAI(
api_key="tk_your_temps_api_key",
base_url="https://your-temps.example.com/api/ai/v1",
)
# This routes to Anthropic automatically
response = client.chat.completions.create(
model="claude-sonnet-4-6",
messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
)
Same code, same SDK, same types. But now your request goes through Temps, gets logged, and you can see it in your dashboard.
This works with any OpenAI-compatible SDK — Python, TypeScript, Go, Rust, curl. If it speaks OpenAI's API format, it works.
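Streaming works through the same call. A minimal sketch, reusing the client above; the stream flag matches the "stream": true behavior described later in the FAQ:
# Stream a Claude response; the gateway normalizes Anthropic's SSE
# chunks into OpenAI-style deltas before they reach your client.
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)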
Don't want to store provider keys in Temps? Pass them per-request:
curl https://your-temps.example.com/api/ai/v1/chat/completions \
-H "Authorization: Bearer tk_your_temps_key" \
-H "x-provider-api-key: sk-ant-your-anthropic-key" \
-d '{"model": "claude-sonnet-4-6", "messages": [...]}'
BYOK keys are ephemeral — they're used for the request and never stored. System-configured keys are encrypted at rest with AES-256-GCM.
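The same header works from the SDK. A sketch using the OpenAI Python client's extra_headers option, reusing the client from the drop-in example above:
# BYOK per request: the Anthropic key rides along as a header and,
# per the gateway's BYOK semantics, is never stored server-side
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
    extra_headers={"x-provider-api-key": "sk-ant-your-anthropic-key"},
)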
Every request through the gateway gets logged to a TimescaleDB hypertable with 15 columns of metadata, including model, provider, token counts, cost, latency, error status, and attribution tags like team:platform, feature:chatbot, or env:staging.
The dashboard breaks this down into:
Summary cards — total requests, tokens, cost, average latency, error rate
Timeseries charts — requests and tokens over time with hourly/daily bucketing
Per-model breakdown — which models cost the most, which are fastest
Per-provider view — compare Anthropic vs OpenAI vs Gemini at a glance
Conversation analytics — group requests by x-conversation-id header to see full conversation costs
Pass tags via the x-tags header to attribute costs to teams, features, or environments:
curl ... \
-H "x-tags: team:ml-ops, feature:code-review, env:production" \
-d '{"model": "claude-sonnet-4-6", ...}'
Then filter your analytics by any tag combination. Finally answer: "How much does the code review feature cost us per month?"
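The same attribution headers can be attached from the SDK. A sketch combining x-tags with the x-conversation-id header mentioned above; the UUID here is illustrative, any stable string should group requests:
import uuid

# One ID per conversation; the dashboard groups request costs by this header
conversation_id = str(uuid.uuid4())

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Review this diff"}],
    extra_headers={
        "x-tags": "team:ml-ops, feature:code-review, env:production",
        "x-conversation-id": conversation_id,
    },
)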
Standalone AI gateways like Portkey, Helicone, and LiteLLM are good products. But they're another service to deploy, another bill to pay, and another thing to monitor.
According to TrueFoundry, Kong's AI gateway costs over $30 per million requests. LiteLLM is open source, but the same comparison cites its lack of commercial backing, frequent regressions, and significant latency overhead. Portkey and Helicone are SaaS — your requests route through their servers.
The Temps AI Gateway is different:
| | Temps AI Gateway | Standalone gateways |
|---|---|---|
| Deployment | Already running — it's part of your Temps binary | Separate service to deploy and maintain |
| Cost | Free (included in Temps) | $30+/million requests or monthly SaaS fee |
| Data residency | Your server, your data | Their servers (SaaS) or yours (self-hosted) |
| Integration | Same auth, same dashboard as deployments | Separate auth system, separate dashboard |
| Monitoring | Built-in OTel traces + usage analytics | Usually one or the other |
| Key management | AES-256-GCM encrypted, same vault as env vars | Separate secret management |
The real value isn't the proxy itself — it's that the gateway lives alongside your deployments, analytics, error tracking, and monitoring. One platform, one login, one bill.
Temps also includes a full OpenTelemetry collector. Combine the AI Gateway with OTel GenAI semantic conventions and you get end-to-end traces of every AI interaction.
The AI Activity tab in your project dashboard shows these traces in a conversation view — system prompt, user messages, assistant responses, tool calls, and even reasoning/thinking blocks rendered inline.
# Your app code — wrap gateway calls in a span so they show up in AI Activity
from opentelemetry import trace

tracer = trace.get_tracer("my-ai-app")

# `client` is the OpenAI SDK client pointed at the Temps gateway (see above)
with tracer.start_as_current_span("chat claude-sonnet-4-6"):
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "..."}],
    )
This trace shows up in the AI Activity dashboard with the full conversation, token counts, and latency — all without any additional infrastructure.
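For traces richer than the bare span above, you can attach GenAI semantic-convention attributes yourself (auto-instrumentation libraries can also do this for you). A minimal sketch; the attribute names come from the OTel GenAI conventions, not from anything Temps-specific:
with tracer.start_as_current_span("chat claude-sonnet-4-6") as span:
    # Attribute names from the OTel GenAI semantic conventions
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.request.model", "claude-sonnet-4-6")
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "..."}],
    )
    # Token usage comes back in the OpenAI-compatible response
    span.set_attribute("gen_ai.usage.output_tokens",
                       response.usage.completion_tokens)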
In the Temps dashboard, go to AI Gateway and add your provider API keys. Paste your Anthropic, OpenAI, Gemini, or xAI key — each one is encrypted with AES-256-GCM before hitting the database.
No CLI command needed. The dashboard validates the key format and tests connectivity before saving.
Use your existing Temps API key — the same one you use for deployments and the CLI:
temps api-key --name "ai-gateway" --role admin
# → tk_abc123...
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "tk_abc123",
baseURL: "https://your-temps.example.com/api/ai/v1",
});
// Works with any supported model
const anthropic = await client.chat.completions.create({
model: "claude-sonnet-4-6",
messages: [{ role: "user", content: "Hello" }],
});
const openai = await client.chat.completions.create({
model: "gpt-4.1-nano",
messages: [{ role: "user", content: "Hello" }],
});
const gemini = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello" }],
});
// Grok model ID assumed from the supported-models table above
const grok = await client.chat.completions.create({
  model: "grok-3-mini",
  messages: [{ role: "user", content: "Hello" }],
});
That's it. Four providers, one endpoint, one API key.
AI-powered support bots — Route customer queries to the cheapest model that meets quality thresholds. Track cost per conversation.
Code review pipelines — Use Claude for code analysis, GPT for documentation generation, Gemini for test generation. See which step costs the most.
RAG applications — Embeddings through OpenAI, retrieval-augmented generation through Anthropic. One endpoint, unified cost tracking.
Multi-model comparison — A/B test models by sending identical requests to different providers and comparing quality vs cost vs latency (see the sketch after this list).
Internal AI tools — Give every team access to AI without sharing raw API keys. Track usage per team via tags. Set up RBAC with Read/Execute/Admin permissions.
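Because every provider sits behind the same client, that comparison loop is a few lines. A minimal sketch in Python, reusing the gateway client from earlier; the model IDs are taken from the supported-models table:
# Send one prompt to three providers through one endpoint; the dashboard
# then breaks down cost and latency per model.
prompt = [{"role": "user", "content": "Summarize TCP slow start in two sentences."}]

for model in ("claude-sonnet-4-6", "gpt-4.1-nano", "gemini-2.5-flash"):
    r = client.chat.completions.create(model=model, messages=prompt)
    print(f"{model}: {r.choices[0].message.content[:100]}")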
The gateway overhead is under 10ms — including authentication, request translation, and response normalization. The bottleneck is always the upstream provider (typically 200-2000ms), not the proxy.
You don't have to store provider keys in Temps. Pass your key per-request with the x-provider-api-key header; it's used for that request only and never stored. This is called BYOK (Bring Your Own Key).
Streaming works with all four providers through the same /chat/completions endpoint with "stream": true. The gateway translates SSE chunks between provider formats transparently.
The AI Gateway is included in Temps at no extra charge. You pay your provider API costs directly (OpenAI, Anthropic, etc.) and Temps adds zero markup. If you're on Temps Cloud, it's included in your $6/mo plan.
Full conversations are visible too. If you instrument your app with OpenTelemetry GenAI semantic conventions, the AI Activity tab shows complete conversation threads: system prompts, user messages, assistant responses, tool calls, and thinking blocks.
The AI Gateway ships with the latest Temps release. If you're already running Temps, update to the latest version and configure your provider keys.
If you're new to Temps, get started in under 5 minutes:
curl -fsSL https://get.temps.sh | sh
One binary. Deployments, analytics, error tracking, monitoring, and now an AI gateway. No SaaS sprawl required.