Built-In AI Gateway: OpenAI, Anthropic, Gemini, and Grok Through One Endpoint
March 11, 2026
Written by Temps Team
Last updated March 11, 2026
Your team uses Claude for code review, GPT-4.1 for customer support, Gemini for document processing, and Grok for internal chat. That's four API keys, four billing dashboards, four sets of rate limits, and zero unified view of what you're spending.
72% of enterprises plan to increase GenAI spending in 2025 (Kong, 2025). The average monthly AI spend already hits $85,521 (CloudZero, 2025). And most teams can't even tell you which model is burning through their budget.
Today we're releasing the Temps AI Gateway — a built-in, OpenAI-compatible proxy that routes requests to Anthropic, OpenAI, Google Gemini, and xAI through a single endpoint. With usage tracking, cost analytics, and per-request attribution baked in.
No extra service to deploy. No new bill. It's part of the same binary that handles your deployments.
TL;DR: Temps now includes a built-in AI Gateway that proxies OpenAI, Anthropic, Gemini, and Grok through one OpenAI-compatible endpoint. Track costs per model, per user, per conversation — with zero additional infrastructure. 72% of enterprises are increasing AI spend in 2025 (Kong, 2025), and most have no idea where the money goes.
Why Do You Need an AI Gateway?
The cost of querying a GPT-3.5-level model dropped from $20 to $0.07 per million tokens in 18 months — a 280-fold reduction (Stanford HAI AI Index, 2025). Sounds cheap, right? But token costs aren't the problem anymore. Operational sprawl is.
Here's what happens without a gateway:
- 4+ API keys scattered across env vars, Vault secrets, and team Slack DMs
- 4+ billing portals with different invoicing cycles and currency formats
- No cross-provider cost view — you can't answer "how much did our AI features cost last month?"
- No per-user or per-feature attribution — was it the chatbot or the code reviewer that burned $3,000?
- Key rotation nightmares — changing one key means updating 12 services
A gateway gives you one endpoint, one set of credentials, and one dashboard. Point your code at https://your-temps.example.com/api/ai/v1/chat/completions and it routes to the right provider based on the model name.
How the Temps AI Gateway Works
The gateway exposes three OpenAI-compatible endpoints:
- `POST /api/ai/v1/chat/completions` → Chat (all providers)
- `POST /api/ai/v1/embeddings` → Embeddings (OpenAI)
- `GET /api/ai/v1/models` → List available models
Every request flows through this pipeline:
- Authentication — Bearer token from your Temps API key
- Model routing — `claude-sonnet-4-6` goes to Anthropic, `gpt-4.1-nano` goes to OpenAI, `gemini-2.5-flash` goes to Google
- Request translation — your OpenAI-format request gets translated to each provider's native API format
- Upstream call — The gateway makes the actual API call with the encrypted provider key
- Response translation — Provider response gets normalized back to OpenAI format
- Usage logging — Tokens, latency, cost, and model are recorded to a TimescaleDB hypertable
The total gateway overhead? Under 10ms. The bottleneck is always the upstream provider, never the proxy.
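The model-routing step is easy to picture as a prefix match on the model name. This is a minimal illustrative sketch, not Temps' actual routing code; the prefix table is an assumption based on the model names listed below.

```python
# Sketch of model-name → provider routing (illustrative only; the real
# gateway's routing table is internal to Temps).
PREFIXES = {
    "claude-": "anthropic",
    "gpt-": "openai",
    "o3": "openai",
    "o4-": "openai",
    "gemini-": "google",
    "grok-": "xai",
    "text-embedding-": "openai",
}

def route(model: str) -> str:
    """Return the upstream provider for a model name."""
    for prefix, provider in PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unknown model: {model}")

print(route("claude-sonnet-4-6"))  # → anthropic
print(route("gemini-2.5-flash"))   # → google
```

A lookup like this is why the proxy itself stays cheap: routing is a string match, not a network call.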
Supported Providers and Models
| Provider | Models | Streaming |
|---|---|---|
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Yes |
| OpenAI | GPT-5.4, GPT-4.1, o3, o4-mini, GPT-4o, embeddings | Yes |
| Google | Gemini 3.1 Pro, 2.5 Flash/Pro, 2.0 Flash | Yes |
| xAI | Grok 4.1 Fast, Grok 3, Grok 3 Mini | Yes |
All models, including streaming responses, work through the same /chat/completions endpoint. The gateway handles SSE chunk translation transparently — Anthropic's message_delta events get normalized to OpenAI's delta format.
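The chunk translation is easiest to see on a single event. Here is a hedged sketch of converting one Anthropic `content_block_delta` streaming event into an OpenAI-style chunk; the field names follow the two providers' public APIs, but this is not Temps' internal implementation.

```python
def anthropic_delta_to_openai(event: dict) -> dict:
    """Translate one Anthropic streaming event into an OpenAI-style chunk.

    Illustrative only: the real gateway also handles message_start,
    message_delta (stop reasons, usage), and tool-use blocks.
    """
    if event.get("type") != "content_block_delta":
        raise ValueError("sketch only handles content_block_delta")
    text = event["delta"].get("text", "")
    return {
        "object": "chat.completion.chunk",
        "choices": [
            {"index": 0, "delta": {"content": text}, "finish_reason": None}
        ],
    }

chunk = anthropic_delta_to_openai({
    "type": "content_block_delta",
    "index": 0,
    "delta": {"type": "text_delta", "text": "Hello"},
})
print(chunk["choices"][0]["delta"]["content"])  # → Hello
```

Because the client only ever sees the OpenAI shape, the same streaming consumer code works regardless of which provider handled the request.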
One-Line Integration: Just Change the Base URL
If your code already calls OpenAI, switching to the Temps gateway takes one line:
```python
import openai

client = openai.OpenAI(
    api_key="tk_your_temps_api_key",
    base_url="https://your-temps.example.com/api/ai/v1",
)

# This routes to Anthropic automatically
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
)
```
Same code, same SDK, same types. But now your request goes through Temps, gets logged, and you can see it in your dashboard.
This works with any OpenAI-compatible SDK — Python, TypeScript, Go, Rust, curl. If it speaks OpenAI's API format, it works.
Bring Your Own Key (BYOK)
Don't want to store provider keys in Temps? Pass them per-request:
```bash
curl https://your-temps.example.com/api/ai/v1/chat/completions \
  -H "Authorization: Bearer tk_your_temps_key" \
  -H "x-provider-api-key: sk-ant-your-anthropic-key" \
  -d '{"model": "claude-sonnet-4-6", "messages": [...]}'
```
BYOK keys are ephemeral — they're used for the request and never stored. System-configured keys are encrypted at rest with AES-256-GCM.
Usage Analytics: Know Where Every Token Goes
Every request through the gateway gets logged to a TimescaleDB hypertable with 15 columns of metadata:
- Provider and model — which model handled this request
- Token counts — input and output tokens
- Latency — end-to-end response time in milliseconds
- Cost — estimated cost calculated at 1/10,000th cent precision
- Conversation ID — group multi-turn conversations together
- Tags — arbitrary labels like `team:platform`, `feature:chatbot`, `env:staging`
- BYOK flag — was this a system key or a user-provided key?
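Storing cost at 1/10,000th-cent precision amounts to keeping an integer count of ten-thousandths of a cent, which avoids floating-point drift when summing millions of rows. A sketch of that calculation; the per-million-token prices below are made-up placeholders, not real provider pricing:

```python
# Sketch of cost estimation at 1/10,000th-cent precision. The prices are
# placeholders for illustration, not actual provider rates.
PRICE_PER_MTOK = {  # (input, output) in dollars per million tokens
    "example-model": (3.00, 15.00),
}

def cost_ten_thousandths_cent(model: str, in_tok: int, out_tok: int) -> int:
    """Estimated cost as an integer number of 1/10,000ths of a cent."""
    in_price, out_price = PRICE_PER_MTOK[model]
    dollars = (in_tok * in_price + out_tok * out_price) / 1_000_000
    return round(dollars * 100 * 10_000)  # dollars → cents → 1/10,000 cent

# 1,000 input + 500 output tokens at the placeholder rates:
print(cost_ten_thousandths_cent("example-model", 1000, 500))  # → 10500
```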
The dashboard breaks this down into:
- Summary cards — total requests, tokens, cost, average latency, error rate
- Timeseries charts — requests and tokens over time with hourly/daily bucketing
- Per-model breakdown — which models cost the most, which are fastest
- Per-provider view — compare Anthropic vs OpenAI vs Gemini at a glance
- Conversation analytics — group requests by x-conversation-id header to see full conversation costs
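The conversation rollup is a plain group-by over the usage log. A sketch with made-up log rows (costs here are integers in 1/10,000ths of a cent, matching the precision mentioned above):

```python
from collections import defaultdict

# Sketch: roll up per-request cost into per-conversation totals, the same
# grouping the dashboard does on the x-conversation-id header. The rows
# below are made-up examples.
logs = [
    {"conversation_id": "conv-1", "model": "claude-sonnet-4-6", "cost": 12000},
    {"conversation_id": "conv-1", "model": "claude-sonnet-4-6", "cost": 8000},
    {"conversation_id": "conv-2", "model": "gpt-4.1-nano", "cost": 1000},
]

totals: dict[str, int] = defaultdict(int)
for row in logs:
    totals[row["conversation_id"]] += row["cost"]

print(totals["conv-1"])  # → 20000
```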
Tag-Based Cost Allocation
Pass tags via the x-tags header to attribute costs to teams, features, or environments:
```bash
curl ... \
  -H "x-tags: team:ml-ops, feature:code-review, env:production" \
  -d '{"model": "claude-sonnet-4-6", ...}'
```
Then filter your analytics by any tag combination. Finally answer: "How much does the code review feature cost us per month?"
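The header format is just comma-separated `key:value` pairs, so parsing it on your side (for example, to sanity-check what you send) is a few lines. A minimal sketch, assuming the format shown in the curl example above:

```python
# Sketch: parse an x-tags header value (comma-separated key:value pairs)
# into a dict, e.g. for validating tags before sending them.
def parse_tags(header: str) -> dict[str, str]:
    tags = {}
    for item in header.split(","):
        key, _, value = item.strip().partition(":")
        tags[key] = value
    return tags

tags = parse_tags("team:ml-ops, feature:code-review, env:production")
print(tags["feature"])  # → code-review
```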
How Is This Different from Portkey, Helicone, or LiteLLM?
Standalone AI gateways like Portkey, Helicone, and LiteLLM are good products. But they're another service to deploy, another bill to pay, and another thing to monitor.
Kong's AI gateway costs over $30 per million requests (TrueFoundry, 2025). LiteLLM is open source but has no commercial backing, frequent regressions, and significant latency overhead. Portkey and Helicone are SaaS — your requests route through their servers.
The Temps AI Gateway is different:
| | Temps AI Gateway | Standalone gateways |
|---|---|---|
| Deployment | Already running — it's part of your Temps binary | Separate service to deploy and maintain |
| Cost | Free (included in Temps) | $30+/million requests or monthly SaaS fee |
| Data residency | Your server, your data | Their servers (SaaS) or yours (self-hosted) |
| Integration | Same auth, same dashboard as deployments | Separate auth system, separate dashboard |
| Monitoring | Built-in OTel traces + usage analytics | Usually one or the other |
| Key management | AES-256-GCM encrypted, same vault as env vars | Separate secret management |
The real value isn't the proxy itself — it's that the gateway lives alongside your deployments, analytics, error tracking, and monitoring. One platform, one login, one bill.
OpenTelemetry Integration: Trace Every AI Call
Temps also includes a full OpenTelemetry collector. Combine the AI Gateway with OTel GenAI semantic conventions and you get end-to-end traces of every AI interaction:
- Which user triggered the request
- What the input and output messages were
- How long the model took to respond
- Token usage and estimated cost
- Thinking blocks and tool calls (for Anthropic)
The AI Activity tab in your project dashboard shows these traces in a conversation view — system prompt, user messages, assistant responses, tool calls, and even reasoning/thinking blocks rendered inline.
```python
# Your app code — OTel auto-instruments the gateway calls
from opentelemetry import trace

tracer = trace.get_tracer("my-ai-app")

with tracer.start_as_current_span("chat claude-sonnet-4-6"):
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "..."}],
    )
```
This trace shows up in the AI Activity dashboard with the full conversation, token counts, and latency — all without any additional infrastructure.
Setting It Up: 3 Steps
1. Configure Provider Keys
In the Temps dashboard, go to AI Gateway and add your provider API keys. Paste your Anthropic, OpenAI, Gemini, or xAI key — each one is encrypted with AES-256-GCM before hitting the database.
No CLI command needed. The dashboard validates the key format and tests connectivity before saving.
2. Get Your Temps API Key
Use your existing Temps API key, the same one you use for deployments and the CLI, or create a dedicated one:
```bash
temps api-key --name "ai-gateway" --role admin
# → tk_abc123...
```
3. Point Your Code at the Gateway
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "tk_abc123",
  baseURL: "https://your-temps.example.com/api/ai/v1",
});

// Works with any supported model
const anthropic = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello" }],
});

const openai = await client.chat.completions.create({
  model: "gpt-4.1-nano",
  messages: [{ role: "user", content: "Hello" }],
});

const gemini = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello" }],
});
```
That's it. Four providers, one endpoint, one API key.
What Teams Are Building With This
AI-powered support bots — Route customer queries to the cheapest model that meets quality thresholds. Track cost per conversation.
Code review pipelines — Use Claude for code analysis, GPT for documentation generation, Gemini for test generation. See which step costs the most.
RAG applications — Embeddings through OpenAI, retrieval-augmented generation through Anthropic. One endpoint, unified cost tracking.
Multi-model comparison — A/B test models by sending identical requests to different providers and comparing quality vs cost vs latency.
Internal AI tools — Give every team access to AI without sharing raw API keys. Track usage per team via tags. Set up RBAC with Read/Execute/Admin permissions.
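The multi-model comparison pattern boils down to sending identical prompts through the gateway and comparing the logged metrics afterward. A sketch over made-up recorded results (no network calls; in practice each row would come from the gateway's usage log):

```python
# Sketch: pick the cheapest model that clears a quality bar, using
# made-up recorded metrics. "score" stands in for whatever quality
# measure your A/B harness produces.
results = [
    {"model": "claude-sonnet-4-6", "latency_ms": 850, "cost_cents": 1.4, "score": 0.92},
    {"model": "gpt-4.1-nano", "latency_ms": 410, "cost_cents": 0.2, "score": 0.88},
]

QUALITY_FLOOR = 0.9
eligible = [r for r in results if r["score"] >= QUALITY_FLOOR]
best = min(eligible, key=lambda r: r["cost_cents"])
print(best["model"])  # → claude-sonnet-4-6
```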
Frequently Asked Questions
Does the gateway add latency to my requests?
The gateway overhead is under 10ms — including authentication, request translation, and response normalization. The bottleneck is always the upstream provider (typically 200-2000ms), not the proxy.
Can I use the gateway without storing my API keys in Temps?
Yes. Use the x-provider-api-key header to pass your key per-request. It's used for that request only and never stored. This is called BYOK (Bring Your Own Key).
Does streaming work through the gateway?
Yes. All four providers support streaming through the same /chat/completions endpoint with "stream": true. The gateway translates SSE chunks between provider formats transparently.
How is this billed?
The AI Gateway is included in Temps at no extra charge. You pay your provider API costs directly (OpenAI, Anthropic, etc.) and Temps adds zero markup. If you're on Temps Cloud, it's included in your $6/mo plan.
Can I see the actual conversation messages in traces?
Yes. If you instrument your app with OpenTelemetry GenAI semantic conventions, the AI Activity tab shows full conversation threads — system prompts, user messages, assistant responses, tool calls, and thinking blocks.
Start Using It Today
The AI Gateway ships with the latest Temps release. If you're already running Temps, update to the latest version and configure your provider keys.
If you're new to Temps, get started in under 5 minutes:
```bash
curl -fsSL https://get.temps.sh | sh
```
One binary. Deployments, analytics, error tracking, monitoring, and now an AI gateway. No SaaS sprawl required.