
Built-In AI Gateway: OpenAI, Anthropic, Gemini, and Grok Through One Endpoint

March 11, 2026

Written by Temps Team

Last updated March 11, 2026

Your team uses Claude for code review, GPT-4.1 for customer support, Gemini for document processing, and Grok for internal chat. That's four API keys, four billing dashboards, four sets of rate limits, and zero unified view of what you're spending.

72% of enterprises plan to increase GenAI spending in 2025 (Kong, 2025). The average monthly AI spend already hits $85,521 (CloudZero, 2025). And most teams can't even tell you which model is burning through their budget.

Today we're releasing the Temps AI Gateway — a built-in, OpenAI-compatible proxy that routes requests to Anthropic, OpenAI, Google Gemini, and xAI through a single endpoint. With usage tracking, cost analytics, and per-request attribution baked in.

No extra service to deploy. No new bill. It's part of the same binary that handles your deployments.

TL;DR: Temps now includes a built-in AI Gateway that proxies OpenAI, Anthropic, Gemini, and Grok through one OpenAI-compatible endpoint. Track costs per model, per user, per conversation — with zero additional infrastructure. 72% of enterprises are increasing AI spend in 2025 (Kong, 2025), and most have no idea where the money goes.


Why Do You Need an AI Gateway?

The cost of querying a GPT-3.5-level model dropped from $20 to $0.07 per million tokens in 18 months — a 280-fold reduction (Stanford HAI AI Index, 2025). Sounds cheap, right? But token costs aren't the problem anymore. Operational sprawl is.

Here's what happens without a gateway:

  • 4+ API keys scattered across env vars, Vault secrets, and team Slack DMs
  • 4+ billing portals with different invoicing cycles and currency formats
  • No cross-provider cost view — you can't answer "how much did our AI features cost last month?"
  • No per-user or per-feature attribution — was it the chatbot or the code reviewer that burned $3,000?
  • Key rotation nightmares — changing one key means updating 12 services

A gateway gives you one endpoint, one set of credentials, and one dashboard. Point your code at https://your-temps.example.com/api/ai/v1/chat/completions and it routes to the right provider based on the model name.


How the Temps AI Gateway Works

The gateway exposes three OpenAI-compatible endpoints:

POST /api/ai/v1/chat/completions   → Chat (all providers)
POST /api/ai/v1/embeddings         → Embeddings (OpenAI)
GET  /api/ai/v1/models             → List available models

Every request flows through this pipeline:

  1. Authentication — Bearer token from your Temps API key
  2. Model routing — claude-sonnet-4-6 goes to Anthropic, gpt-4.1-nano goes to OpenAI, gemini-2.5-flash goes to Google
  3. Request translation — Your OpenAI-format request gets translated to each provider's native API format
  4. Upstream call — The gateway makes the actual API call with the encrypted provider key
  5. Response translation — Provider response gets normalized back to OpenAI format
  6. Usage logging — Tokens, latency, cost, and model are recorded to a TimescaleDB hypertable
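
The routing step (step 2) can be pictured as a prefix match on the model name. This is a minimal sketch, not the actual Temps routing logic — the prefix table below is an illustrative assumption:

```python
# Sketch of model-name-based routing (step 2 above).
# The prefix table is illustrative, not the actual Temps implementation.
PROVIDER_PREFIXES = {
    "claude": "anthropic",
    "gpt": "openai",
    "o3": "openai",
    "o4": "openai",
    "text-embedding": "openai",
    "gemini": "google",
    "grok": "xai",
}

def route(model: str) -> str:
    """Return the upstream provider for a model name."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unknown model: {model}")
```

The same lookup also decides which translation layer (step 3) the request flows through.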

The total gateway overhead? Under 10ms. The bottleneck is always the upstream provider, never the proxy.

Supported Providers and Models

Provider  | Models                                            | Streaming
Anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5            | Yes
OpenAI    | GPT-5.4, GPT-4.1, o3, o4-mini, GPT-4o, embeddings | Yes
Google    | Gemini 3.1 Pro, 2.5 Flash/Pro, 2.0 Flash          | Yes
xAI       | Grok 4.1 Fast, Grok 3, Grok 3 Mini                | Yes

All models, including streaming responses, work through the same /chat/completions endpoint. The gateway handles SSE chunk translation transparently — Anthropic's message_delta events get normalized to OpenAI's delta format.


One-Line Integration: Just Change the Base URL

If your code already calls OpenAI, switching to the Temps gateway takes one line:

import openai

client = openai.OpenAI(
    api_key="tk_your_temps_api_key",
    base_url="https://your-temps.example.com/api/ai/v1",
)

# This routes to Anthropic automatically
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
)

Same code, same SDK, same types. But now your request goes through Temps, gets logged, and you can see it in your dashboard.

This works with any OpenAI-compatible SDK — Python, TypeScript, Go, Rust, curl. If it speaks OpenAI's API format, it works.

Bring Your Own Key (BYOK)

Don't want to store provider keys in Temps? Pass them per-request:

curl https://your-temps.example.com/api/ai/v1/chat/completions \
  -H "Authorization: Bearer tk_your_temps_key" \
  -H "x-provider-api-key: sk-ant-your-anthropic-key" \
  -d '{"model": "claude-sonnet-4-6", "messages": [...]}'

BYOK keys are ephemeral — they're used for the request and never stored. System-configured keys are encrypted at rest with AES-256-GCM.
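
Key resolution boils down to a precedence rule: the per-request header wins, the stored key is the fallback. A hypothetical sketch — the function and parameter names are ours, not the Temps internals:

```python
# Hypothetical sketch of BYOK key precedence: a per-request
# x-provider-api-key header takes priority over the system-configured
# key, and is never written anywhere.
def resolve_key(headers: dict, stored_keys: dict, provider: str) -> tuple[str, bool]:
    """Return (api_key, is_byok) for an incoming gateway request."""
    byok = headers.get("x-provider-api-key")
    if byok:
        return byok, True            # ephemeral: used for this request only
    key = stored_keys.get(provider)  # system key, decrypted from rest
    if key is None:
        raise PermissionError(f"no key configured for {provider}")
    return key, False
```

The is_byok flag is what ends up in the usage log, so you can later split analytics by system vs user-provided keys.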


Usage Analytics: Know Where Every Token Goes

Every request through the gateway gets logged to a TimescaleDB hypertable with 15 columns of metadata:

  • Provider and model — which model handled this request
  • Token counts — input and output tokens
  • Latency — end-to-end response time in milliseconds
  • Cost — estimated cost calculated at 1/10,000th cent precision
  • Conversation ID — group multi-turn conversations together
  • Tags — arbitrary labels like team:platform, feature:chatbot, env:staging
  • BYOK flag — was this a system key or a user-provided key?
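
The cost column above is just token counts times per-model prices, stored at 1/10,000th-cent granularity. A sketch of that arithmetic — the price table is a placeholder, not real provider pricing:

```python
from decimal import Decimal

# Illustrative price table: dollars per million tokens.
# These numbers are placeholders, not real provider prices.
PRICES = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}

def cost_units(model: str, input_tokens: int, output_tokens: int) -> int:
    """Estimated cost in units of 1/10,000 of a cent."""
    p = PRICES[model]
    dollars = (p["input"] * input_tokens + p["output"] * output_tokens) / Decimal(1_000_000)
    return int(dollars * 100 * 10_000)  # dollars -> cents -> 1/10,000 cent
```

Storing an integer in tiny units sidesteps floating-point drift when costs are summed across millions of requests.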

The dashboard breaks this down into:

Summary cards — total requests, tokens, cost, average latency, error rate

Timeseries charts — requests and tokens over time with hourly/daily bucketing

Per-model breakdown — which models cost the most, which are fastest

Per-provider view — compare Anthropic vs OpenAI vs Gemini at a glance

Conversation analytics — group requests by x-conversation-id header to see full conversation costs
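
The conversation rollup amounts to grouping logged rows by their conversation ID and summing — roughly this, with row fields mirroring a few of the logged columns above:

```python
from collections import defaultdict

# Sketch: roll up per-request usage rows into per-conversation totals.
def conversation_costs(rows: list[dict]) -> dict[str, dict]:
    totals: dict[str, dict] = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost": 0})
    for row in rows:
        conv = totals[row["conversation_id"]]
        conv["requests"] += 1
        conv["tokens"] += row["input_tokens"] + row["output_tokens"]
        conv["cost"] += row["cost"]
    return dict(totals)
```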

Tag-Based Cost Allocation

Pass tags via the x-tags header to attribute costs to teams, features, or environments:

curl ... \
  -H "x-tags: team:ml-ops, feature:code-review, env:production" \
  -d '{"model": "claude-sonnet-4-6", ...}'

Then filter your analytics by any tag combination. Finally answer: "How much does the code review feature cost us per month?"
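
Parsing and matching that header is straightforward — a sketch with the comma-and-colon format inferred from the example above:

```python
def parse_tags(header: str) -> dict[str, str]:
    """Parse an x-tags header like 'team:ml-ops, feature:code-review'."""
    tags = {}
    for part in header.split(","):
        part = part.strip()
        if ":" in part:
            key, value = part.split(":", 1)
            tags[key] = value
    return tags

def matches(row_tags: dict[str, str], wanted: dict[str, str]) -> bool:
    """True if a logged request carries every wanted tag."""
    return all(row_tags.get(k) == v for k, v in wanted.items())
```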


How Is This Different from Portkey, Helicone, or LiteLLM?

Standalone AI gateways like Portkey, Helicone, and LiteLLM are good products. But they're another service to deploy, another bill to pay, and another thing to monitor.

Kong's AI gateway costs over $30 per million requests (TrueFoundry, 2025). LiteLLM is open source but has no commercial backing, frequent regressions, and significant latency overhead. Portkey and Helicone are SaaS — your requests route through their servers.

The Temps AI Gateway is different:

               | Temps AI Gateway                              | Standalone gateways
Deployment     | Already running — part of your Temps binary   | Separate service to deploy and maintain
Cost           | Free (included in Temps)                      | $30+/million requests or monthly SaaS fee
Data residency | Your server, your data                        | Their servers (SaaS) or yours (self-hosted)
Integration    | Same auth, same dashboard as deployments      | Separate auth system, separate dashboard
Monitoring     | Built-in OTel traces + usage analytics        | Usually one or the other
Key management | AES-256-GCM encrypted, same vault as env vars | Separate secret management

The real value isn't the proxy itself — it's that the gateway lives alongside your deployments, analytics, error tracking, and monitoring. One platform, one login, one bill.


OpenTelemetry Integration: Trace Every AI Call

Temps also includes a full OpenTelemetry collector. Combine the AI Gateway with OTel GenAI semantic conventions and you get end-to-end traces of every AI interaction:

  • Which user triggered the request
  • What the input and output messages were
  • How long the model took to respond
  • Token usage and estimated cost
  • Thinking blocks and tool calls (for Anthropic)

The AI Activity tab in your project dashboard shows these traces in a conversation view — system prompt, user messages, assistant responses, tool calls, and even reasoning/thinking blocks rendered inline.

# Your app code — OTel auto-instruments the gateway calls
from opentelemetry import trace

tracer = trace.get_tracer("my-ai-app")

with tracer.start_as_current_span("chat claude-sonnet-4-6"):
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "..."}],
    )

This trace shows up in the AI Activity dashboard with the full conversation, token counts, and latency — all without any additional infrastructure.


Setting It Up: 3 Steps

1. Configure Provider Keys

In the Temps dashboard, go to AI Gateway and add your provider API keys. Paste your Anthropic, OpenAI, Gemini, or xAI key — each one is encrypted with AES-256-GCM before hitting the database.

No CLI command needed. The dashboard validates the key format and tests connectivity before saving.

2. Get Your Temps API Key

Use your existing Temps API key — the same one you use for deployments and the CLI:

temps api-key --name "ai-gateway" --role admin
# → tk_abc123...

3. Point Your Code at the Gateway

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "tk_abc123",
  baseURL: "https://your-temps.example.com/api/ai/v1",
});

// Works with any supported model
const anthropic = await client.chat.completions.create({
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Hello" }],
});

const openai = await client.chat.completions.create({
  model: "gpt-4.1-nano",
  messages: [{ role: "user", content: "Hello" }],
});

const gemini = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello" }],
});

That's it. Four providers, one endpoint, one API key.


What Teams Are Building With This

AI-powered support bots — Route customer queries to the cheapest model that meets quality thresholds. Track cost per conversation.

Code review pipelines — Use Claude for code analysis, GPT for documentation generation, Gemini for test generation. See which step costs the most.

RAG applications — Embeddings through OpenAI, retrieval-augmented generation through Anthropic. One endpoint, unified cost tracking.

Multi-model comparison — A/B test models by sending identical requests to different providers and comparing quality vs cost vs latency.

Internal AI tools — Give every team access to AI without sharing raw API keys. Track usage per team via tags. Set up RBAC with Read/Execute/Admin permissions.


Frequently Asked Questions

Does the gateway add latency to my requests?

The gateway overhead is under 10ms — including authentication, request translation, and response normalization. The bottleneck is always the upstream provider (typically 200-2000ms), not the proxy.

Can I use the gateway without storing my API keys in Temps?

Yes. Use the x-provider-api-key header to pass your key per-request. It's used for that request only and never stored. This is called BYOK (Bring Your Own Key).

Does streaming work through the gateway?

Yes. All four providers support streaming through the same /chat/completions endpoint with "stream": true. The gateway translates SSE chunks between provider formats transparently.

How is this billed?

The AI Gateway is included in Temps at no extra charge. You pay your provider API costs directly (OpenAI, Anthropic, etc.) and Temps adds zero markup. If you're on Temps Cloud, it's included in your $6/mo plan.

Can I see the actual conversation messages in traces?

Yes. If you instrument your app with OpenTelemetry GenAI semantic conventions, the AI Activity tab shows full conversation threads — system prompts, user messages, assistant responses, tool calls, and thinking blocks.


Start Using It Today

The AI Gateway ships with the latest Temps release. If you're already running Temps, update to the latest version and configure your provider keys.

If you're new to Temps, get started in under 5 minutes:

curl -fsSL https://get.temps.sh | sh

One binary. Deployments, analytics, error tracking, monitoring, and now an AI gateway. No SaaS sprawl required.

Tags: #ai-gateway #openai #anthropic #gemini #grok #llm #ai-proxy #multi-provider #ai-cost-tracking #self-hosted