March 11, 2026
Written by Temps Team
Your team uses Claude for code review, GPT-4.1 for customer support, Gemini for document processing, and Grok for internal chat. That's four API keys, four billing dashboards, four sets of rate limits, and zero unified view of what you're spending.
According to Kong, 72% of enterprises plan to increase GenAI spending in 2025. The average monthly AI spend already hits $85,521 according to CloudZero. And most teams can't even tell you which model is burning through their budget.
Today we're releasing the Temps AI Gateway — a built-in, OpenAI-compatible proxy that routes requests to Anthropic, OpenAI, Google Gemini, and xAI through a single endpoint. With usage tracking, cost analytics, and per-request attribution baked in.
No extra service to deploy. No new bill. It's part of the same binary that handles your deployments.
TL;DR: Temps now includes a built-in AI Gateway that proxies OpenAI, Anthropic, Gemini, and Grok through one OpenAI-compatible endpoint. Track costs per model, per user, per conversation — with zero additional infrastructure. According to Kong, 72% of enterprises are increasing AI spend in 2025, and most have no idea where the money goes.
According to the Stanford HAI AI Index, the cost of querying a GPT-3.5-level model dropped from $20 to $0.07 per million tokens in 18 months — a 280-fold reduction. Sounds cheap, right? But token costs aren't the problem anymore. Operational sprawl is.
Here's what happens without a gateway: every provider means another API key to rotate, another billing dashboard to check, another set of rate limits to work around, and no unified view of what any of it costs.
A gateway gives you one endpoint, one set of credentials, and one dashboard. Point your code at https://your-temps.example.com/api/ai/v1/chat/completions and it routes to the right provider based on the model name.
The gateway exposes three OpenAI-compatible endpoints:
POST /api/ai/v1/chat/completions → Chat (all providers)
POST /api/ai/v1/embeddings → Embeddings (OpenAI)
GET /api/ai/v1/models → List available models
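For a quick smoke test, hit the models endpoint with any OpenAI-compatible client. A minimal sketch in Python, assuming a hypothetical host and a Temps API key (the client setup matches the drop-in example below):
import openai

# Hypothetical host; substitute your own Temps deployment URL
client = openai.OpenAI(
    api_key="tk_your_temps_api_key",
    base_url="https://your-temps.example.com/api/ai/v1",
)

# GET /api/ai/v1/models: one list covering every configured provider
for model in client.models.list():
    print(model.id)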
Every request flows through the same pipeline: authenticate the Temps API key, route by model name, translate the request into the provider's native format, call upstream, normalize the response back to OpenAI's schema, and log usage. claude-sonnet-4-6 goes to Anthropic, gpt-4.1-nano goes to OpenAI, gemini-2.5-flash goes to Google.
The total gateway overhead? Under 10ms. The bottleneck is always the upstream provider, never the proxy.
| Provider | Models | Streaming |
|---|---|---|
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Yes |
| OpenAI | GPT-5.4, GPT-4.1, o3, o4-mini, GPT-4o, embeddings | Yes |
| Google | Gemini 3.1 Pro, 2.5 Flash/Pro, 2.0 Flash | Yes |
| xAI | Grok 4.1 Fast, Grok 3, Grok 3 Mini | Yes |
All models, including streaming responses, work through the same /chat/completions endpoint. The gateway handles SSE chunk translation transparently — Anthropic's message_delta events get normalized to OpenAI's delta format.
If your code already calls OpenAI, switching to the Temps gateway takes one line:
import openai
client = openai.OpenAI(
api_key="tk_your_temps_api_key",
base_url="https://your-temps.example.com/api/ai/v1",
)
# This routes to Anthropic automatically
response = client.chat.completions.create(
model="claude-sonnet-4-6",
messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
)
Same code, same SDK, same types. But now your request goes through Temps, gets logged, and you can see it in your dashboard.
This works with any OpenAI-compatible SDK — Python, TypeScript, Go, Rust, curl. If it speaks OpenAI's API format, it works.
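Streaming works through the same call. A minimal sketch, reusing the client above; the stream flag matches the "stream": true behavior described later in the FAQ:
# Stream a Claude response; the gateway normalizes Anthropic's SSE
# chunks into OpenAI-style deltas before they reach your client.
stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)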
Don't want to store provider keys in Temps? Pass them per-request:
curl https://your-temps.example.com/api/ai/v1/chat/completions \
-H "Authorization: Bearer tk_your_temps_key" \
-H "x-provider-api-key: sk-ant-your-anthropic-key" \
-d '{"model": "claude-sonnet-4-6", "messages": [...]}'
BYOK keys are ephemeral — they're used for the request and never stored. System-configured keys are encrypted at rest with AES-256-GCM.
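The same header works from the SDK. A sketch using the OpenAI Python client's extra_headers option, reusing the client from the drop-in example above:
# BYOK per request: the Anthropic key rides along as a header and,
# per the gateway's BYOK semantics, is never stored server-side
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain TCP vs UDP"}],
    extra_headers={"x-provider-api-key": "sk-ant-your-anthropic-key"},
)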
Every request through the gateway gets logged to a TimescaleDB hypertable with 15 columns of metadata, including model, provider, token counts, cost, latency, error status, and attribution tags like team:platform, feature:chatbot, or env:staging.
The dashboard breaks this down into:
Summary cards — total requests, tokens, cost, average latency, error rate
Timeseries charts — requests and tokens over time with hourly/daily bucketing
Per-model breakdown — which models cost the most, which are fastest
Per-provider view — compare Anthropic vs OpenAI vs Gemini at a glance
Conversation analytics — group requests by x-conversation-id header to see full conversation costs
Pass tags via the x-tags header to attribute costs to teams, features, or environments:
curl ... \
-H "x-tags: team:ml-ops, feature:code-review, env:production" \
-d '{"model": "claude-sonnet-4-6", ...}'
Then filter your analytics by any tag combination. Finally answer: "How much does the code review feature cost us per month?"
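The same attribution headers can be attached from the SDK. A sketch combining x-tags with the x-conversation-id header mentioned above; the UUID here is illustrative, any stable string should group requests:
import uuid

# One ID per conversation; the dashboard groups request costs by this header
conversation_id = str(uuid.uuid4())

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Review this diff"}],
    extra_headers={
        "x-tags": "team:ml-ops, feature:code-review, env:production",
        "x-conversation-id": conversation_id,
    },
)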
Standalone AI gateways like Portkey, Helicone, and LiteLLM are good products. But they're another service to deploy, another bill to pay, and another thing to monitor.
According to TrueFoundry, Kong's AI gateway costs over $30 per million requests. LiteLLM is open source, but the same comparison cites its lack of commercial backing, frequent regressions, and significant latency overhead. Portkey and Helicone are SaaS — your requests route through their servers.
The Temps AI Gateway is different:
| | Temps AI Gateway | Standalone gateways |
|---|---|---|
| Deployment | Already running — it's part of your Temps binary | Separate service to deploy and maintain |
| Cost | Free (included in Temps) | $30+/million requests or monthly SaaS fee |
| Data residency | Your server, your data | Their servers (SaaS) or yours (self-hosted) |
| Integration | Same auth, same dashboard as deployments | Separate auth system, separate dashboard |
| Monitoring | Built-in OTel traces + usage analytics | Usually one or the other |
| Key management | AES-256-GCM encrypted, same vault as env vars | Separate secret management |
The real value isn't the proxy itself — it's that the gateway lives alongside your deployments, analytics, error tracking, and monitoring. One platform, one login, one bill.
Temps also includes a full OpenTelemetry collector. Combine the AI Gateway with OTel GenAI semantic conventions and you get end-to-end traces of every AI interaction.
The AI Activity tab in your project dashboard shows these traces in a conversation view — system prompt, user messages, assistant responses, tool calls, and even reasoning/thinking blocks rendered inline.
# Your app code — wrap gateway calls in a span so they show up in AI Activity
from opentelemetry import trace

tracer = trace.get_tracer("my-ai-app")

# `client` is the OpenAI SDK client pointed at the Temps gateway (see above)
with tracer.start_as_current_span("chat claude-sonnet-4-6"):
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "..."}],
    )
This trace shows up in the AI Activity dashboard with the full conversation, token counts, and latency — all without any additional infrastructure.
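For traces richer than the bare span above, you can attach GenAI semantic-convention attributes yourself (auto-instrumentation libraries can also do this for you). A minimal sketch; the attribute names come from the OTel GenAI conventions, not from anything Temps-specific:
with tracer.start_as_current_span("chat claude-sonnet-4-6") as span:
    # Attribute names from the OTel GenAI semantic conventions
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.request.model", "claude-sonnet-4-6")
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "..."}],
    )
    # Token usage comes back in the OpenAI-compatible response
    span.set_attribute("gen_ai.usage.output_tokens",
                       response.usage.completion_tokens)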
In the Temps dashboard, go to AI Gateway and add your provider API keys. Paste your Anthropic, OpenAI, Gemini, or xAI key — each one is encrypted with AES-256-GCM before hitting the database.
No CLI command needed. The dashboard validates the key format and tests connectivity before saving.
Use your existing Temps API key — the same one you use for deployments and the CLI:
temps api-key --name "ai-gateway" --role admin
# → tk_abc123...
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "tk_abc123",
baseURL: "https://your-temps.example.com/api/ai/v1",
});
// Works with any supported model
const anthropic = await client.chat.completions.create({
model: "claude-sonnet-4-6",
messages: [{ role: "user", content: "Hello" }],
});
const openai = await client.chat.completions.create({
model: "gpt-4.1-nano",
messages: [{ role: "user", content: "Hello" }],
});
const gemini = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello" }],
});
// Grok model ID assumed from the supported-models table above
const grok = await client.chat.completions.create({
  model: "grok-3-mini",
  messages: [{ role: "user", content: "Hello" }],
});
That's it. Four providers, one endpoint, one API key.
AI-powered support bots — Route customer queries to the cheapest model that meets quality thresholds. Track cost per conversation.
Code review pipelines — Use Claude for code analysis, GPT for documentation generation, Gemini for test generation. See which step costs the most.
RAG applications — Embeddings through OpenAI, retrieval-augmented generation through Anthropic. One endpoint, unified cost tracking.
Multi-model comparison — A/B test models by sending identical requests to different providers and comparing quality vs cost vs latency (see the sketch after this list).
Internal AI tools — Give every team access to AI without sharing raw API keys. Track usage per team via tags. Set up RBAC with Read/Execute/Admin permissions.
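Because every provider sits behind the same client, that comparison loop is a few lines. A minimal sketch in Python, reusing the gateway client from earlier; the model IDs are taken from the supported-models table:
# Send one prompt to three providers through one endpoint; the dashboard
# then breaks down cost and latency per model.
prompt = [{"role": "user", "content": "Summarize TCP slow start in two sentences."}]

for model in ("claude-sonnet-4-6", "gpt-4.1-nano", "gemini-2.5-flash"):
    r = client.chat.completions.create(model=model, messages=prompt)
    print(f"{model}: {r.choices[0].message.content[:100]}")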
The gateway overhead is under 10ms — including authentication, request translation, and response normalization. The bottleneck is always the upstream provider (typically 200-2000ms), not the proxy.
You don't have to store provider keys in Temps. Pass your key per-request with the x-provider-api-key header; it's used for that request only and never stored. This is called BYOK (Bring Your Own Key).
Streaming works with all four providers through the same /chat/completions endpoint with "stream": true. The gateway translates SSE chunks between provider formats transparently.
The AI Gateway is included in Temps at no extra charge. You pay your provider API costs directly (OpenAI, Anthropic, etc.) and Temps adds zero markup. If you're on Temps Cloud, it's included in your $6/mo plan.
Full conversations are visible too. If you instrument your app with OpenTelemetry GenAI semantic conventions, the AI Activity tab shows complete conversation threads: system prompts, user messages, assistant responses, tool calls, and thinking blocks.
The AI Gateway ships with the latest Temps release. If you're already running Temps, update to the latest version and configure your provider keys.
If you're new to Temps, get started in under 5 minutes:
curl -fsSL https://get.temps.sh | sh
One binary. Deployments, analytics, error tracking, monitoring, and now an AI gateway. No SaaS sprawl required.