AI Gateway
The AI Gateway is an OpenAI-compatible proxy built into Temps. Use any OpenAI SDK to call OpenAI, Anthropic, Gemini, and xAI — just change the base URL.
What is the AI gateway
The AI Gateway is a Temps plugin (ai_gateway) that exposes an OpenAI-compatible HTTP API in front of multiple LLM providers. You point any OpenAI SDK at your Temps instance and it routes the request to the right upstream — OpenAI, Anthropic, Gemini, or xAI — translating the wire format automatically. Provider API keys are stored encrypted on the server; developers who consume the gateway only need a Temps API key.
All gateway routes are served under the /api prefix. The OpenAI-compatible base URL is https://your-temps-instance.com/api/ai/v1. Endpoint paths throughout this page include the /api prefix exactly as served.
The gateway covers:
- Chat completions, streaming, tool calling, and vision across all four providers
- Embeddings (OpenAI models)
- Model listing
- BYOK per-request key override
- Conversation and request tagging
- Token usage and cost analytics
Authentication & permissions
Every gateway route runs on Temps' authenticated surface. Each request must carry a valid Temps API key in the Authorization: Bearer <temps-api-key> header. Beyond a valid key, individual endpoints require a specific permission on the key:
| Permission | Grants access to |
|---|---|
| ai_gateway:execute | POST /api/ai/v1/chat/completions, POST /api/ai/v1/embeddings |
| ai_gateway:read | GET /api/ai/v1/models, all GET /api/ai/usage/*, GET /api/ai/pricing, GET /api/ai/providers, and POST /api/ai/providers/{id}/test |
| ai_gateway:write | POST/PATCH/DELETE /api/ai/providers, POST /api/ai/providers/test |
Requests reaching the gateway endpoints with a key that lacks the required permission are rejected by the permission guard. Sending a chat completion or embedding request needs ai_gateway:execute; listing models needs only ai_gateway:read.
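The permission table above can be read as a lookup from method and path to the permission a key must hold. The sketch below is a hypothetical client-side helper (the names `PERMISSION_RULES` and `required_permission` are illustrative and not part of Temps); it only transcribes the table and says nothing about how the gateway's permission guard is implemented.

```python
import re
from typing import Optional

# Rules transcribed from the permission table. Hypothetical helper, not gateway code.
PERMISSION_RULES = [
    ("POST", r"/api/ai/v1/(chat/completions|embeddings)", "ai_gateway:execute"),
    ("GET", r"/api/ai/v1/models", "ai_gateway:read"),
    ("GET", r"/api/ai/usage/.+", "ai_gateway:read"),
    ("GET", r"/api/ai/pricing", "ai_gateway:read"),
    ("GET", r"/api/ai/providers", "ai_gateway:read"),
    ("POST", r"/api/ai/providers/[^/]+/test", "ai_gateway:read"),
    ("POST", r"/api/ai/providers/test", "ai_gateway:write"),
    ("POST", r"/api/ai/providers", "ai_gateway:write"),
    ("PATCH", r"/api/ai/providers/[^/]+", "ai_gateway:write"),
    ("DELETE", r"/api/ai/providers/[^/]+", "ai_gateway:write"),
]


def required_permission(method: str, path: str) -> Optional[str]:
    """Return the permission the table lists for this route, or None if unlisted."""
    for rule_method, pattern, permission in PERMISSION_RULES:
        # fullmatch keeps /providers/test and /providers/{id}/test distinct.
        if method.upper() == rule_method and re.fullmatch(pattern, path):
            return permission
    return None
```

A helper like this can be useful when deciding which permissions to grant a key for a given integration: a chat-only client needs ai_gateway:execute, while a dashboard that reads usage needs ai_gateway:read.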
Quick start
Point any OpenAI SDK at your Temps instance. The only changes are base_url and api_key.
Setup
from openai import OpenAI
client = OpenAI(
    base_url="https://your-temps-instance.com/api/ai/v1",
    api_key="your-temps-api-key",
)
response = client.chat.completions.create(
    model="gpt-4o",  # or "claude-sonnet-4-20250514", "gemini-2.5-pro", etc.
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
Supported providers
The gateway selects a provider from the model-name prefix. Four providers are wired up:
| Model Prefix | Provider ID | Examples |
|---|---|---|
| gpt-, o1, o3, o4, text-embedding-, dall-e, chatgpt-, codex- | openai | gpt-4o, o3-mini, text-embedding-3-small |
| claude- | anthropic | claude-sonnet-4-20250514, claude-haiku-4-5-20251001 |
| grok- | xai | grok-3, grok-3-mini |
| gemini- | gemini | gemini-2.5-pro, gemini-2.5-flash |
All providers use the same OpenAI-compatible API format. xAI reuses the OpenAI-compatible adapter with its own base URL (https://api.x.ai/v1); Anthropic and Gemini have their request/response and streaming formats translated automatically. A model whose name does not match one of the prefixes above is not routable.
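The prefix-based routing described above can be sketched in a few lines of Python. This is an illustration built from the prefix table only; the names `PROVIDER_PREFIXES` and `route_provider` are hypothetical, and the gateway's actual matching logic may differ in details such as ordering or case handling.

```python
from typing import Optional

# Prefixes copied from the table above, keyed by provider ID.
PROVIDER_PREFIXES = {
    "openai": ("gpt-", "o1", "o3", "o4", "text-embedding-", "dall-e", "chatgpt-", "codex-"),
    "anthropic": ("claude-",),
    "xai": ("grok-",),
    "gemini": ("gemini-",),
}


def route_provider(model: str) -> Optional[str]:
    """Return the provider ID whose prefix matches the model name, else None."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        # str.startswith accepts a tuple and matches if any prefix matches.
        if model.startswith(prefixes):
            return provider
    return None  # not routable
```

For example, `route_provider("o3-mini")` returns `"openai"`, while a name such as `"llama-3"` matches no prefix and returns `None`.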
Chat completions
Standard chat completions work identically to the OpenAI API.
Endpoint
- Name
POST /api/ai/v1/chat/completions- Type
- endpoint
- Description
Submit a chat completion request. Supports streaming, tool calling, vision, and JSON mode.
Endpoint
POST /api/ai/v1/chat/completions
Authorization: Bearer <temps-api-key>
Content-Type: application/json
Multi-turn conversation
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."},
        {"role": "assistant", "content": "Here's a fibonacci function:\n\n```python\ndef fib(n):\n    if n <= 1:\n        return n\n    return fib(n-1) + fib(n-2)\n```"},
        {"role": "user", "content": "Can you make it iterative for better performance?"},
    ],
    temperature=0.7,
    max_tokens=500,
)
Sampling parameters
The gateway forwards the standard OpenAI sampling parameters to each provider. The two most common are temperature and top_p:
- Name
temperature- Type
- number
- Description
Controls randomness. Lower values (near 0) make output more focused and deterministic: the model almost always picks the most likely next token, which is best for factual answers, code, and structured extraction. Higher values (near 1) make output more varied and creative. A value of 0.7 is a common middle ground.
- Name
top_p- Type
- number
- Description
Nucleus sampling. Restricts token selection to the smallest set of tokens whose cumulative probability is at least top_p. For example, top_p: 0.1 only considers the most likely tokens that together account for 10% of the probability mass, narrowing diversity at the token level. 1.0 (the effective default) considers all tokens.
Adjust either temperature or top_p, but generally not both at once — tuning both together makes the sampling behavior hard to reason about. Both are optional; omit them to use each provider's defaults.
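To make the two parameters concrete, here is a small sketch of the textbook definitions: temperature rescales the logits before the softmax, and nucleus sampling keeps the smallest high-probability set whose mass reaches top_p. The function names are illustrative, and this is not a description of how any particular provider implements sampling; the gateway only forwards the parameters.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the smallest set of tokens whose cumulative probability >= top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept
```

With probabilities `[0.5, 0.3, 0.15, 0.05]`, a top_p of 0.1 keeps only the first token because it alone already covers 10% of the mass, while a top_p of 1.0 keeps all four.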
See Supported providers for which models route to which provider. For structured output (JSON mode), include response_format: {"type": "json_object"} in the request; the gateway translates it to each provider's native equivalent (for Gemini, response_mime_type: application/json).
Vision & image attachments
Send images to vision-capable models. The gateway translates image formats automatically for each provider.
Base64 Image
Send base64 image
import base64
# Read image and encode to base64
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")
response = client.chat.completions.create(
    model="gpt-4o",  # or claude-sonnet-4-20250514, gemini-2.5-pro
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image? Describe it in detail."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}"
                    },
                },
            ],
        }
    ],
    max_tokens=1000,
)
print(response.choices[0].message.content)
Image from URL
Image URL
response = client.chat.completions.create(
model="gemini-2.5-pro",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Describe what you see in this image."},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/photo.jpg"
},
},
],
}
],
)
Multiple Images
Multiple images
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Compare these two screenshots. What changed?"},
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{before_image}"},
},
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{after_image}"},
},
],
}
],
)
Supported image formats: PNG, JPEG, GIF, WebP. Base64 data URIs work with all providers. HTTP URLs work with OpenAI and Gemini; Anthropic requires base64.
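Because base64 data URIs are the one form that works with every provider, it can help to wrap the encoding in a small helper. The sketch below is illustrative: `image_part` and `SUPPORTED_MIME_TYPES` are hypothetical names, and the MIME types are assumed to correspond to the four formats listed above.

```python
import base64

# Assumed MIME types for the formats listed above: PNG, JPEG, GIF, WebP.
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style image_url content part from raw image bytes."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

The returned dictionary can be placed directly in a message's content list next to a text part, as in the examples above.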
Tool calling (function calling)
Define tools using the OpenAI format. The gateway translates them to each provider's native format.
Tool calling
import json
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name, e.g. 'London'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
}
]
# Step 1: Send message with tools
response = client.chat.completions.create(
model="claude-sonnet-4-20250514", # works with any provider
messages=[
{"role": "user", "content": "What's the weather like in London?"}
],
tools=tools,
)
message = response.choices[0].message
# Step 2: Check if the model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")
    # Step 3: Execute the tool and send the result back
    weather_result = get_weather(args["location"])  # your function
    follow_up = client.chat.completions.create(
        model="claude-sonnet-4-20250514",
        messages=[
            {"role": "user", "content": "What's the weather like in London?"},
            message,  # assistant message with tool_calls
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(weather_result),
            },
        ],
        tools=tools,
    )
    print(follow_up.choices[0].message.content)
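The example above leaves `get_weather` as "your function". One possible way to wire up the execution step is a registry that maps tool names to Python callables. Everything in this sketch is hypothetical: the stand-in `get_weather` returns a fixed placeholder value rather than real weather data, and `TOOL_REGISTRY` and `run_tool_call` are illustrative names.

```python
import json


def get_weather(location, unit="celsius"):
    # Stand-in implementation with a placeholder value; a real tool would
    # query a weather service here.
    return {"location": location, "unit": unit, "temperature": 18}


# Maps the tool names declared in `tools` to the functions that implement them.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a tool call and return a JSON string for the role="tool" message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)  # the model sends arguments as a JSON string
    return json.dumps(func(**args))
```

In the flow above, the content of the tool message would then be `run_tool_call(tool_call.function.name, tool_call.function.arguments)`.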
Tool Choice
Control whether the model should call tools:
Tool choice options
# Let the model decide (default)
tool_choice="auto"
# Force the model to call a tool
tool_choice="required"
# Force a specific tool
tool_choice={"type": "function", "function": {"name": "get_weather"}}
# Don't use tools (even if provided)
tool_choice="none"
Streaming
When you set stream: true, the response is returned as Server-Sent Events with content-type: text/event-stream. All providers produce the same OpenAI-style data: {...}\n\n chunks (each with object chat.completion.chunk), terminated by a final data: [DONE]\n\n. Non-OpenAI providers such as Anthropic have their native streaming format translated into this OpenAI SSE shape, so client code is identical across providers.
Streaming
stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Write a haiku about deployment automation."}
    ],
    stream=True,
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()  # newline at end
Errors that occur before the stream starts (e.g. the upstream provider rejects the request, or no key is configured) come back as a normal JSON error response with a 4xx/5xx status, just like a non-streaming call. Once the 200 text/event-stream response has begun, a mid-stream failure (such as a dropped upstream connection) terminates the SSE body abruptly without a final data: [DONE] sentinel — the HTTP status is already 200 and cannot change. Temps does not buffer partial completions, so your client should retry the full request.
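For clients that read the SSE body directly rather than through an SDK, the missing-sentinel case can be detected explicitly. The sketch below assumes the chunk shape described above (`choices[].delta.content`) and uses a hypothetical `read_sse` helper that takes an iterable of already-decoded lines; it reports whether the stream finished with data: [DONE] so the caller can decide to retry.

```python
import json


def read_sse(lines):
    """Collect delta text from OpenAI-style SSE lines.

    Returns (text, completed). completed is False when the input ended
    without the data: [DONE] sentinel, which signals a mid-stream failure.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts), True
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), False
```

When `completed` is false, the partial text should be discarded and the full request sent again, since the gateway does not buffer partial completions.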
Embeddings
Generate embeddings using the standard OpenAI embeddings endpoint.
Endpoint
- Name
POST /api/ai/v1/embeddings- Type
- endpoint
- Description
Generate vector embeddings for text input.
Endpoint
POST /api/ai/v1/embeddings
Authorization: Bearer <temps-api-key>
Content-Type: application/json
Embeddings
response = client.embeddings.create(
model="text-embedding-3-small",
input="The quick brown fox jumps over the lazy dog",
)
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
Embeddings are currently supported for OpenAI models only. Anthropic and Gemini embedding support is planned.
List models
Retrieve the list of models available through the gateway, including all configured providers.
Endpoint
- Name
GET /api/ai/v1/models- Type
- endpoint
- Description
Returns an OpenAI-compatible model list response with all models available through configured provider keys.
List models
curl https://your-temps-instance.com/api/ai/v1/models \
-H "Authorization: Bearer your-temps-api-key"
Response
{
"object": "list",
"data": [
{ "id": "gpt-4o", "object": "model", "owned_by": "openai" },
{ "id": "claude-sonnet-4-20250514", "object": "model", "owned_by": "anthropic" },
{ "id": "gemini-2.5-pro", "object": "model", "owned_by": "google" }
]
}
BYOK (bring your own key)
Override the gateway's stored provider keys on a per-request basis by passing your own API key in request headers. Useful for multi-tenant scenarios or testing a new key without saving it.
| Header | Description |
|---|---|
| x-provider-api-key | Raw API key to use for this request instead of the stored key. Presence of this header is what triggers BYOK mode. |
| x-provider-base-url | Optional custom upstream base URL. Only takes effect when x-provider-api-key is also supplied; a base URL on its own is ignored. |
When a BYOK API key is present, the gateway skips the stored-key lookup entirely and routes the request with the caller-supplied key (and base URL, if given).
BYOK request
curl https://your-temps-instance.com/api/ai/v1/chat/completions \
-H "Authorization: Bearer your-temps-api-key" \
-H "x-provider-api-key: sk-your-own-openai-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Custom base URL (BYOK with upstream override)
curl https://your-temps-instance.com/api/ai/v1/chat/completions \
-H "Authorization: Bearer your-temps-api-key" \
-H "x-provider-api-key: sk-your-own-openai-key" \
-H "x-provider-base-url: https://your-proxy.example.com/v1" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
BYOK requests are tracked separately in analytics with a byok credential type so you can distinguish them from system-key requests.
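From Python, the BYOK headers can be built by a small helper that mirrors the rule that a base URL only matters when a key is present. `byok_headers` is a hypothetical name used for illustration; the header names come from the table above.

```python
from typing import Optional


def byok_headers(provider_key: Optional[str], base_url: Optional[str] = None) -> dict:
    """Build BYOK headers; the base URL is only included alongside a key."""
    headers = {}
    if provider_key:
        headers["x-provider-api-key"] = provider_key
        if base_url:
            headers["x-provider-base-url"] = base_url
    return headers
```

With the OpenAI Python SDK, the result can be passed per request through the `extra_headers` argument of `client.chat.completions.create(...)`, so the stored key remains the default for all other calls.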
SSRF protection on x-provider-base-url
When a BYOK base URL is supplied, it is validated through temps_core::url_validation::validate_external_url before any upstream request is made. The validator rejects:
- Loopback addresses (127.0.0.0/8, ::1, localhost)
- Link-local ranges (169.254.0.0/16, fe80::/10)
- Private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- The cloud metadata endpoint (169.254.169.254)
- Non-HTTP(S) schemes such as file:// and ftp://
A rejected URL returns HTTP 400 with the body Invalid X-Provider-Base-URL: {reason} and error code invalid_provider_url.
Because the validator blocks loopback and private addresses, you cannot point x-provider-base-url at a host like http://localhost:11434/v1. Use a publicly resolvable HTTPS endpoint. This validation runs only on BYOK requests (i.e. when x-provider-api-key is also present).
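To catch an unusable base URL before sending a request, a client can run an approximate pre-check. The sketch below is not the temps_core validator; it is a rough Python approximation of the categories listed above, using only the standard library. It does not resolve DNS, so a hostname that points at a private address would pass here and may still be rejected by the server. The function name and reason strings are made up for illustration.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse


def check_base_url(url: str) -> Optional[str]:
    """Return None if the URL passes this rough check, else a rejection reason."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "scheme must be http or https"
    host = parsed.hostname
    if not host:
        return "missing host"
    if host == "localhost":
        return "loopback host"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None  # a hostname, not an IP literal; this sketch does not resolve DNS
    if addr.is_loopback:
        return "loopback address"
    if addr.is_link_local:
        return "link-local address"  # includes the 169.254.169.254 metadata endpoint
    if addr.is_private:
        return "private address"
    return None
```

For example, `check_base_url("http://localhost:11434/v1")` returns a loopback reason, while `check_base_url("https://your-proxy.example.com/v1")` returns `None`.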
BYOK credentials are also scrubbed from error logs: the gateway calls .without_url() on reqwest errors before stringifying them, so an API key embedded in a request URL never leaks into logs.
Conversation & request tracking
Attach metadata to requests for grouping, filtering, and tracing in the usage dashboard. All headers are optional.
| Header | Description |
|---|---|
| x-conversation-id | Group multiple requests into a logical conversation thread |
| x-tags | Comma-separated tags (e.g. production,feature-x) for filtering |
| x-request-id | Your own request identifier for correlation |
| traceparent | W3C trace context header for distributed tracing |
Python — tracking headers
from openai import OpenAI
# Pass extra headers via the OpenAI client
client = OpenAI(
base_url="https://your-temps-instance.com/api/ai/v1",
api_key="your-temps-api-key",
default_headers={
"x-conversation-id": "conv-abc123",
"x-tags": "production,chat-feature",
},
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello!"}],
)
cURL — tracking headers
curl https://your-temps-instance.com/api/ai/v1/chat/completions \
-H "Authorization: Bearer your-temps-api-key" \
-H "x-conversation-id: conv-abc123" \
-H "x-tags: production,chat-feature" \
-H "x-request-id: req-xyz789" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Provider key management
Manage provider API keys through the admin API. Keys are encrypted at rest using AES-256-GCM. Developers only need a Temps API key — provider keys are managed centrally by administrators.
Listing keys and testing a stored key by ID require the ai_gateway:read permission; creating, updating, deleting, and testing an inline (unsaved) key require ai_gateway:write.
List Provider Keys
- Name
GET /api/ai/providers- Type
- endpoint
- Description
Returns all configured provider keys. The api_key_masked field shows only the last 4 characters.
List keys
curl https://your-temps-instance.com/api/ai/providers \
-H "Authorization: Bearer your-admin-api-key"
Response
[
{
"id": 1,
"provider": "openai",
"display_name": "Production OpenAI Key",
"api_key_masked": "...xYZ9",
"base_url": null,
"is_active": true,
"created_at": "2025-01-15T10:00:00Z",
"updated_at": "2025-01-15T10:00:00Z"
}
]
Create a Provider Key
- Name
POST /api/ai/providers- Type
- endpoint
- Description
Add a new provider API key. The provider field accepts: openai, anthropic, xai, gemini.
Request body
- Name
provider- Type
- string
- Description
Provider ID: openai, anthropic, xai, or gemini.
- Name
display_name- Type
- string
- Description
Human-readable label for the key.
- Name
api_key- Type
- string
- Description
The raw provider API key. Stored encrypted.
- Name
base_url- Type
- string
- Description
Optional custom base URL for self-hosted or proxy endpoints.
Add OpenAI key
curl -X POST https://your-temps-instance.com/api/ai/providers \
-H "Authorization: Bearer your-admin-api-key" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"display_name": "Production OpenAI Key",
"api_key": "sk-..."
}'
Add Anthropic key
curl -X POST https://your-temps-instance.com/api/ai/providers \
-H "Authorization: Bearer your-admin-api-key" \
-H "Content-Type: application/json" \
-d '{
"provider": "anthropic",
"display_name": "Production Anthropic Key",
"api_key": "sk-ant-..."
}'
Update a Provider Key
- Name
PATCH /api/ai/providers/{id}- Type
- endpoint
- Description
Update an existing provider key. All fields are optional; only provided fields are updated.
Request body
- Name
display_name- Type
- string
- Description
New display name.
- Name
api_key- Type
- string
- Description
Replace the stored API key.
- Name
base_url- Type
- string
- Description
Update the custom base URL.
- Name
is_active- Type
- boolean
- Description
Enable or disable the key without deleting it.
Disable a key
curl -X PATCH https://your-temps-instance.com/api/ai/providers/1 \
-H "Authorization: Bearer your-admin-api-key" \
-H "Content-Type: application/json" \
-d '{"is_active": false}'
Delete a Provider Key
- Name
DELETE /api/ai/providers/{id}- Type
- endpoint
- Description
Permanently remove a provider key. Returns 204 No Content on success.
Delete a key
curl -X DELETE https://your-temps-instance.com/api/ai/providers/1 \
-H "Authorization: Bearer your-admin-api-key"
Test a Provider Key
Validate that a key works before saving it, or test an already-saved key by ID.
Test provider key
curl -X POST https://your-temps-instance.com/api/ai/providers/test \
-H "Authorization: Bearer your-admin-api-key" \
-H "Content-Type: application/json" \
-d '{
"provider": "anthropic",
"api_key": "sk-ant-..."
}'
Test response
{
"success": true,
"provider": "anthropic",
"latency_ms": 342,
"error": null
}
Usage & analytics
Query token usage, costs, and request history. All time-range parameters use ISO 8601 format and default to the last 24 hours.
Usage Summary
- Name
GET /api/ai/usage/summary- Type
- endpoint
- Description
Aggregate token counts and cost for a time range.
Query parameters
- Name
from- Type
- string
- Description
ISO 8601 start time. Defaults to 24h ago.
- Name
to- Type
- string
- Description
ISO 8601 end time. Defaults to now.
Usage summary
curl "https://your-temps-instance.com/api/ai/usage/summary?from=2025-01-01T00:00:00Z&to=2025-01-08T00:00:00Z" \
-H "Authorization: Bearer your-admin-api-key"
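When building these queries from code, the from and to values need to be ISO 8601 timestamps. The sketch below builds a query string for an explicit window ending at a given time; `usage_query` is a hypothetical helper, and it assumes the caller passes a timezone-aware datetime. It formats timestamps in the same UTC style as the curl example, with colons percent-encoded by `urlencode`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def usage_query(now: datetime, hours: int = 24) -> str:
    """Query string covering the `hours` before `now` (timezone-aware datetime)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    end = now.astimezone(timezone.utc)  # normalize to UTC before formatting
    start = end - timedelta(hours=hours)
    return urlencode({"from": start.strftime(fmt), "to": end.strftime(fmt)})
```

Appending the result to `/api/ai/usage/summary?` with the default of 24 hours requests the same window the endpoint uses when both parameters are omitted.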
Time-Series Usage
- Name
GET /api/ai/usage/timeseries- Type
- endpoint
- Description
Token usage bucketed by hour, day, or week.
Query parameters
- Name
from- Type
- string
- Description
- ISO 8601 start time.
- Name
to- Type
- string
- Description
- ISO 8601 end time.
- Name
bucket- Type
- string
- Description
Bucket size: hour, day, or week. Defaults to day.
Daily timeseries
curl "https://your-temps-instance.com/api/ai/usage/timeseries?bucket=day&from=2025-01-01T00:00:00Z" \
-H "Authorization: Bearer your-admin-api-key"
Top Models
- Name
GET /api/ai/usage/top-models- Type
- endpoint
- Description
Ranked list of models by request count.
Query parameters
- Name
from- Type
- string
- Description
- ISO 8601 start time.
- Name
to- Type
- string
- Description
- ISO 8601 end time.
- Name
limit- Type
- integer
- Description
- Max results. Defaults to 10.
Top models
curl "https://your-temps-instance.com/api/ai/usage/top-models?limit=5" \
-H "Authorization: Bearer your-admin-api-key"
Usage by Provider
- Name
GET /api/ai/usage/by-provider- Type
- endpoint
- Description
Token counts and costs broken down per provider.
By provider
curl "https://your-temps-instance.com/api/ai/usage/by-provider" \
-H "Authorization: Bearer your-admin-api-key"
Recent Requests
The recent-requests log is paginated and filterable. It returns a UsageLogPage object — an entries array (the page of log rows, ordered by timestamp descending) plus a total count of all rows matching the active filter. Pagination is offset-based via limit and offset.
- Name
GET /api/ai/usage/recent- Type
- endpoint
- Description
A page of individual AI requests with token counts, latency, status, and cost. Returns { entries, total }.
Query parameters
- Name
limit- Type
- integer
- Description
- Page size. Defaults to 20, clamped server-side to the range 1–50 (values above 50 are reduced to 50).
- Name
offset- Type
- integer
- Description
- Number of rows to skip. Defaults to 0.
- Name
provider- Type
- string
- Description
Filter by provider name (openai, anthropic, xai, gemini).
- Name
status- Type
- integer
- Description
- Filter by exact HTTP status code.
- Name
model- Type
- string
- Description
- Filter by model name.
- Name
user_id- Type
- integer
- Description
- Filter by user ID.
- Name
conversation_id- Type
- string
- Description
- Filter by conversation ID.
- Name
tags- Type
- string
- Description
- Filter by tags (comma-separated, AND logic).
- Name
cost_gte / cost_gt / cost_lte / cost_lt- Type
- integer
- Description
- Cost bounds, in microcents. Each operator is a separate parameter.
- Name
tokens_gte / tokens_gt / tokens_lte / tokens_lt- Type
- integer
- Description
- Total-token bounds (input + output). Each operator is a separate parameter.
Recent requests (paginated)
curl "https://your-temps-instance.com/api/ai/usage/recent?limit=20&offset=0" \
-H "Authorization: Bearer your-admin-api-key"
Filter by provider, status, and cost
# Anthropic requests that succeeded (200) costing >= 5000 microcents
curl "https://your-temps-instance.com/api/ai/usage/recent?provider=anthropic&status=200&cost_gte=5000" \
-H "Authorization: Bearer your-admin-api-key"
UsageLogPage response
{
"entries": [
{
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"input_tokens": 842,
"output_tokens": 311,
"latency_ms": 1240,
"estimated_cost_microcents": 7184,
"status": 200,
"is_streaming": true,
"is_byok": false,
"conversation_id": "conv-abc123",
"tags": "production,chat-feature",
"request_id": "req-xyz789",
"trace_id": null
}
],
"total": 1284
}
Cost bounds are expressed in microcents (the same unit as the stored estimated_cost_microcents column). The web console's filter UI lets you enter dollars and converts them to microcents before querying. The cost, status, and total-token filters are specific to this endpoint; the summary, by-provider, time-series, top-models, and conversations endpoints take time-range and a smaller filter set (such as user_id, model, and tags).
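The offset-based pagination of the recent-requests endpoint can be walked with a simple loop. In this sketch `fetch_page` is a hypothetical callable supplied by the caller (for example a thin wrapper around an HTTP GET to /api/ai/usage/recent) that returns the `{ entries, total }` object; the loop itself only relies on the limit, offset, entries, and total fields described above.

```python
def iter_usage_entries(fetch_page, page_size=50):
    """Yield every log entry by advancing offset until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        entries = page["entries"]
        yield from entries
        # Advance by what was actually returned, since the server clamps limit.
        offset += len(entries)
        if not entries or offset >= page["total"]:
            break
```

Advancing by the number of rows actually returned, rather than by the requested page size, keeps the loop correct even when the server reduces an oversized limit to 50.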
Conversations
Group requests by conversation ID to see the full token history of a chat session.
- Name
GET /api/ai/usage/conversations- Type
- endpoint
- Description
List conversation summaries. Requires requests to have been tagged with x-conversation-id.
Query parameters
- Name
from- Type
- string
- Description
- ISO 8601 start time.
- Name
to- Type
- string
- Description
- ISO 8601 end time.
- Name
limit- Type
- integer
- Description
- Max results. Defaults to 50, max 100.
- Name
user_id- Type
- integer
- Description
- Filter by user ID.
- Name
tags- Type
- string
- Description
- Filter by tags (comma-separated).
- Name
model- Type
- string
- Description
- Filter by model name.
List conversations
curl "https://your-temps-instance.com/api/ai/usage/conversations?tags=production" \
-H "Authorization: Bearer your-admin-api-key"
Conversation list response
[
{
"conversation_id": "conv-abc123",
"message_count": 12,
"total_input_tokens": 8432,
"total_output_tokens": 3210,
"total_tokens": 11642,
"total_cost_microcents": 58210,
"avg_latency_ms": 824.5,
"models_used": ["gpt-4o", "claude-sonnet-4-20250514"],
"first_at": "2025-01-15T09:00:00Z",
"last_at": "2025-01-15T09:45:00Z"
}
]
- Name
GET /api/ai/usage/conversations/{conversation_id}- Type
- endpoint
- Description
All individual requests within a specific conversation, in chronological order.
Query parameters
- Name
limit- Type
- integer
- Description
- Max results. Defaults to 100.
Conversation detail
curl "https://your-temps-instance.com/api/ai/usage/conversations/conv-abc123" \
-H "Authorization: Bearer your-admin-api-key"
Pricing
Retrieve the pricing table for all supported models. Prices are in USD per 1 million tokens.
- Name
GET /api/ai/pricing- Type
- endpoint
- Description
Returns pricing for all models known to the gateway, including input, output, cache, and batch rates where applicable.
Get pricing
curl https://your-temps-instance.com/api/ai/pricing \
-H "Authorization: Bearer your-temps-api-key"
Pricing response
{
"models": [
{
"model": "gpt-4o",
"display_name": "GPT-4o",
"provider": "openai",
"input_per_million": 2.50,
"output_per_million": 10.00,
"cache_hit_per_million": 1.25,
"batch_input_per_million": 1.25,
"batch_output_per_million": 5.00,
"deprecated": false
},
{
"model": "claude-sonnet-4-20250514",
"display_name": "Claude Sonnet 4.6",
"provider": "anthropic",
"input_per_million": 3.00,
"output_per_million": 15.00,
"cache_write_5m_per_million": 3.75,
"cache_hit_per_million": 0.30,
"deprecated": false
}
]
}
Pricing fields
- Name
input_per_million- Type
- number
- Description
Base input token cost per 1M tokens.
- Name
output_per_million- Type
- number
- Description
Output token cost per 1M tokens.
- Name
cache_write_5m_per_million- Type
- number
- Description
5-minute prompt cache write cost per 1M tokens (Anthropic-style caching).
- Name
cache_write_1h_per_million- Type
- number
- Description
1-hour cache write cost per 1M tokens.
- Name
cache_hit_per_million- Type
- number
- Description
Cache hit / read cost per 1M tokens.
- Name
batch_input_per_million- Type
- number
- Description
Batch API input cost per 1M tokens (OpenAI batch API).
- Name
batch_output_per_million- Type
- number
- Description
Batch API output cost per 1M tokens.
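As a rough illustration of how the per-million rates translate into a request cost, the sketch below multiplies token counts by the base input and output rates. It is a simplified estimate under stated assumptions: it ignores cache and batch rates, and `estimate_cost_usd` is a hypothetical helper rather than the gateway's own cost calculation.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_per_million, output_per_million):
    """Estimated USD cost from token counts and per-1M-token rates."""
    return (input_tokens * input_per_million + output_tokens * output_per_million) / 1_000_000
```

For instance, 1,000,000 input tokens and 1,000,000 output tokens at the gpt-4o rates shown above (2.50 and 10.00) come to 12.50 USD.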