AI Gateway

OpenAI-compatible AI gateway built into Temps. Use any OpenAI SDK to call OpenAI, Anthropic, Gemini, and xAI — just change the base URL.


Overview

The AI Gateway provides a unified, OpenAI-compatible API for all major LLM providers. Your existing OpenAI SDK code works unchanged — just point it at your Temps instance.

What's Included

  • OpenAI API wire-compatible endpoints
  • Multi-provider routing (OpenAI, Anthropic, Gemini, xAI)
  • Tool calling / function calling across all providers
  • Vision / image attachments (base64 and URL)
  • SSE streaming with consistent format
  • Centralized API key management
  • BYOK (Bring Your Own Key) per-request override
  • Conversation tracking and tagging
  • Per-user usage tracking and analytics
  • JSON mode / structured output

Why It Matters

  • No application code changes beyond base_url and api_key; works with any OpenAI SDK
  • Centralized key management (devs never see API keys)
  • Switch providers without changing client code
  • Built-in usage analytics and cost tracking
  • Self-hosted — your data stays on your infrastructure
  • No per-request fees beyond provider costs
  • Transparent translation for Anthropic and Gemini

Key Features

  • Name
    Drop-In Replacement
    Description

    100% OpenAI SDK compatible. Works with Python, Node.js, Go, Rust, and any other OpenAI client library.

  • Name
    Transparent Translation
    Description

    Automatically translates requests/responses for Anthropic Messages API and Gemini generateContent API.

  • Name
    Vision Support
    Description

    Send images as base64 data URIs or HTTP URLs. Translated to each provider's native format automatically.

  • Name
    Tool Calling
    Description

    OpenAI-format tools work across all providers. Function definitions, tool_choice, and tool results are translated transparently.

  • Name
    Streaming
    Description

    Server-Sent Events (SSE) streaming with consistent data: {...}\n\n format across all providers.

  • Name
    Usage Analytics
    Description

    Track token usage, costs, and request counts per user, model, and provider.


Authentication & Permissions

Every gateway route runs on Temps' authenticated surface. Each request must carry a valid Temps API key in the Authorization: Bearer <temps-api-key> header. Beyond a valid key, individual endpoints require a specific permission on the key:

| Permission | Grants access to |
|---|---|
| ai_gateway:execute | POST /api/ai/v1/chat/completions, POST /api/ai/v1/embeddings |
| ai_gateway:read | GET /api/ai/v1/models, all GET /api/ai/usage/*, GET /api/ai/pricing, GET /api/ai/providers, and POST /api/ai/providers/{id}/test |
| ai_gateway:write | POST/PATCH/DELETE /api/ai/providers, POST /api/ai/providers/test |
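
When provisioning keys for a client, it can help to check up front which permission a given call needs. The sketch below transcribes the permission table into a lookup. The helper `required_permission` and the regular expressions are illustrative assumptions, not part of Temps, and the gateway's own route matching may differ.

```python
import re
from typing import Optional

# (method, path pattern, permission), transcribed from the permission table.
# The patterns are an assumption about how the documented routes are shaped.
PERMISSION_RULES = [
    ("POST", r"/api/ai/v1/chat/completions", "ai_gateway:execute"),
    ("POST", r"/api/ai/v1/embeddings", "ai_gateway:execute"),
    ("GET", r"/api/ai/v1/models", "ai_gateway:read"),
    ("GET", r"/api/ai/usage/.+", "ai_gateway:read"),
    ("GET", r"/api/ai/pricing", "ai_gateway:read"),
    ("GET", r"/api/ai/providers", "ai_gateway:read"),
    ("POST", r"/api/ai/providers/[^/]+/test", "ai_gateway:read"),   # test a stored key by ID
    ("POST", r"/api/ai/providers/test", "ai_gateway:write"),        # test an inline key
    ("POST", r"/api/ai/providers", "ai_gateway:write"),
    ("PATCH", r"/api/ai/providers/[^/]+", "ai_gateway:write"),
    ("DELETE", r"/api/ai/providers/[^/]+", "ai_gateway:write"),
]


def required_permission(method: str, path: str) -> Optional[str]:
    """Return the permission a route needs, or None if the route is not listed."""
    for rule_method, pattern, permission in PERMISSION_RULES:
        if method.upper() == rule_method and re.fullmatch(pattern, path):
            return permission
    return None


print(required_permission("POST", "/api/ai/providers/7/test"))
print(required_permission("POST", "/api/ai/providers/test"))
```

Note the two test routes: testing a stored key by ID only needs read access, while testing an inline key needs write access.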

Quick Start

Point any OpenAI SDK at your Temps instance. The only changes are base_url and api_key.

Setup

from openai import OpenAI

client = OpenAI(
    base_url="https://your-temps-instance.com/api/ai/v1",
    api_key="your-temps-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",  # or "claude-sonnet-4-20250514", "gemini-2.5-pro", etc.
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)

Supported Providers

The gateway selects a provider from the model-name prefix. Four providers are wired up:

| Model Prefix | Provider ID | Examples |
|---|---|---|
| gpt-, o1, o3, o4, text-embedding-, dall-e, chatgpt-, codex- | openai | gpt-4o, o3-mini, text-embedding-3-small |
| claude- | anthropic | claude-sonnet-4-20250514, claude-haiku-4-5-20251001 |
| grok- | xai | grok-3, grok-3-mini |
| gemini- | gemini | gemini-2.5-pro, gemini-2.5-flash |
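
The prefix rule can be mirrored client-side, for example to predict which provider key a model name will use. This is a minimal sketch built only from the table above. The function `provider_for_model` is hypothetical, and the gateway's actual matching logic (ordering, case handling, fallbacks) is not documented here.

```python
from typing import Optional

# Prefixes copied from the Supported Providers table.
PROVIDER_PREFIXES = {
    "openai": ("gpt-", "o1", "o3", "o4", "text-embedding-", "dall-e", "chatgpt-", "codex-"),
    "anthropic": ("claude-",),
    "xai": ("grok-",),
    "gemini": ("gemini-",),
}


def provider_for_model(model: str) -> Optional[str]:
    """Return the provider ID whose prefix matches the model name, if any."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        if model.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return provider
    return None


for name in ("gpt-4o", "o3-mini", "claude-sonnet-4-20250514", "grok-3", "gemini-2.5-flash"):
    print(name, "->", provider_for_model(name))
```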

Chat Completions

Standard chat completions work identically to the OpenAI API.

Endpoint

  • Name
    POST /api/ai/v1/chat/completions
    Type
    endpoint
    Description

    Submit a chat completion request. Supports streaming, tool calling, vision, and JSON mode.

Endpoint

POST /api/ai/v1/chat/completions
Authorization: Bearer <temps-api-key>
Content-Type: application/json

Multi-turn conversation

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."},
        {"role": "assistant", "content": "Here's a fibonacci function:\n\n```python\ndef fib(n):\n    if n <= 1:\n        return n\n    return fib(n-1) + fib(n-2)\n```"},
        {"role": "user", "content": "Can you make it iterative for better performance?"},
    ],
    temperature=0.7,
    max_tokens=500,
)

Sampling parameters

The gateway forwards the standard OpenAI sampling parameters to each provider. The two most common are temperature and top_p:

  • Name
    temperature
    Type
    number
    Description

    Controls randomness. Lower values (near 0) make output more focused and deterministic — the model almost always picks the most likely next token, which is best for factual answers, code, and structured extraction. Higher values (near 1) make output more varied and creative. A value of 0.7 is a common middle ground.

  • Name
    top_p
    Type
    number
    Description

    Nucleus sampling. Restricts token selection to the smallest set of tokens whose cumulative probability is at least top_p. For example, top_p: 0.1 samples only from the most likely tokens that together account for 10% of the probability mass (often just one or two tokens), not from the top 10% of the vocabulary. 1.0 (the effective default) considers all tokens.
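
To make the two parameters concrete, here is a small sketch of the general sampling math on a toy four-token distribution. It illustrates the textbook definitions only. The providers apply these steps inside their models, and the gateway simply forwards the parameters. The function names are illustrative.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the smallest set of tokens whose cumulative probability >= top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


probs = [0.5, 0.3, 0.15, 0.05]
print(nucleus(probs, 0.1))  # [0]
print(nucleus(probs, 0.9))  # [0, 1, 2]

# Lower temperature concentrates probability on the most likely token.
print(apply_temperature([2.0, 1.0, 0.0], 1.0)[0])
print(apply_temperature([2.0, 1.0, 0.0], 0.2)[0])
```

With top_p set to 0.1, only the single most likely token survives, because it alone already covers more than 10% of the probability mass.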


Vision & Image Attachments

Send images to vision-capable models. The gateway translates image formats automatically for each provider.

Base64 Image

Send base64 image

import base64

# Read image and encode to base64
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # or claude-sonnet-4-20250514, gemini-2.5-pro
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image? Describe it in detail."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}"
                    },
                },
            ],
        }
    ],
    max_tokens=1000,
)

print(response.choices[0].message.content)

Image from URL

Image URL

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/photo.jpg"
                    },
                },
            ],
        }
    ],
)

Multiple Images

Multiple images

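# before_image and after_image are base64-encoded PNG strings,
# produced the same way as image_data in the base64 example above.
response = client.chat.completions.create(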
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two screenshots. What changed?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{before_image}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{after_image}"},
                },
            ],
        }
    ],
)
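
The image examples above repeat the same data-URI boilerplate. A small helper can build the content parts. This is a sketch, and `image_part` and `image_message` are hypothetical names rather than SDK functions. It assumes PNG input unless another MIME type is passed.

```python
import base64


def image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-format image_url content part."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_message(prompt: str, *images: bytes) -> dict:
    """Build a user message containing a text prompt followed by any number of images."""
    content = [{"type": "text", "text": prompt}]
    content.extend(image_part(img) for img in images)
    return {"role": "user", "content": content}


# Tiny stand-in payloads; real code would read image files in binary mode.
message = image_message("Compare these two screenshots. What changed?", b"before", b"after")
print(len(message["content"]))
```

The resulting dictionary can be passed directly in the messages list of a chat completion request.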

Tool Calling (Function Calling)

Define tools using the OpenAI format. The gateway translates them to each provider's native format.

Tool calling

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. 'London'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Step 1: Send message with tools
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",  # works with any provider
    messages=[
        {"role": "user", "content": "What's the weather like in London?"}
    ],
    tools=tools,
)

message = response.choices[0].message

# Step 2: Check if the model wants to call a tool
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {tool_call.function.name}({args})")

    # Step 3: Execute the tool and send the result back
    weather_result = get_weather(args["location"])  # your function

    follow_up = client.chat.completions.create(
        model="claude-sonnet-4-20250514",
        messages=[
            {"role": "user", "content": "What's the weather like in London?"},
            message,  # assistant message with tool_calls
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(weather_result),
            },
        ],
        tools=tools,
    )

    print(follow_up.choices[0].message.content)
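
The example above handles only the first tool call. A model may request several in one turn, so a registry-based dispatcher is a common pattern. The sketch below is an assumption-laden illustration: `TOOL_REGISTRY`, `run_tool_calls`, and the placeholder `get_weather` body are hypothetical, and `SimpleNamespace` objects stand in for the SDK's tool-call objects so the snippet runs without a network call.

```python
import json
from types import SimpleNamespace


def get_weather(location, unit="celsius"):
    # Placeholder data; replace with a real weather lookup.
    return {"location": location, "temperature": 18, "unit": unit}


# Maps tool names (as declared in `tools`) to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute every requested tool and return OpenAI-format 'tool' messages."""
    messages = []
    for call in tool_calls:
        handler = TOOL_REGISTRY.get(call.function.name)
        if handler is None:
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            result = handler(**json.loads(call.function.arguments))
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    return messages


# Stand-in for message.tool_calls from a real response.
fake_calls = [
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"location": "London"}'),
    )
]
print(run_tool_calls(fake_calls))
```

In the follow-up request, the returned messages would be appended after the assistant message that carried the tool calls.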

Tool Choice

Control whether the model should call tools:

Tool choice options

# Let the model decide (default)
tool_choice="auto"

# Force the model to call a tool
tool_choice="required"

# Force a specific tool
tool_choice={"type": "function", "function": {"name": "get_weather"}}

# Don't use tools (even if provided)
tool_choice="none"

Streaming

When you set stream: true, the response is returned as Server-Sent Events with content-type: text/event-stream. All providers produce the same OpenAI-style data: {...}\n\n chunks (each with object chat.completion.chunk), terminated by a final data: [DONE]\n\n. Non-OpenAI providers such as Anthropic have their native streaming format translated into this OpenAI SSE shape, so client code is identical across providers.

Streaming

stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Write a haiku about deployment automation."}
    ],
    stream=True,
)

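for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)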

print()  # newline at end
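
Clients that do not use an SDK can read the stream directly. The following sketch parses the data: {...} events described above from an in-memory string and stops at [DONE]. The sample payload is invented for illustration, and a real client would read the body incrementally from the HTTP response instead.

```python
import json


def iter_sse_chunks(raw: str):
    """Yield decoded JSON chunks from an OpenAI-style SSE body until [DONE]."""
    for event in raw.split("\n\n"):
        for line in event.splitlines():
            if not line.startswith("data:"):
                continue  # skip comments and other SSE fields
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
            yield json.loads(payload)


# Invented sample stream in the shape described in this section.
sample = (
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "lo"}}]}\n\n'
    "data: [DONE]\n\n"
)

text = "".join(
    chunk["choices"][0]["delta"].get("content") or ""
    for chunk in iter_sse_chunks(sample)
)
print(text)  # Hello
```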

Embeddings

Generate embeddings using the standard OpenAI embeddings endpoint.

Endpoint

  • Name
    POST /api/ai/v1/embeddings
    Type
    endpoint
    Description

    Generate vector embeddings for text input.

Endpoint

POST /api/ai/v1/embeddings
Authorization: Bearer <temps-api-key>
Content-Type: application/json

Embeddings

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog",
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
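
Embedding vectors are usually compared with cosine similarity. The helper below is a plain-Python sketch with toy vectors. In practice the inputs would be the `embedding` lists returned by the endpoint, and a numeric library would be faster for large batches.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have many more dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```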

List Models

Retrieve the list of models available through the gateway, including all configured providers.

Endpoint

  • Name
    GET /api/ai/v1/models
    Type
    endpoint
    Description

    Returns an OpenAI-compatible model list response with all models available through configured provider keys.

List models

curl https://your-temps-instance.com/api/ai/v1/models \
  -H "Authorization: Bearer your-temps-api-key"

Response

{
  "object": "list",
  "data": [
    { "id": "gpt-4o", "object": "model", "owned_by": "openai" },
    { "id": "claude-sonnet-4-20250514", "object": "model", "owned_by": "anthropic" },
    { "id": "gemini-2.5-pro", "object": "model", "owned_by": "google" }
  ]
}

BYOK (Bring Your Own Key)

Override the gateway's stored provider keys on a per-request basis by passing your own API key in request headers. Useful for multi-tenant scenarios or testing a new key without saving it.

| Header | Description |
|---|---|
| x-provider-api-key | Raw API key to use for this request instead of the stored key. Presence of this header is what triggers BYOK mode. |
| x-provider-base-url | Optional custom upstream base URL. Only takes effect when x-provider-api-key is also supplied; a base URL on its own is ignored. |

When a BYOK API key is present, the gateway skips the stored-key lookup entirely and routes the request with the caller-supplied key (and base URL, if given).

BYOK request

curl https://your-temps-instance.com/api/ai/v1/chat/completions \
  -H "Authorization: Bearer your-temps-api-key" \
  -H "x-provider-api-key: sk-your-own-openai-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Custom base URL (BYOK with upstream override)

curl https://your-temps-instance.com/api/ai/v1/chat/completions \
  -H "Authorization: Bearer your-temps-api-key" \
  -H "x-provider-api-key: sk-your-own-openai-key" \
  -H "x-provider-base-url: https://your-proxy.example.com/v1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
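
In Python, the same headers can be assembled per request. The sketch below encodes the rule that the base URL only matters alongside a key. `byok_headers` is a hypothetical helper, not part of Temps or the OpenAI SDK.

```python
from typing import Optional


def byok_headers(
    provider_api_key: Optional[str] = None,
    provider_base_url: Optional[str] = None,
) -> dict:
    """Build BYOK headers; the base URL is dropped when no key is given."""
    headers = {}
    if provider_api_key:
        headers["x-provider-api-key"] = provider_api_key
        if provider_base_url:
            headers["x-provider-base-url"] = provider_base_url
    return headers


print(byok_headers("sk-your-own-openai-key", "https://your-proxy.example.com/v1"))
print(byok_headers(provider_base_url="https://your-proxy.example.com/v1"))  # {}
```

With the OpenAI Python SDK, the result can be passed as `extra_headers=` on an individual `client.chat.completions.create(...)` call, so that different tenants can use different keys through one client.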

SSRF protection on x-provider-base-url

When a BYOK base URL is supplied, it is validated through temps_core::url_validation::validate_external_url before any upstream request is made. The validator rejects:

  • Loopback addresses (127.0.0.0/8, ::1, localhost)
  • Link-local ranges (169.254.0.0/16, fe80::/10)
  • Private ranges (10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12)
  • The cloud metadata endpoint (169.254.169.254)
  • Non-HTTP(S) schemes such as file:// and ftp://

A rejected URL returns HTTP 400 with the body Invalid X-Provider-Base-URL: {reason} and error code invalid_provider_url.
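
Callers may want to pre-check a base URL before sending it. The sketch below approximates the listed rules using Python's standard library. It is not the temps_core implementation: `check_base_url` is a hypothetical name, and it only inspects literal IP addresses and the name localhost, whereas a complete validator would presumably also check the addresses a hostname resolves to.

```python
import ipaddress
from urllib.parse import urlsplit

# Ranges taken from the rejection list above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16", "fe80::/10",                     # link-local, incl. 169.254.169.254
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # private
    )
]


def check_base_url(url: str):
    """Return None if the URL passes these checks, otherwise a reason string."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return f"unsupported scheme: {parts.scheme or '(none)'}"
    host = parts.hostname
    if not host:
        return "missing host"
    if host == "localhost":
        return "loopback host"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return None  # a DNS name; this sketch does not resolve it
    for network in BLOCKED_NETWORKS:
        if ip.version == network.version and ip in network:
            return f"blocked address range: {network}"
    return None


for candidate in (
    "https://your-proxy.example.com/v1",
    "http://169.254.169.254/latest/meta-data",
    "file:///etc/passwd",
):
    print(candidate, "->", check_base_url(candidate))
```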


Conversation & Request Tracking

Attach metadata to requests for grouping, filtering, and tracing in the usage dashboard. All headers are optional.

| Header | Description |
|---|---|
| x-conversation-id | Group multiple requests into a logical conversation thread |
| x-tags | Comma-separated tags (e.g. production,feature-x) for filtering |
| x-request-id | Your own request identifier for correlation |
| traceparent | W3C trace context header for distributed tracing |

Python — tracking headers

from openai import OpenAI

# Attach tracking headers to every request via default_headers
client = OpenAI(
    base_url="https://your-temps-instance.com/api/ai/v1",
    api_key="your-temps-api-key",
    default_headers={
        "x-conversation-id": "conv-abc123",
        "x-tags": "production,chat-feature",
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

cURL — tracking headers

curl https://your-temps-instance.com/api/ai/v1/chat/completions \
  -H "Authorization: Bearer your-temps-api-key" \
  -H "x-conversation-id: conv-abc123" \
  -H "x-tags: production,chat-feature" \
  -H "x-request-id: req-xyz789" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
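
When headers vary per request, a small builder keeps them consistent. The sketch below also generates a traceparent value in the W3C Trace Context layout (version, 32-hex trace ID, 16-hex parent ID, flags). The helper names are hypothetical. In a real service the trace ID would normally come from the tracing library already in use rather than being generated ad hoc.

```python
import secrets


def new_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent value: version-traceid-parentid-flags."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def tracking_headers(conversation_id=None, tags=(), request_id=None, traceparent=None) -> dict:
    """Return only the tracking headers that were actually provided."""
    headers = {}
    if conversation_id:
        headers["x-conversation-id"] = conversation_id
    if tags:
        headers["x-tags"] = ",".join(tags)
    if request_id:
        headers["x-request-id"] = request_id
    if traceparent:
        headers["traceparent"] = traceparent
    return headers


print(
    tracking_headers(
        conversation_id="conv-abc123",
        tags=["production", "chat-feature"],
        request_id="req-xyz789",
        traceparent=new_traceparent(),
    )
)
```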

Provider Key Management

Manage provider API keys through the admin API. Keys are encrypted at rest using AES-256-GCM. Developers only need a Temps API key — provider keys are managed centrally by administrators.

Listing keys and testing a stored key by ID require the ai_gateway:read permission; creating, updating, deleting, and testing an inline (unsaved) key require ai_gateway:write.

List Provider Keys

  • Name
    GET /api/ai/providers
    Type
    endpoint
    Description

    Returns all configured provider keys. The api_key_masked field shows only the last 4 characters.

List keys

curl https://your-temps-instance.com/api/ai/providers \
  -H "Authorization: Bearer your-admin-api-key"

Response

[
  {
    "id": 1,
    "provider": "openai",
    "display_name": "Production OpenAI Key",
    "api_key_masked": "...xYZ9",
    "base_url": null,
    "is_active": true,
    "created_at": "2025-01-15T10:00:00Z",
    "updated_at": "2025-01-15T10:00:00Z"
  }
]

Create a Provider Key

  • Name
    POST /api/ai/providers
    Type
    endpoint
    Description

    Add a new provider API key. The provider field accepts: openai, anthropic, xai, gemini.

Request body

  • Name
    provider
    Type
    string
    Description

    Provider ID: openai, anthropic, xai, or gemini.

  • Name
    display_name
    Type
    string
    Description

    Human-readable label for the key.

  • Name
    api_key
    Type
    string
    Description

    The raw provider API key. Stored encrypted.

  • Name
    base_url
    Type
    string
    Description

    Optional custom base URL for self-hosted or proxy endpoints.

Add OpenAI key

curl -X POST https://your-temps-instance.com/api/ai/providers \
  -H "Authorization: Bearer your-admin-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "display_name": "Production OpenAI Key",
    "api_key": "sk-..."
  }'

Add Anthropic key

curl -X POST https://your-temps-instance.com/api/ai/providers \
  -H "Authorization: Bearer your-admin-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "display_name": "Production Anthropic Key",
    "api_key": "sk-ant-..."
  }'

Update a Provider Key

  • Name
    PATCH /api/ai/providers/{id}
    Type
    endpoint
    Description

    Update an existing provider key. All fields are optional; only provided fields are updated.

Request body

  • Name
    display_name
    Type
    string
    Description

    New display name.

  • Name
    api_key
    Type
    string
    Description

    Replace the stored API key.

  • Name
    base_url
    Type
    string
    Description

    Update the custom base URL.

  • Name
    is_active
    Type
    boolean
    Description

    Enable or disable the key without deleting it.

Disable a key

curl -X PATCH https://your-temps-instance.com/api/ai/providers/1 \
  -H "Authorization: Bearer your-admin-api-key" \
  -H "Content-Type: application/json" \
  -d '{"is_active": false}'

Delete a Provider Key

  • Name
    DELETE /api/ai/providers/{id}
    Type
    endpoint
    Description

    Permanently remove a provider key. Returns 204 No Content on success.

Delete a key

curl -X DELETE https://your-temps-instance.com/api/ai/providers/1 \
  -H "Authorization: Bearer your-admin-api-key"

Test a Provider Key

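Validate that a key works before saving it with POST /api/ai/providers/test (requires ai_gateway:write), or test an already-saved key by ID with POST /api/ai/providers/{id}/test (requires ai_gateway:read). The example below tests an unsaved key.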

Test provider key

curl -X POST https://your-temps-instance.com/api/ai/providers/test \
  -H "Authorization: Bearer your-admin-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "api_key": "sk-ant-..."
  }'

Test response

{
  "success": true,
  "provider": "anthropic",
  "latency_ms": 342,
  "error": null
}

Usage & Analytics

Query token usage, costs, and request history. All time-range parameters use ISO 8601 format and default to the last 24 hours.

Usage Summary

  • Name
    GET /api/ai/usage/summary
    Type
    endpoint
    Description

    Aggregate token counts and cost for a time range.

Query parameters

  • Name
    from
    Type
    string
    Description

    ISO 8601 start time. Defaults to 24h ago.

  • Name
    to
    Type
    string
    Description

    ISO 8601 end time. Defaults to now.

Usage summary

curl "https://your-temps-instance.com/api/ai/usage/summary?from=2025-01-01T00:00:00Z&to=2025-01-08T00:00:00Z" \
  -H "Authorization: Bearer your-admin-api-key"
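
Building the time range in code avoids hand-formatting timestamps. This sketch constructs the summary URL for a seven-day window with the standard library only. `usage_summary_url` is a hypothetical helper, and the host is the placeholder used throughout this page.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://your-temps-instance.com"  # placeholder host


def usage_summary_url(start: datetime, end: datetime) -> str:
    """Build the usage summary URL; expects timezone-aware datetimes."""

    def iso(ts: datetime) -> str:
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    query = urlencode({"from": iso(start), "to": iso(end)})
    return f"{BASE_URL}/api/ai/usage/summary?{query}"


end = datetime(2025, 1, 8, tzinfo=timezone.utc)
print(usage_summary_url(end - timedelta(days=7), end))
```

urlencode percent-encodes the colons in the timestamps (as %3A), which is equivalent to the unencoded form in the cURL example.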

Time-Series Usage

  • Name
    GET /api/ai/usage/timeseries
    Type
    endpoint
    Description

    Token usage bucketed by hour, day, or week.

Query parameters

  • Name
    from
    Type
    string
    Description
    ISO 8601 start time.
  • Name
    to
    Type
    string
    Description
    ISO 8601 end time.
  • Name
    bucket
    Type
    string
    Description

    Bucket size: hour, day, or week. Defaults to day.

Daily timeseries

curl "https://your-temps-instance.com/api/ai/usage/timeseries?bucket=day&from=2025-01-01T00:00:00Z" \
  -H "Authorization: Bearer your-admin-api-key"

Top Models

  • Name
    GET /api/ai/usage/top-models
    Type
    endpoint
    Description

    Ranked list of models by request count.

Query parameters

  • Name
    from
    Type
    string
    Description
    ISO 8601 start time.
  • Name
    to
    Type
    string
    Description
    ISO 8601 end time.
  • Name
    limit
    Type
    integer
    Description
    Max results. Defaults to 10.

Top models

curl "https://your-temps-instance.com/api/ai/usage/top-models?limit=5" \
  -H "Authorization: Bearer your-admin-api-key"

Usage by Provider

  • Name
    GET /api/ai/usage/by-provider
    Type
    endpoint
    Description

    Token counts and costs broken down per provider.

By provider

curl "https://your-temps-instance.com/api/ai/usage/by-provider" \
  -H "Authorization: Bearer your-admin-api-key"

Recent Requests

The recent-requests log is paginated and filterable. It returns a UsageLogPage object — an entries array (the page of log rows, ordered by timestamp descending) plus a total count of all rows matching the active filter. Pagination is offset-based via limit and offset.

  • Name
    GET /api/ai/usage/recent
    Type
    endpoint
    Description

    A page of individual AI requests with token counts, latency, status, and cost. Returns { entries, total }.

Query parameters

  • Name
    limit
    Type
    integer
    Description
    Page size. Defaults to 20, clamped server-side to the range 1–50 (values above 50 are reduced to 50).
  • Name
    offset
    Type
    integer
    Description
    Number of rows to skip. Defaults to 0.
  • Name
    provider
    Type
    string
    Description

    Filter by provider name (openai, anthropic, xai, gemini).

  • Name
    status
    Type
    integer
    Description
    Filter by exact HTTP status code.
  • Name
    model
    Type
    string
    Description
    Filter by model name.
  • Name
    user_id
    Type
    integer
    Description
    Filter by user ID.
  • Name
    conversation_id
    Type
    string
    Description
    Filter by conversation ID.
  • Name
    tags
    Type
    string
    Description
    Filter by tags (comma-separated, AND logic).
  • Name
    cost_gte / cost_gt / cost_lte / cost_lt
    Type
    integer
    Description
    Cost bounds, in microcents. Each operator is a separate parameter.
  • Name
    tokens_gte / tokens_gt / tokens_lte / tokens_lt
    Type
    integer
    Description
    Total-token bounds (input + output). Each operator is a separate parameter.

Recent requests (paginated)

curl "https://your-temps-instance.com/api/ai/usage/recent?limit=20&offset=0" \
  -H "Authorization: Bearer your-admin-api-key"

Filter by provider, status, and cost

# Anthropic requests that succeeded (200) costing >= 5000 microcents
curl "https://your-temps-instance.com/api/ai/usage/recent?provider=anthropic&status=200&cost_gte=5000" \
  -H "Authorization: Bearer your-admin-api-key"

UsageLogPage response

{
  "entries": [
    {
      "provider": "anthropic",
      "model": "claude-sonnet-4-20250514",
      "input_tokens": 842,
      "output_tokens": 311,
      "latency_ms": 1240,
      "estimated_cost_microcents": 7184,
      "status": 200,
      "is_streaming": true,
      "is_byok": false,
      "conversation_id": "conv-abc123",
      "tags": "production,chat-feature",
      "request_id": "req-xyz789",
      "trace_id": null
    }
  ],
  "total": 1284
}
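
Since page size is capped at 50, reading a full result set means walking the offsets until total is reached. The generator below sketches that loop. `fetch_page` is a caller-supplied function (hypothetical here) that performs the HTTP GET and returns the decoded { entries, total } object, and a fake fetcher is used so the snippet runs offline.

```python
def iter_usage_entries(fetch_page, page_size=50, **filters):
    """Yield every entry matching the filters by walking limit/offset pages."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset, **filters)
        entries = page["entries"]
        yield from entries
        offset += len(entries)
        # Stop on an empty page or once all matching rows have been seen.
        if not entries or offset >= page["total"]:
            break


# Fake backend with 120 rows, standing in for GET /api/ai/usage/recent.
ROWS = [{"request_id": f"req-{i}"} for i in range(120)]


def fake_fetch(limit, offset, **_filters):
    return {"entries": ROWS[offset:offset + limit], "total": len(ROWS)}


print(sum(1 for _ in iter_usage_entries(fake_fetch)))  # 120
```

Because rows are ordered by timestamp descending, new requests arriving mid-scan can shift offsets. A long scan may therefore see a row twice, so deduplicating on request_id is a reasonable precaution.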

Conversations

Group requests by conversation ID to see the full token history of a chat session.

  • Name
    GET /api/ai/usage/conversations
    Type
    endpoint
    Description

    List conversation summaries. Requires requests to have been tagged with x-conversation-id.

Query parameters

  • Name
    from
    Type
    string
    Description
    ISO 8601 start time.
  • Name
    to
    Type
    string
    Description
    ISO 8601 end time.
  • Name
    limit
    Type
    integer
    Description
    Max results. Defaults to 50, max 100.
  • Name
    user_id
    Type
    integer
    Description
    Filter by user ID.
  • Name
    tags
    Type
    string
    Description
    Filter by tags (comma-separated).
  • Name
    model
    Type
    string
    Description
    Filter by model name.

List conversations

curl "https://your-temps-instance.com/api/ai/usage/conversations?tags=production" \
  -H "Authorization: Bearer your-admin-api-key"

Conversation list response

[
  {
    "conversation_id": "conv-abc123",
    "message_count": 12,
    "total_input_tokens": 8432,
    "total_output_tokens": 3210,
    "total_tokens": 11642,
    "total_cost_microcents": 58210,
    "avg_latency_ms": 824.5,
    "models_used": ["gpt-4o", "claude-sonnet-4-20250514"],
    "first_at": "2025-01-15T09:00:00Z",
    "last_at": "2025-01-15T09:45:00Z"
  }
]

  • Name
    GET /api/ai/usage/conversations/{conversation_id}
    Type
    endpoint
    Description

    All individual requests within a specific conversation, in chronological order.

Query parameters

  • Name
    limit
    Type
    integer
    Description
    Max results. Defaults to 100.

Conversation detail

curl "https://your-temps-instance.com/api/ai/usage/conversations/conv-abc123" \
  -H "Authorization: Bearer your-admin-api-key"

Pricing

Retrieve the pricing table for all supported models. Prices are in USD per 1 million tokens.

  • Name
    GET /api/ai/pricing
    Type
    endpoint
    Description

    Returns pricing for all models known to the gateway, including input, output, cache, and batch rates where applicable.

Get pricing

curl https://your-temps-instance.com/api/ai/pricing \
  -H "Authorization: Bearer your-temps-api-key"

Pricing response

{
  "models": [
    {
      "model": "gpt-4o",
      "display_name": "GPT-4o",
      "provider": "openai",
      "input_per_million": 2.50,
      "output_per_million": 10.00,
      "cache_hit_per_million": 1.25,
      "batch_input_per_million": 1.25,
      "batch_output_per_million": 5.00,
      "deprecated": false
    },
    {
      "model": "claude-sonnet-4-20250514",
      "display_name": "Claude Sonnet 4",
      "provider": "anthropic",
      "input_per_million": 3.00,
      "output_per_million": 15.00,
      "cache_write_5m_per_million": 3.75,
      "cache_hit_per_million": 0.30,
      "deprecated": false
    }
  ]
}

Pricing fields

  • Name
    input_per_million
    Type
    number
    Description

    Base input token cost per 1M tokens.

  • Name
    output_per_million
    Type
    number
    Description

    Output token cost per 1M tokens.

  • Name
    cache_write_5m_per_million
    Type
    number
    Description

    5-minute prompt cache write cost per 1M tokens (Anthropic-style caching).

  • Name
    cache_write_1h_per_million
    Type
    number
    Description

    1-hour cache write cost per 1M tokens.

  • Name
    cache_hit_per_million
    Type
    number
    Description

    Cache hit / read cost per 1M tokens.

  • Name
    batch_input_per_million
    Type
    number
    Description

    Batch API input cost per 1M tokens (OpenAI batch API).

  • Name
    batch_output_per_million
    Type
    number
    Description

    Batch API output cost per 1M tokens.
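
As a worked example of the per-million rates, the sketch below estimates the USD cost of one request from a pricing entry. It uses only the base input and output rates and ignores cache and batch pricing, so it is an approximation and the gateway's own cost figures may differ. `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(pricing: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from per-million-token rates."""
    input_rate = pricing["input_per_million"] / 1_000_000
    output_rate = pricing["output_per_million"] / 1_000_000
    return input_tokens * input_rate + output_tokens * output_rate


# Rates from the claude-sonnet-4-20250514 entry in the pricing response above.
sonnet = {"input_per_million": 3.00, "output_per_million": 15.00}

cost = estimate_cost_usd(sonnet, input_tokens=842, output_tokens=311)
print(f"${cost:.6f}")  # $0.007191
```

Here 842 input tokens cost $0.002526 and 311 output tokens cost $0.004665, for a total of $0.007191.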
