March 12, 2026
Written by Temps Team
Last updated March 12, 2026
OpenTelemetry is the industry-standard way to add distributed tracing to your application. To set it up, you instrument your app with the OTel SDK, configure an OTLP exporter endpoint, and point it at a backend that stores and visualizes traces — with Temps, that backend is already running as part of your deployment platform, requiring only two environment variables.
OpenTelemetry is backed by every major cloud vendor, supported in every language you'd actually use in production, and is the second most active CNCF project after Kubernetes. Yet most teams still don't have tracing set up — not because it's hard to understand, but because the standard setup requires running a collector, a storage backend, and a visualization layer before you see your first span.
This guide walks you through what OpenTelemetry tracing is, how to instrument a Node.js app, and how to skip the painful infrastructure part entirely.
TL;DR: OpenTelemetry tracing gives you end-to-end visibility into every request across your services. The standard setup requires 4-5 services (SDK, collector, Jaeger, storage, Grafana). You can skip all of that by pointing your OTLP exporter at a Temps instance, which includes a built-in OTLP ingest endpoint and trace viewer backed by TimescaleDB. Over 40 languages and frameworks have official OTel SDK support.
OpenTelemetry is a vendor-neutral observability framework maintained by the CNCF with contributions from over 180 companies. It gives you three signals — traces, metrics, and logs — through a single set of APIs and SDKs. Traces are the one most teams need first.
A trace represents a single operation as it flows through your system. Think of a user clicking "Place Order" — that request hits your API gateway, calls the inventory service, writes to the database, triggers a payment processor, and sends a confirmation email. Without tracing, when that request takes 4 seconds instead of 200ms, you're grepping logs across five services.
Each step in that journey is a span. Spans have a start time, duration, attributes, and a parent-child relationship. Stitch them together and you get a trace — a complete picture of what happened and how long each part took.
Here's what a trace looks like conceptually:
[Trace: order-checkout]
|
|-- [Span: API Gateway] 12ms
| |-- [Span: Auth Middleware] 2ms
| |-- [Span: POST /orders] 10ms
| |-- [Span: Inventory Check] 45ms
| |-- [Span: Payment Service] 380ms
| | |-- [Span: Stripe API] 350ms
| |-- [Span: Send Email] 22ms
| |-- [Span: DB Write] 8ms
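The parent-child relationship is all a viewer needs to rebuild that tree. The following is a minimal sketch, assuming a simplified span record (the field names `spanId`, `parentSpanId`, and `durationMs` are illustrative, not the OTLP schema):

```typescript
// Simplified span record; field names are assumptions for illustration.
interface SpanRecord {
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  name: string;
  durationMs: number;
}

interface SpanNode extends SpanRecord {
  children: SpanNode[];
}

// Link every span to its parent and return the spans that have no parent.
function buildSpanTree(spans: SpanRecord[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  spans.forEach((span) => nodes.set(span.spanId, { ...span, children: [] }));

  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parent =
      node.parentSpanId !== undefined ? nodes.get(node.parentSpanId) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // parent missing from this batch: treat as a root
    }
  });
  return roots;
}

// Render the tree as indented text, one line per span.
function renderTree(nodes: SpanNode[], depth = 0, lines: string[] = []): string[] {
  nodes.forEach((node) => {
    lines.push(`${"  ".repeat(depth)}${node.name} ${node.durationMs}ms`);
    renderTree(node.children, depth + 1, lines);
  });
  return lines;
}

const exampleSpans: SpanRecord[] = [
  { spanId: "a", name: "API Gateway", durationMs: 12 },
  { spanId: "b", parentSpanId: "a", name: "Auth Middleware", durationMs: 2 },
  { spanId: "c", parentSpanId: "a", name: "POST /orders", durationMs: 10 },
];

console.log(renderTree(buildSpanTree(exampleSpans)).join("\n"));
```

Spans can arrive out of order, which is why the sketch indexes every span first and links parents in a second pass.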
Context propagation is the mechanism that ties these spans together. When Service A calls Service B, it passes a trace ID in the HTTP headers (usually traceparent). Service B picks it up, creates a child span, and passes it along. Every span shares the same trace ID, so your visualization tool can reconstruct the full request path.
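To make the header concrete, here is a small sketch of reading an incoming traceparent value and producing the one a service would send downstream. It assumes the W3C Trace Context layout (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags) and handles only version 00; the function names are hypothetical, and the SDK does all of this for you in practice.

```typescript
import { randomBytes } from "node:crypto";

interface TraceContext {
  traceId: string;      // 32 lowercase hex characters, shared by every span
  parentSpanId: string; // 16 lowercase hex characters, the caller's span
  sampled: boolean;     // lowest bit of the flags byte
}

const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse a traceparent header; returns null when the value is malformed.
function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT_V00.exec(header.trim());
  if (!match) return null;
  const traceId = match[1];
  const parentSpanId = match[2];
  const flags = parseInt(match[3], 16);
  // All-zero IDs are not valid identifiers.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return { traceId, parentSpanId, sampled: (flags & 0x01) === 0x01 };
}

// Build the header for an outgoing call: same trace ID, new span ID.
function childTraceparent(incoming: TraceContext): string {
  const newSpanId = randomBytes(8).toString("hex");
  const flags = incoming.sampled ? "01" : "00";
  return `00-${incoming.traceId}-${newSpanId}-${flags}`;
}

const incoming = parseTraceparent(
  "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
);
if (incoming) {
  console.log(childTraceparent(incoming));
}
```

The trace ID survives every hop unchanged while each service contributes its own span ID, which is what lets a backend stitch the spans back together.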
Why does this matter? Because the alternative is correlating timestamps across log files — and that doesn't work at 2am during an outage.
Setting up OpenTelemetry from scratch means assembling multiple independent tools. Here's the minimum architecture for a self-hosted tracing pipeline:
You add the OpenTelemetry SDK to your app. For Node.js, that's 4-6 packages. For Java, it's an agent JAR. For Go, it's a handful of modules plus manual span creation. This is the easy part.
The SDK doesn't send traces directly to your storage backend. Best practice says you run an OTel Collector — a standalone service that receives, processes, and exports telemetry data. It handles batching, retry logic, and format translation. That's another Docker container to manage, configure, and keep alive.
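As a rough illustration of the retry logic mentioned above, the sketch below computes an exponential backoff schedule. The function name and the numbers are assumptions chosen for the example, not the Collector's defaults, and real implementations usually add random jitter on top.

```typescript
// Delay before each retry: doubles from baseMs and is capped at maxMs.
function backoffDelaysMs(retries: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(maxMs, baseMs * 2 ** attempt));
  }
  return delays;
}

// Five retries starting at 1s, never waiting more than 5s.
console.log(backoffDelaysMs(5, 1000, 5000));
```

The cap matters: without it a long outage would push the wait between attempts to minutes, and buffered spans would pile up in memory.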
Jaeger needs a storage backend. For production, your main options are Elasticsearch, OpenSearch, or Cassandra (Kafka can sit in front as a buffer, but it is not a trace store). Each comes with its own operational complexity. Elasticsearch alone wants at least 2GB of RAM for a small cluster, and you'll need to manage indices, retention policies, and disk usage.
Alternatively, you can use Grafana Tempo, which stores traces in object storage (S3, GCS). Cheaper, but now you need to configure object storage access, retention, and compaction.
Jaeger has its own UI. Grafana has another. You'll probably end up running Grafana anyway for dashboards, which means configuring data source connections, provisioning dashboards, and managing yet another service.
Before you see a single trace, you're running:
| Service | Purpose | Memory |
|---|---|---|
| OTel Collector | Receive and export spans | ~200MB |
| Jaeger (or Tempo) | Trace backend | ~512MB-1GB |
| Elasticsearch | Storage | ~2GB+ |
| Grafana | Visualization | ~256MB |
That's roughly 3-4GB of RAM dedicated to observability infrastructure — on top of whatever your actual application uses. And none of this counts the hours you spend configuring YAML files.
Strip away the complexity and distributed tracing requires exactly three components: instrumentation, transport, and visualization. According to the OpenTelemetry documentation, the SDK handles the first two and any OTLP-compatible backend handles the third.
Instrument your app. Add the OTel SDK. Enable auto-instrumentation for your HTTP framework, database driver, and external API calls. This takes 10-20 lines of code.
Send traces somewhere. Configure an OTLP exporter with an endpoint URL. The SDK batches spans and ships them over HTTP or gRPC. One environment variable is all you really need.
See them. A UI that renders trace waterfalls, lets you search by service name or trace ID, and shows latency breakdowns. This is where most of the infrastructure bloat lives — and where the biggest shortcut is possible.
The insight most guides miss: you don't need a separate collector, storage backend, and visualization tool. If your deployment platform already has a built-in OTLP endpoint and trace viewer, components two and three collapse into a single service you're already running.
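The transport step described above is mostly batching. The sketch below is a simplified model, not the SDK's batch processor: the class name and the size-only flush rule are assumptions, and the real processor also flushes on a timer and bounds its queue.

```typescript
type ExportFn<T> = (batch: T[]) => void;

// Collect spans in memory and hand them to the exporter in groups.
class SpanBatcher<T> {
  private queue: T[] = [];
  private readonly maxBatchSize: number;
  private readonly exportBatch: ExportFn<T>;

  constructor(maxBatchSize: number, exportBatch: ExportFn<T>) {
    this.maxBatchSize = maxBatchSize;
    this.exportBatch = exportBatch;
  }

  // Called when a span ends; exports once the batch is full.
  add(span: T): void {
    this.queue.push(span);
    if (this.queue.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // Export whatever is queued, for example on shutdown.
  flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.exportBatch(batch);
  }
}

const sent: string[][] = [];
const batcher = new SpanBatcher<string>(2, (batch) => sent.push(batch));
batcher.add("span-1");
batcher.add("span-2"); // batch is full, exported here
batcher.add("span-3");
batcher.flush();       // exports the remaining span
console.log(sent);
```

Batching is why one request to the backend can carry many spans, and why a graceful shutdown hook matters: spans still sitting in the queue are lost if the process exits without a final flush.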
The OpenTelemetry JavaScript SDK supports auto-instrumentation for 30+ libraries including Express, Fastify, PostgreSQL, MongoDB, Redis, and gRPC. You can get useful traces without writing a single manual span.
npm install @opentelemetry/sdk-node \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-http \
@opentelemetry/resources \
@opentelemetry/semantic-conventions
Five packages. The auto-instrumentations-node meta-package bundles instrumentation for HTTP, Express, Fastify, pg, mysql, redis, mongodb, grpc, and more.
Create a tracing.ts (or tracing.js) file at the root of your project:
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { resourceFromAttributes } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME } from '@opentelemetry/semantic-conventions';

// Fall back to the local OTLP/HTTP default if the variable is unset,
// so the exporter URL never becomes "undefined/v1/traces"
const otlpEndpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318';

const sdk = new NodeSDK({
  // resourceFromAttributes() is the @opentelemetry/resources 2.x API;
  // on 1.x, use new Resource({ ... }) instead
  resource: resourceFromAttributes({
    [ATTR_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME || 'my-app',
  }),
  traceExporter: new OTLPTraceExporter({
    url: `${otlpEndpoint}/v1/traces`,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Disable noisy fs instrumentation
      '@opentelemetry/instrumentation-fs': { enabled: false },
      // Attach extra query details such as parameter values (be careful with PII)
      '@opentelemetry/instrumentation-pg': {
        enhancedDatabaseReporting: true,
      },
    }),
  ],
});

sdk.start();

// Graceful shutdown: flush pending spans, then exit even if the flush fails
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
The tracing file must load before any other imports. If Express or pg is imported before the SDK initializes, auto-instrumentation won't patch them.
# Node.js 18.18+ (the --import flag)
node --import ./tracing.js app.js
# Or with tsx for TypeScript sources
npx tsx --import ./tracing.ts app.ts
For older Node.js versions, use the --require flag instead:
node -r ./tracing.js app.js
OTEL_SERVICE_NAME=checkout-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
That's the entire instrumentation. Every HTTP request your Express or Fastify server handles, every database query, every outbound HTTP call — they all become spans automatically.
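How the endpoint variable turns into a request URL is worth spelling out. The sketch below mirrors the OTLP/HTTP convention as I understand it: a signal-specific variable is used as-is, while the general endpoint is treated as a base URL with /v1/traces appended. `resolveTracesUrl` is a hypothetical helper for illustration, not an SDK function.

```typescript
type Env = Record<string, string | undefined>;

// Resolve the URL that trace batches would be posted to.
function resolveTracesUrl(env: Env): string {
  // A signal-specific endpoint wins and is used exactly as written.
  const tracesEndpoint = env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT;
  if (tracesEndpoint) return tracesEndpoint;

  // Otherwise the general endpoint is a base URL; default to the local
  // OTLP/HTTP port when nothing is configured.
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || "http://localhost:4318";
  return `${base.replace(/\/+$/, "")}/v1/traces`;
}

console.log(resolveTracesUrl({ OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318" }));
```

When spans do not show up in the backend, checking the fully resolved URL against the backend's documented ingest path is usually the fastest first step.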
Auto-instrumentation covers framework-level operations. For business logic, you'll want custom spans:
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('checkout-service');

async function processOrder(orderId: string) {
  return tracer.startActiveSpan('process-order', async (span) => {
    span.setAttribute('order.id', orderId);
    try {
      const inventory = await checkInventory(orderId);
      span.setAttribute('order.items_count', inventory.length);
      const payment = await chargeCustomer(orderId);
      span.setAttribute('order.amount', payment.amount);
      return { success: true };
    } catch (error) {
      // Attach the exception and mark the span as failed
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      throw error;
    } finally {
      // Always end the span, on success or failure
      span.end();
    }
  });
}
Custom spans nest inside auto-instrumented spans automatically through context propagation. Your process-order span appears as a child of the HTTP handler span.
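The nesting works because the currently active span travels with the call chain. In Node.js this kind of propagation is typically built on AsyncLocalStorage; the toy model below shows the idea with hypothetical names (`ToySpan`, `withSpan`) and synchronous callbacks for brevity. It is not the SDK's implementation.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface ToySpan {
  name: string;
  parent?: string; // name of the span that was active when this one started
}

const activeSpan = new AsyncLocalStorage<ToySpan>();
const finished: ToySpan[] = [];

// Start a span whose parent is whatever span is active right now,
// then run fn with the new span as the active one.
function withSpan<T>(name: string, fn: () => T): T {
  const parent = activeSpan.getStore();
  const span: ToySpan = { name, parent: parent ? parent.name : undefined };
  try {
    return activeSpan.run(span, fn);
  } finally {
    finished.push(span); // record the span when it ends
  }
}

function processOrderStep(): void {
  // No parent is passed in: it is picked up from the active context.
  withSpan("process-order", () => undefined);
}

function handleRequest(): void {
  withSpan("POST /orders", () => processOrderStep());
}

handleRequest();
console.log(finished);
```

Note that `processOrderStep` never receives a span argument. That is the same reason the process-order span in the example above lands under the HTTP handler span without any explicit wiring.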
This is the question that trips up most teams. The SDK is easy. Choosing a backend isn't. Datadog charges $0.10 per ingested GB for traces, and New Relic charges $0.30 per GB beyond the free tier. At scale, tracing becomes your biggest observability bill.
Here are the realistic options:
Jaeger is the most popular open-source tracing backend. Originally built by Uber, now a CNCF graduated project.
Pros: Free, well-documented, good UI for trace exploration. Cons: Needs Elasticsearch or Cassandra for production. Scaling requires significant ops knowledge. You're now maintaining a distributed database for your distributed traces.
Best for teams that already run Elasticsearch for logging and have ops capacity to spare.
Tempo stores traces in object storage (S3, GCS, Azure Blob). It avoids a heavyweight search index: lookups were originally by trace ID only, and attribute search now goes through TraceQL, which keeps storage costs low.
Pros: Cheap storage, integrates with Grafana dashboards. Cons: Search is less flexible than a full-text index across spans. Complex configuration. Requires a separate Grafana instance for visualization.
Best for teams already invested in the Grafana ecosystem.
Ship traces to a managed service. Zero infrastructure to manage.
Pros: Instant setup, powerful query engines, alerting built in. Cons: Expensive at scale. At $0.10/GB, a moderately active service generating 50GB of trace data per month costs $5/month in ingestion on Datadog, and that multiplies across microservices. You're also locked into their query language and retention policies.
If you're already running Temps for deployments, it includes a built-in OTLP ingest endpoint and trace storage backed by TimescaleDB. No additional services to deploy or configure.
Pros: Zero extra infrastructure. Traces live alongside your deployment logs, error tracking, and analytics. One dashboard for everything. Cons: Designed for teams already using Temps as their deployment platform.
The observability landscape has a gap: open-source tools require significant ops investment, and SaaS tools charge per-GB prices that punish you for instrumenting more of your code. The ideal solution is a platform that bundles tracing with something you already need — like deployments.
| Option | Cost | Ops Overhead | Storage |
|---|---|---|---|
| Jaeger + Elasticsearch | Free (+ infra) | High | Self-managed |
| Grafana Tempo | Free (+ infra) | Medium | Object storage |
| Datadog | $0.10/GB ingested | None | Managed |
| New Relic | $0.30/GB (over free tier) | None | Managed |
| Temps | Included (~$6/mo Cloud or free self-host) | None (already running) | TimescaleDB |
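The per-GB rows in the table are easy to turn into a monthly estimate. This sketch uses only the rates quoted in this article ($0.10/GB for Datadog, $0.30/GB beyond 100GB free for New Relic); the plan objects and function name are illustrative, and real invoices include other line items.

```typescript
interface PricingPlan {
  name: string;
  centsPerGb: number; // integer cents avoid floating-point rounding
  freeGb: number;     // monthly allowance before billing starts
}

// Monthly ingestion cost in US dollars for a given trace volume.
function monthlyIngestCostUsd(plan: PricingPlan, ingestedGb: number): number {
  const billableGb = Math.max(0, ingestedGb - plan.freeGb);
  return (billableGb * plan.centsPerGb) / 100;
}

const datadog: PricingPlan = { name: "Datadog", centsPerGb: 10, freeGb: 0 };
const newRelic: PricingPlan = { name: "New Relic", centsPerGb: 30, freeGb: 100 };

for (const gb of [50, 500]) {
  console.log(
    `${gb}GB/month: ${datadog.name} $${monthlyIngestCostUsd(datadog, gb)}, ` +
      `${newRelic.name} $${monthlyIngestCostUsd(newRelic, gb)}`
  );
}
```

At 50GB the free tier makes New Relic cheaper, while at 500GB the higher per-GB rate dominates, so the comparison depends heavily on volume.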
If you're already running Temps, adding tracing to your app takes two environment variables and zero additional services. Temps exposes OTLP-compatible ingest endpoints at /api/otel/v1/traces, /api/otel/v1/metrics, and /api/otel/v1/logs, accepts protobuf-encoded payloads, and stores spans in TimescaleDB hypertables with a default 7-day retention policy.
Sampling (handled in the temps-otel crate) keeps 100% of error traces and 100% of traces exceeding the P95 latency threshold, with a configurable base sample rate (default: keep all traces). No collector-side config required.
Apps deployed through Temps get an OpenTelemetry Collector sidecar (otel/opentelemetry-collector-contrib:0.96.0) injected into their Docker network automatically. The sidecar listens on gRPC port 4317 and HTTP port 4318 locally, then forwards to the Temps ingest endpoint. This means apps that already emit OTel data work without any code changes.
In your Temps project settings (or your .env file):
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-temps-instance.example.com/api/otel
OTEL_SERVICE_NAME=my-app
That's it. The same tracing.ts file from the previous section works without modification. The SDK sends spans to the Temps OTLP endpoint, and they show up in your dashboard automatically.
When your app sends spans to Temps, the SDK delivers them with POST /api/otel/v1/traces.
No collector to configure. No Jaeger to deploy. No Elasticsearch to tune. No Grafana dashboards to build.
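The sampling rule described earlier in this section (keep every error trace, keep every trace slower than the P95 threshold, sample the rest at a base rate) can be sketched as a single decision function. This is an illustration of the rule as stated, not the temps-otel implementation; the type and parameter names are assumptions.

```typescript
interface CompletedTrace {
  hasError: boolean;
  durationMs: number;
}

interface SamplingConfig {
  p95ThresholdMs: number; // traces slower than this are always kept
  baseSampleRate: number; // 0..1, share of ordinary traces to keep
}

// Decide whether a finished trace should be stored.
function shouldKeepTrace(
  trace: CompletedTrace,
  config: SamplingConfig,
  random: () => number = Math.random
): boolean {
  if (trace.hasError) return true;
  if (trace.durationMs > config.p95ThresholdMs) return true;
  // random() is in [0, 1), so a rate of 1 keeps every trace.
  return random() < config.baseSampleRate;
}

const config: SamplingConfig = { p95ThresholdMs: 800, baseSampleRate: 0.1 };
console.log(shouldKeepTrace({ hasError: true, durationMs: 40 }, config));
console.log(shouldKeepTrace({ hasError: false, durationMs: 1200 }, config));
```

The decision needs the finished trace, which is why this style of sampling runs on the receiving side rather than in the application at request start.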
If your application is deployed through Temps itself, the injected sidecar collector handles OTel forwarding automatically. For explicit control, add these environment variables to your project configuration:
# In your Temps project environment variables
OTEL_EXPORTER_OTLP_ENDPOINT=https://temps.yourdomain.com/api/otel
OTEL_SERVICE_NAME=checkout-api
Temps accepts traces from any app that speaks OTLP, not just apps deployed on the platform. Running a Python service on a separate VPS? A Go microservice on Kubernetes? Point their OTLP exporter at your Temps instance:
# Python example with opentelemetry-sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
exporter = OTLPSpanExporter(
endpoint="https://your-temps-instance.example.com/api/otel/v1/traces"
)
Once traces start flowing, the Temps dashboard gives you a unified view of every request across all instrumented services.
Click any trace and you see the full span tree rendered as a waterfall. Each bar represents a span — its width shows duration, its color indicates the service, and expanding it reveals attributes like HTTP status codes, database query text, and custom metadata.
The waterfall makes latency problems immediately visible. If your checkout endpoint takes 2 seconds, you can see that 1.8 seconds is spent waiting for the payment provider and 150ms is a slow database query. No guessing.
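The same breakdown can be computed from raw span data. The sketch below derives each span's self time (its duration minus the time spent in its direct children), assuming children run one after another without overlap. The span shape and names are illustrative, using the checkout numbers from the paragraph above.

```typescript
interface TimedSpan {
  spanId: string;
  parentSpanId?: string;
  name: string;
  durationMs: number;
}

// Self time per span name: duration minus the sum of direct children.
function selfTimeByName(spans: TimedSpan[]): Record<string, number> {
  const childTotal: Record<string, number> = {};
  spans.forEach((span) => {
    if (span.parentSpanId !== undefined) {
      childTotal[span.parentSpanId] =
        (childTotal[span.parentSpanId] || 0) + span.durationMs;
    }
  });

  const result: Record<string, number> = {};
  spans.forEach((span) => {
    // Clamp at zero in case children overlap and exceed the parent.
    result[span.name] = Math.max(0, span.durationMs - (childTotal[span.spanId] || 0));
  });
  return result;
}

const checkoutTrace: TimedSpan[] = [
  { spanId: "1", name: "POST /checkout", durationMs: 2000 },
  { spanId: "2", parentSpanId: "1", name: "payment provider", durationMs: 1800 },
  { spanId: "3", parentSpanId: "1", name: "db query", durationMs: 150 },
];

console.log(selfTimeByName(checkoutTrace));
```

Here the handler itself accounts for only 50ms of the 2 seconds, which points the investigation at the payment call rather than at your own code.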
Find traces by filtering, for example to show only POST /orders traces.
Traces live alongside your deployment events in the same system. You can answer "did the last deploy make things slower?" with one click, something standalone tracing tools can't do easily because they have no concept of your deploy history.
The OTel SDK is available in 11 languages with official support, and the auto-instrumentation experience varies significantly by ecosystem. According to the OpenTelemetry Registry, Python has 40+ auto-instrumentation packages, while Rust requires manual span creation for most operations.
Python has the most mature auto-instrumentation story after Java. Two commands install everything:
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
Then run your app with the auto-instrumentation wrapper:
opentelemetry-instrument \
  --service_name my-python-app \
  --traces_exporter otlp \
  --exporter_otlp_protocol http/protobuf \
  --exporter_otlp_endpoint https://your-temps-instance.example.com/api/otel \
  python app.py
Django, Flask, FastAPI, SQLAlchemy, psycopg2, redis, requests, httpx — all instrumented automatically. No code changes to your application.
Go requires slightly more setup because there's no runtime patching. You wrap your HTTP handlers and database connections manually:
import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)
// Wrap your HTTP handler
handler := otelhttp.NewHandler(mux, "server")
// Use the instrumented HTTP client
client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
More code than Node.js or Python, but Go's explicit style means you always know exactly what's being traced.
Rust uses the tracing crate ecosystem with an OpenTelemetry bridge:
use tracing_subscriber::prelude::*;
use opentelemetry::trace::TracerProvider;
use opentelemetry_otlp::SpanExporter;
let exporter = SpanExporter::builder()
.with_tonic()
.build()?;
let provider = opentelemetry_sdk::trace::SdkTracerProvider::builder()
.with_batch_exporter(exporter)
.build();
tracing_subscriber::registry()
.with(tracing_opentelemetry::layer()
.with_tracer(provider.tracer("my-service")))
.init();
Rust has fewer auto-instrumentation libraries, but the tracing macros (#[instrument]) make manual instrumentation clean. Temps itself uses this exact pattern internally — the temps-otel crate bridges tracing spans to the OpenTelemetry pipeline.
Point your OTLP exporter directly at a backend that accepts OTLP natively. Set OTEL_EXPORTER_OTLP_ENDPOINT to your backend URL and OTEL_SERVICE_NAME to identify your service. With Temps, the endpoint is https://your-temps-instance.example.com/api/otel and no collector is needed: the SDK appends /v1/traces, which matches the Temps ingest path /api/otel/v1/traces. For teams already on Temps, apps get an OTel collector sidecar injected automatically, so you can also point the exporter at the sidecar's standard localhost:4318 endpoint.
The OpenTelemetry SDK adds roughly 1-5% CPU overhead and minimal memory usage for most applications, according to benchmarks published by the OTel project. The SDK batches spans in memory and exports them asynchronously, so the hot path of your request handler doesn't block on network calls. For most web applications, the overhead is undetectable in production. If you need to reduce it further, use head-based sampling to trace a percentage of requests rather than all of them.
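Head-based sampling makes its decision once, when the trace starts, and usually derives it from the trace ID so that every service reaches the same verdict. The sketch below is a simplified version of that idea, not the SDK's ratio sampler: it reads the first 32 bits of the trace ID and compares them to the requested ratio. The function name is hypothetical.

```typescript
// Keep a trace when the first 32 bits of its ID fall below ratio * 2^32.
function sampleByTraceId(traceId: string, ratio: number): boolean {
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  const bucket = parseInt(traceId.slice(0, 8), 16); // 0 .. 2^32 - 1
  return bucket < ratio * 0x100000000;
}

const exampleTraceId = "4bf92f3577b34da6a3ce929d0e0e4736";
console.log(sampleByTraceId(exampleTraceId, 0.5)); // 0x4bf92f35 is below half of 2^32
console.log(sampleByTraceId(exampleTraceId, 0.1));
```

Because the decision depends only on the trace ID and the ratio, two services configured with the same ratio keep or drop the same traces, so sampled traces stay complete.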
Yes. OpenTelemetry provides official SDKs for 11 languages: Java, Python, JavaScript/Node.js, Go, .NET, Ruby, PHP, Swift, Erlang/Elixir, Rust, and C++. Auto-instrumentation maturity varies — Java and Python have the most libraries instrumented automatically, while Go and Rust require more manual setup. Any language that can make HTTP requests can also send traces via the OTLP HTTP protocol directly, even without an official SDK.
The OTel SDK itself is free and open-source. Storage costs depend entirely on your backend choice. Self-hosted Jaeger with Elasticsearch costs whatever your server costs — typically $20-50/month for a small setup on a VPS. Datadog charges $0.10 per ingested GB. New Relic offers 100GB free per month, then $0.30/GB. With Temps, trace storage is included — no additional cost beyond the instance you're already running for deployments. Temps Cloud is approximately $6/month (Hetzner cost + 30%). Self-hosting is free under the Apache 2.0 license.
Traces show the journey of a single request across services — they answer "what happened to this specific request and how long did each step take?" Metrics are aggregated numerical measurements over time — request count, error rate, latency. Logs are discrete text events emitted by your application. OpenTelemetry supports all three signals, but traces provide the most immediate debugging value for distributed systems because they connect cause and effect across service boundaries.
Not necessarily. The OTel SDK can export traces directly to any OTLP-compatible backend. The Collector is recommended for production because it handles batching, retry, and multi-destination routing — but if your backend already handles OTLP ingestion well (like Temps does), you can skip the Collector entirely and export directly from your app. Temps-deployed apps get an OTel Collector sidecar injected automatically if you prefer the standard collector architecture.
OpenTelemetry is the industry standard for distributed tracing, and it's not going anywhere. The SDK is mature, auto-instrumentation covers most frameworks, and the OTLP protocol means you're never locked into a specific backend.
The hard part was always the infrastructure: collectors, storage, visualization. That problem disappears when your deployment platform includes a built-in OTLP ingest endpoint backed by TimescaleDB.
If you're already using Temps, set OTEL_EXPORTER_OTLP_ENDPOINT=https://your-temps-instance.example.com/api/otel and you have production tracing. If you're not, you can set up a Temps instance and get deployments, analytics, error tracking, and tracing from a single binary, free to self-host (Apache 2.0) or approximately $6/month on Temps Cloud:
curl -fsSL temps.sh/install.sh | bash