March 12, 2026
Written by Temps Team
Last updated March 12, 2026
OpenTelemetry has become the default standard for distributed tracing. It's backed by every major cloud vendor, supported in every language you'd actually use in production, and has more GitHub stars than most frameworks. Yet most teams still don't have tracing set up.
The reason isn't that tracing is hard to understand. It's that the tooling ecosystem expects you to run a collector, a storage backend, a visualization layer, and somehow glue them together before you see your first span. According to the CNCF, OpenTelemetry is the second most active CNCF project after Kubernetes — but adoption in production environments still lags because the setup cost is too high.
This guide walks you through what OpenTelemetry tracing actually is, how to instrument a Node.js app, and how to skip the painful infrastructure part entirely.
TL;DR: OpenTelemetry tracing gives you end-to-end visibility into every request across your services. The standard setup requires four or five components (SDK, collector, Jaeger, storage, Grafana). You can skip most of that by pointing your OTLP exporter at a Temps instance, which includes a built-in collector and trace viewer. Over 40 languages and frameworks have official OTel SDK support.
OpenTelemetry is a vendor-neutral observability framework maintained by the CNCF with contributions from over 180 companies. It gives you three signals — traces, metrics, and logs — through a single set of APIs and SDKs. Traces are the one most teams need first.
A trace represents a single operation as it flows through your system. Think of a user clicking "Place Order" — that request hits your API gateway, calls the inventory service, writes to the database, triggers a payment processor, and sends a confirmation email. Without tracing, when that request takes 4 seconds instead of 200ms, you're grepping logs across five services.
Each step in that journey is a span. Spans have a start time, duration, attributes, and a parent-child relationship. Stitch them together and you get a trace — a complete picture of what happened and how long each part took.
Here's what a trace looks like conceptually:
```
[Trace: order-checkout]
|
|-- [Span: API Gateway] 12ms
|   |-- [Span: Auth Middleware] 2ms
|   |-- [Span: POST /orders] 10ms
|   |   |-- [Span: Inventory Check] 45ms
|   |   |-- [Span: Payment Service] 380ms
|   |   |   |-- [Span: Stripe API] 350ms
|   |   |-- [Span: Send Email] 22ms
|   |   |-- [Span: DB Write] 8ms
```
Context propagation is the mechanism that ties these spans together. When Service A calls Service B, it passes a trace ID in the HTTP headers (usually traceparent). Service B picks it up, creates a child span, and passes it along. Every span shares the same trace ID, so your visualization tool can reconstruct the full request path.
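The traceparent header that carries this context has a fixed shape: version, trace ID, parent span ID, and trace flags, separated by dashes. Here's a minimal parsing sketch to make that concrete — `parseTraceparent` is a hypothetical helper for illustration; in practice the SDK's W3C trace context propagator handles all of this for you:

```typescript
interface TraceContext {
  traceId: string;  // 32 hex chars, shared by every span in the trace
  spanId: string;   // 16 hex chars, the caller's span (our parent)
  sampled: boolean; // trace-flags bit 0: was this trace sampled?
}

// Hypothetical helper, not an OTel API. Parses a W3C traceparent
// header of the form: version "-" trace-id "-" parent-id "-" flags
function parseTraceparent(header: string): TraceContext | null {
  const m = header.match(/^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/);
  if (!m) return null;
  return {
    traceId: m[1],
    spanId: m[2],
    sampled: (parseInt(m[3], 16) & 1) === 1,
  };
}
```

When Service B receives this header, it creates its own span with the parsed span ID as the parent and the same trace ID, then forwards a new traceparent downstream.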
Why does this matter? Because the alternative is correlating timestamps across log files. Anyone who's tried that at 2am during an outage knows it doesn't work.
According to Chronosphere, 54% of engineering teams cite "too many tools and data sources" as their top observability challenge. Setting up OpenTelemetry from scratch is a textbook example of that problem.
Here's the minimum architecture you'd need for a self-hosted tracing pipeline:
You add the OpenTelemetry SDK to your app. For Node.js, that's 4-6 packages. For Java, it's an agent JAR. For Go, it's a handful of modules plus manual span creation. This is the easy part.
The SDK doesn't send traces directly to your storage backend. Best practice says you run an OTel Collector — a standalone service that receives, processes, and exports telemetry data. It handles batching, retry logic, and format translation. That's another Docker container to manage, configure, and keep alive.
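For reference, a minimal collector configuration for this pipeline looks roughly like the following. This is a sketch: the backend hostname and port are illustrative, and a production config would add memory limits, TLS, and health checks.

```yaml
# Receive OTLP from apps, batch spans, forward to a trace backend.
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  otlp:
    # Illustrative: a Jaeger instance accepting OTLP over gRPC
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Even this "minimal" file is another artifact to version, deploy, and debug when spans silently stop arriving.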
Then there's the trace backend itself. Jaeger, the most common open-source choice, needs a storage backend of its own: Elasticsearch, Cassandra, or Kafka. Each comes with its own operational complexity. Elasticsearch alone wants at least 2GB of RAM for a small cluster, and you'll need to manage indices, retention policies, and disk usage.
Alternatively, you can use Grafana Tempo, which stores traces in object storage (S3, GCS). Cheaper, but now you need to configure object storage access, retention, and compaction.
Jaeger has its own UI. Grafana has another. You'll probably end up running Grafana anyway for dashboards, which means configuring data source connections, provisioning dashboards, and managing yet another service.
Before you see a single trace, you're running:
| Service | Purpose | Memory |
|---|---|---|
| OTel Collector | Receive and export spans | ~200MB |
| Jaeger (or Tempo) | Trace backend | ~512MB-1GB |
| Elasticsearch | Storage | ~2GB+ |
| Grafana | Visualization | ~256MB |
That's roughly 3-4GB of RAM dedicated to observability infrastructure — on top of whatever your actual application uses. And none of this counts the hours you spend configuring YAML files.
We've seen teams spend 2-3 days getting a working OTel pipeline, only to abandon it because the maintenance cost wasn't worth it for a small team. The irony: tracing is most valuable for the teams that can least afford the infrastructure overhead.
Strip away the complexity and distributed tracing requires exactly three components: instrumentation, transport, and visualization. According to the OpenTelemetry documentation, the SDK handles the first two and any OTLP-compatible backend handles the third.
Instrument your app. Add the OTel SDK. Enable auto-instrumentation for your HTTP framework, database driver, and external API calls. This takes 10-20 lines of code.
Send traces somewhere. Configure an OTLP exporter with an endpoint URL. The SDK batches spans and ships them over HTTP or gRPC. One environment variable is all you really need.
See them. A UI that renders trace waterfalls, lets you search by service name or trace ID, and shows latency breakdowns. This is where most of the infrastructure bloat lives — and where the biggest shortcut is possible.
The insight most guides miss: you don't need a separate collector, storage backend, and visualization tool. If your deployment platform already has a built-in OTLP endpoint and trace viewer, components two and three collapse into a single service you're already running.
The OpenTelemetry JavaScript SDK supports auto-instrumentation for 30+ libraries including Express, Fastify, PostgreSQL, MongoDB, Redis, and gRPC. You can get useful traces without writing a single manual span.
```shell
npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
```
Five packages. The auto-instrumentations-node meta-package bundles instrumentation for HTTP, Express, Fastify, pg, mysql, redis, mongodb, grpc, and more.
Create a tracing.ts (or tracing.js) file at the root of your project:
```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME || 'my-app',
  }),
  traceExporter: new OTLPTraceExporter({
    url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // Disable noisy fs instrumentation
      '@opentelemetry/instrumentation-fs': { enabled: false },
      // Capture SQL query text (be careful with PII)
      '@opentelemetry/instrumentation-pg': {
        enhancedDatabaseReporting: true,
      },
    }),
  ],
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});
```
The tracing file must load before any other imports. If Express or pg is imported before the SDK initializes, auto-instrumentation won't patch them.
```shell
# Recent Node.js versions (with --import support)
node --import ./tracing.js app.js
```

For TypeScript, run through a loader such as tsx — the plain --import flag won't execute .ts files on its own:

```shell
node --import tsx --import ./tracing.ts app.ts
```

For older Node.js versions (or CommonJS builds), use the --require flag instead:

```shell
node -r ./tracing.js app.js
```

Point the exporter at your collector with two environment variables:

```shell
OTEL_SERVICE_NAME=checkout-api
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```
That's the entire instrumentation. Every HTTP request your Express or Fastify server handles, every database query, every outbound HTTP call — they all become spans automatically.
Auto-instrumentation covers framework-level operations. For business logic, you'll want custom spans:
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('checkout-service');

async function processOrder(orderId: string) {
  return tracer.startActiveSpan('process-order', async (span) => {
    span.setAttribute('order.id', orderId);
    try {
      const inventory = await checkInventory(orderId);
      span.setAttribute('order.items_count', inventory.length);

      const payment = await chargeCustomer(orderId);
      span.setAttribute('order.amount', payment.amount);

      return { success: true };
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: (error as Error).message,
      });
      throw error;
    } finally {
      span.end();
    }
  });
}
```
Custom spans nest inside auto-instrumented spans automatically through context propagation. Your process-order span appears as a child of the HTTP handler span.
Where should the traces go? This is the question that trips up most teams. The SDK is easy. Choosing a backend isn't. Datadog charges $0.10 per ingested GB for traces, and New Relic charges $0.30 per GB beyond the free tier. At scale, tracing becomes your biggest observability bill.
Here are the realistic options:
Jaeger is the most popular open-source tracing backend. Originally built by Uber, now a CNCF graduated project.
Pros: Free, well-documented, good UI for trace exploration. Cons: Needs Elasticsearch or Cassandra for production. Scaling requires significant ops knowledge. You're now maintaining a distributed database for your distributed traces.
Best for teams that already run Elasticsearch for logging and have ops capacity to spare.
Tempo stores traces in object storage (S3, GCS, Azure Blob). No indexing — it searches by trace ID only, which keeps costs low.
Pros: Cheap storage, integrates with Grafana dashboards. Cons: No full-text search across spans without Grafana Cloud. Complex configuration. Requires a separate Grafana instance for visualization.
Best for teams already invested in the Grafana ecosystem.
Ship traces to a managed service. Zero infrastructure to manage.
Pros: Instant setup, powerful query engines, alerting built in. Cons: Expensive. A moderately active service generating 50GB of trace data per month costs about $5 in Datadog ingestion alone, before indexing and retention fees, and that adds up across microservices. You're also locked into their query language and retention policies.
If you're already running Temps for deployments, it includes a built-in OpenTelemetry collector and trace storage backed by TimescaleDB. No additional services to deploy or configure.
Pros: Zero extra infrastructure. Traces live alongside your deployment logs, error tracking, and analytics. One dashboard for everything. Cons: Designed for teams already using Temps as their deployment platform.
The observability landscape has a gap: open-source tools require significant ops investment, and SaaS tools charge per-GB prices that punish you for instrumenting more of your code. The ideal solution is a platform that bundles tracing with something you already need — like deployments.
| Option | Cost | Ops Overhead | Storage |
|---|---|---|---|
| Jaeger + Elasticsearch | Free (+ infra) | High | Self-managed |
| Grafana Tempo | Free (+ infra) | Medium | Object storage |
| Datadog | $0.10/GB ingested | None | Managed |
| New Relic | $0.30/GB (over free tier) | None | Managed |
| Temps | Included | None (already running) | TimescaleDB |
If you're already running Temps, adding tracing to your app takes two environment variables and zero additional services. Temps includes a built-in OTLP-compatible endpoint that accepts traces over HTTP, stores them in TimescaleDB hypertables, and renders them in the dashboard.
In your Temps project settings (or your .env file):
```shell
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-temps-instance.example.com
OTEL_SERVICE_NAME=my-app
```
That's it. The same tracing.ts file from the previous section works without modification. The SDK sends spans to the Temps OTLP endpoint, and they show up in your dashboard automatically.
When your app sends spans to Temps, the built-in endpoint accepts them over OTLP, stores them in TimescaleDB, and renders them in the dashboard. No collector to configure. No Jaeger to deploy. No Elasticsearch to tune. No Grafana dashboards to build.
If your application is deployed through Temps itself, tracing is even simpler. Add the two environment variables to your project's environment configuration in the dashboard, and the next deployment picks them up. The OTLP endpoint is your Temps instance URL — the same one that serves your app.
```shell
# In your Temps project environment variables
OTEL_EXPORTER_OTLP_ENDPOINT=https://temps.yourdomain.com
OTEL_SERVICE_NAME=checkout-api
```
Temps accepts traces from any app that speaks OTLP, not just apps deployed on the platform. Running a Python service on a separate VPS? A Go microservice on Kubernetes? Point their OTLP exporter at your Temps instance and the traces flow in alongside everything else.
```python
# Python example with opentelemetry-sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="https://your-temps-instance.example.com/v1/traces"
)
```
In internal testing, the overhead of routing traces through the built-in Temps collector added less than 3ms of latency per batch export compared to sending directly to Jaeger. The TimescaleDB storage layer handles retention automatically through continuous aggregation policies.
Once traces start flowing, the Temps dashboard gives you a unified view of every request across all instrumented services. According to Google DORA, the average time to identify a performance regression drops from hours to minutes when teams have trace visualization available.
Click any trace and you see the full span tree rendered as a waterfall. Each bar represents a span — its width shows duration, its color indicates the service, and expanding it reveals attributes like HTTP status codes, database query text, and custom metadata.
The waterfall makes latency problems immediately visible. If your checkout endpoint takes 2 seconds, you can see that 1.8 seconds is spent waiting for the payment provider and 150ms is a slow database query. No guessing.
Find traces by service name, by endpoint (for example, all POST /orders traces), or by individual trace ID.

The dashboard shows P50, P90, P95, and P99 latency for each service and operation. You can spot when a deployment caused a latency regression and correlate it with the specific deployment event — because Temps knows when you deployed.
This is something standalone tracing tools can't do easily. Your deployment history and your trace data live in the same system, so you can answer "did the last deploy make things slower?" with one click.
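Those percentile figures are straightforward aggregations over span durations. As a sketch, here's the nearest-rank method — `percentile` is an illustrative helper, not Temps internals:

```typescript
// Nearest-rank percentile over a batch of span durations (ms):
// the smallest value such that at least p% of samples are <= it.
// Illustrative helper, not the Temps implementation.
function percentile(durationsMs: number[], p: number): number {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}
```

The gap between P50 and P99 is often the interesting number: a healthy median with a terrible P99 usually points at a retry loop, a cold cache, or one slow dependency.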
Temps automatically builds a service dependency map from your trace data. If checkout-api calls inventory-service which calls postgres, you see that graph with request counts and error rates on each edge. No manual configuration required — it's derived entirely from span parent-child relationships.
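The derivation itself is simple enough to sketch: walk each span to its parent, and every hop where the service name changes is an edge in the graph. The `Span` shape and `buildServiceGraph` helper below are hypothetical, for illustration only:

```typescript
interface Span {
  spanId: string;
  parentSpanId?: string;
  service: string;
}

// Hypothetical helper: count cross-service edges from span parent links.
function buildServiceGraph(spans: Span[]): Map<string, number> {
  const byId = new Map<string, Span>(spans.map((s): [string, Span] => [s.spanId, s]));
  const edges = new Map<string, number>();
  for (const s of spans) {
    const parent = s.parentSpanId ? byId.get(s.parentSpanId) : undefined;
    // An edge exists where a child span runs in a different service
    // than its parent: that hop is a cross-service call.
    if (parent && parent.service !== s.service) {
      const key = `${parent.service} -> ${s.service}`;
      edges.set(key, (edges.get(key) ?? 0) + 1);
    }
  }
  return edges;
}
```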
The OTel SDK is available in 11 languages with official support, and the auto-instrumentation experience varies significantly by ecosystem. According to the OpenTelemetry Registry, Python has 40+ auto-instrumentation packages, while Rust requires manual span creation for most operations.
Python has the most mature auto-instrumentation story after Java. One command installs everything:
```shell
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
```

Then run your app with the auto-instrumentation wrapper:

```shell
opentelemetry-instrument \
  --service_name my-python-app \
  --traces_exporter otlp \
  --exporter_otlp_endpoint https://your-temps-instance.example.com \
  python app.py
```
Django, Flask, FastAPI, SQLAlchemy, psycopg2, redis, requests, httpx — all instrumented automatically. No code changes to your application.
Go requires slightly more setup because there's no runtime patching. You wrap your HTTP handlers and database connections manually:
```go
import (
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Wrap your HTTP handler
handler := otelhttp.NewHandler(mux, "server")

// Use the instrumented HTTP client
client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
```
More code than Node.js or Python, but Go's explicit style means you always know exactly what's being traced.
Rust uses the tracing crate ecosystem with an OpenTelemetry bridge:
```rust
use tracing_subscriber::prelude::*;
use opentelemetry::trace::TracerProvider;
use opentelemetry_otlp::SpanExporter;

let exporter = SpanExporter::builder()
    .with_tonic()
    .build()?;

let provider = opentelemetry_sdk::trace::SdkTracerProvider::builder()
    .with_batch_exporter(exporter)
    .build();

tracing_subscriber::registry()
    .with(tracing_opentelemetry::layer()
        .with_tracer(provider.tracer("my-service")))
    .init();
```
Rust has fewer auto-instrumentation libraries, but the tracing macros (#[instrument]) make manual instrumentation clean. Temps itself uses this exact pattern internally.
The OpenTelemetry SDK adds roughly 1-5% CPU overhead and minimal memory usage for most applications, according to benchmarks published by the OTel project. The SDK batches spans in memory and exports them asynchronously, so the hot path of your request handler doesn't block on network calls. For most web applications, the overhead is undetectable in production. If you need to reduce it further, use head-based sampling to trace a percentage of requests rather than all of them.
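Head-based sampling works because the decision is a pure function of the trace ID, so every service in the request path independently reaches the same verdict and sampled traces stay complete end to end. The SDK ships this as TraceIdRatioBasedSampler; the decision logic can be sketched like this (`shouldSample` is an illustrative helper, not the OTel API):

```typescript
// Sketch of trace-ID ratio sampling. Illustrative only -- the SDK's
// TraceIdRatioBasedSampler is the real implementation.
function shouldSample(traceId: string, ratio: number): boolean {
  // Treat the lower 8 hex digits of the 32-hex-digit trace ID as an
  // unsigned 32-bit integer and compare it against the ratio threshold.
  const lower = parseInt(traceId.slice(-8), 16);
  return lower < ratio * 0x100000000;
}
```

In the Node SDK, the typical way to enable this is passing something like `sampler: new TraceIdRatioBasedSampler(0.1)` (from @opentelemetry/sdk-trace-base) to the NodeSDK constructor, which traces roughly 10% of requests.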
Yes. OpenTelemetry provides official SDKs for 11 languages: Java, Python, JavaScript/Node.js, Go, .NET, Ruby, PHP, Swift, Erlang/Elixir, Rust, and C++. Auto-instrumentation maturity varies — Java and Python have the most libraries instrumented automatically, while Go and Rust require more manual setup. Any language that can make HTTP requests can also send traces via the OTLP HTTP protocol directly, even without an official SDK.
The OTel SDK itself is free and open-source. Storage costs depend entirely on your backend choice. Self-hosted Jaeger with Elasticsearch costs whatever your server costs — typically $20-50/month for a small setup on a VPS. Datadog charges $0.10 per ingested GB. New Relic offers 100GB free per month, then $0.30/GB. With Temps, trace storage is included — no additional cost beyond the instance you're already running for deployments.
Traces show the journey of a single request across services — they answer "what happened to this specific request and how long did each step take?" Metrics are aggregated numerical measurements over time — request count, error rate, P99 latency. Logs are discrete text events emitted by your application. OpenTelemetry supports all three signals, but traces provide the most immediate debugging value for distributed systems because they connect cause and effect across service boundaries.
Not necessarily. The OTel SDK can export traces directly to any OTLP-compatible backend. The Collector is recommended for production because it handles batching, retry, and multi-destination routing — but if your backend already handles OTLP ingestion well (like Temps does), you can skip the Collector entirely and export directly from your app. This reduces operational complexity and is perfectly fine for most teams.
OpenTelemetry is the industry standard for distributed tracing, and it's not going anywhere. The SDK is mature, auto-instrumentation covers most frameworks, and the OTLP protocol means you're never locked into a specific backend.
The hard part was always the infrastructure: collectors, storage, visualization. That problem disappears when your deployment platform includes a built-in OTLP endpoint.
If you're already using Temps, add two environment variables and you have production tracing. If you're not, you can set up a Temps instance and get deployments, analytics, error tracking, and tracing from a single binary:
```shell
curl -fsSL temps.sh/install.sh | bash
```