Why OpenTelemetry
Before OTel, every vendor had its own SDK:
- Prometheus client for metrics
- Jaeger / Zipkin client for traces
- Datadog APM agent for everything, but with vendor lock-in
- structured logger for logs
Four SDKs, four pipelines, four formats. Changing the backend meant rewriting all the instrumentation in your code.
OpenTelemetry (CNCF, 2019, a merger of OpenCensus and OpenTracing) gives you:
- One SDK for all three signals (metrics, traces, logs)
- One protocol, OTLP (OpenTelemetry Protocol)
- One Collector for transformation and routing
- Vendor-neutral code: it does not know where data goes, whether to Prom, Datadog, or cloud monitoring. You switch backends through config.
In 2025 OTel is the de facto standard for new projects. The old Prometheus client still works (the OTel Collector can scrape its /metrics endpoint), but new code is usually written with OTel.
The three signals
Traces (tracing-basics) are the request flow through services. Spans carry parent-child links, and context propagation goes through the W3C Trace Context [[http2-internals|traceparent header]].
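As a rough illustration of context propagation, the sketch below parses a `traceparent` header and builds the one a service would send downstream. It follows the W3C Trace Context layout (version `00`, 32-hex trace id, 16-hex parent span id, 2-hex flags). The function names are hypothetical, not part of any OTel SDK, and the sample header value is only an example.

```python
import re
import secrets

# W3C Trace Context, version 00: trace-id (16 bytes), parent-id (8 bytes), flags (1 byte).
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero ids are defined as invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def child_traceparent(header):
    """Build the downstream header: same trace id, fresh span id, same sampled flag."""
    parsed = parse_traceparent(header)
    if parsed is None:
        # No valid parent: start a new trace (marking it sampled is an arbitrary choice here).
        trace_id, sampled = secrets.token_hex(16), True
    else:
        trace_id, _, sampled = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{'01' if sampled else '00'}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
```

The trace id stays constant across hops while each hop contributes a new span id, which is what lets a backend reassemble the parent-child tree.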
Metrics are counters, gauges, and histograms. The semantics match [[metric-types|Prometheus]], but they go through the OTel SDK.
Logs are structured events with automatic correlation: trace_id and span_id are baked into the log record.
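To make the log correlation concrete, here is a minimal standard-library sketch of the idea: a logging filter copies the active trace and span ids onto every record. The `current_span` context variable is a hypothetical stand-in for the SDK's active span context, and the logger name and ids are illustrative.

```python
import contextvars
import io
import json
import logging

# Hypothetical stand-in for the SDK's active span context: (trace_id, span_id) or None.
current_span = contextvars.ContextVar("current_span", default=None)


class TraceContextFilter(logging.Filter):
    """Attach the active trace_id/span_id to each log record."""

    def filter(self, record):
        record.trace_id, record.span_id = current_span.get() or ("", "")
        return True  # enrich only, never drop the record


class JsonFormatter(logging.Formatter):
    """Render the record as one structured JSON line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
            "span_id": record.span_id,
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("checkout-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# Simulate being inside a span, then log as usual.
current_span.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
logger.info("payment authorized")
log_line = json.loads(stream.getvalue())
```

With the ids on the record, a backend can jump from a log line straight to the trace that produced it.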
All three travel through one SDK over a single OTLP channel. That cuts coupling and keeps the data coherent.
Architecture
┌──────────────┐    OTLP gRPC :4317      ┌─────────────┐
│ App          │ ──────────────────────► │  Collector  │
│ ┌──────────┐ │    OTLP HTTP :4318      │             │
│ │ OTel SDK │ │                         │ ┌─────────┐ │
│ │ ┌──────┐ │ │                         │ │receivers│ │
│ │ │tracer│ │ │                         │ ├─────────┤ │
│ │ │meter │ │ │                         │ │processor│ │
│ │ │logger│ │ │                         │ ├─────────┤ │
│ │ └──────┘ │ │                         │ │exporters│ │
│ └──────────┘ │                         │ └────┬────┘ │
└──────────────┘                         └──────┼──────┘
                                                │
                              ┌─────────┬───────┼────────┬────────┐
                              ▼         ▼       ▼        ▼        ▼
                         Prometheus   Tempo    Loki   Jaeger   Datadog
OTLP, the protocol
OTLP is a single wire format. It has two transports:
| Transport | Port (default) | When |
|---|---|---|
| gRPC | 4317 | server-to-server, internal, low-latency |
| HTTP/protobuf | 4318 | through a proxy, browser, restrictive networks |
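The table can be turned into a small helper that picks the endpoint for a given transport and signal. This is a sketch: the function name and the `collector` host are hypothetical, while the ports come from the table and the `/v1/traces`, `/v1/metrics`, `/v1/logs` paths are the OTLP/HTTP defaults.

```python
# Default OTLP ports from the table; OTLP/HTTP uses one default path per signal.
DEFAULT_PORTS = {"grpc": 4317, "http/protobuf": 4318}
HTTP_PATHS = {"traces": "/v1/traces", "metrics": "/v1/metrics", "logs": "/v1/logs"}


def otlp_endpoint(host, protocol, signal, tls=False):
    """Return the endpoint URL for one signal over the chosen OTLP transport."""
    if protocol not in DEFAULT_PORTS:
        raise ValueError(f"unknown OTLP protocol: {protocol!r}")
    if signal not in HTTP_PATHS:
        raise ValueError(f"unknown signal: {signal!r}")
    scheme = "https" if tls else "http"
    base = f"{scheme}://{host}:{DEFAULT_PORTS[protocol]}"
    # gRPC serves every signal on the same endpoint; HTTP needs the per-signal path.
    return base if protocol == "grpc" else base + HTTP_PATHS[signal]


grpc_url = otlp_endpoint("collector", "grpc", "traces")
http_url = otlp_endpoint("collector", "http/protobuf", "logs", tls=True)
```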
The payload is Protocol Buffers. Structure:
ResourceSpans
├── Resource (service.name, host.name, k8s.pod.name)
└── ScopeSpans
    ├── InstrumentationScope (library name + version)
    └── Span[]
        ├── trace_id, span_id, parent_span_id
        ├── name, start_time_unix_nano, end_time_unix_nano
        ├── attributes (key-value)
        ├── events[]
        ├── links[]
        └── status (UNSET / OK / ERROR)
Metrics work the same way (ResourceMetrics → ScopeMetrics → Metric)
and so do logs (ResourceLogs → ScopeLogs → LogRecord).
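The nesting is easier to see with a concrete payload. The sketch below builds one span in the JSON encoding of OTLP (camelCase field names, 64-bit integers as strings, enum values as integers). The helper names, the scope name, and the `u-123` user id are illustrative assumptions.

```python
import json


def attr(key, value):
    """One OTLP key-value attribute (string values only, for brevity)."""
    return {"key": key, "value": {"stringValue": str(value)}}


def make_resource_spans(service_name, scope_name, spans):
    """Wrap spans in the Resource -> Scope -> Span nesting."""
    return {
        "resource": {"attributes": [attr("service.name", service_name)]},
        "scopeSpans": [{"scope": {"name": scope_name}, "spans": spans}],
    }


span = {
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
    "spanId": "00f067aa0ba902b7",
    "parentSpanId": "",                       # empty for a root span
    "name": "checkout",
    "startTimeUnixNano": "1700000000000000000",
    "endTimeUnixNano": "1700000000250000000",
    "attributes": [attr("user.id", "u-123")],
    "status": {"code": 2},                    # 0 = UNSET, 1 = OK, 2 = ERROR
}

payload = {"resourceSpans": [make_resource_spans("checkout", "checkout.instrumentation", [span])]}
body = json.dumps(payload)

# Span duration in milliseconds, derived from the two nanosecond timestamps.
duration_ms = (int(span["endTimeUnixNano"]) - int(span["startTimeUnixNano"])) // 1_000_000
```

Because the Resource sits at the top, attributes such as `service.name` are sent once per batch rather than repeated on every span.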
Advantages over the Prom format:
- Binary, 3-5x more compact
- Push-based over [[grpc-basics|gRPC]] (no HTTP poll)
- One format for all three signals
SDK: auto vs manual instrumentation
With auto-instrumentation, an agent patches libraries at runtime, with no code changes:
- Java: `-javaagent:opentelemetry-javaagent.jar` patches JDBC, Servlet, the Kafka client, gRPC, around 120 libraries
- Python: `opentelemetry-instrument python app.py` patches requests, Flask, Django, psycopg2, redis-py
- Node.js: `--require @opentelemetry/auto-instrumentations-node/register`
- Go: a compiled binary has no runtime hook for patching libraries, so you add it by hand (an eBPF-based agent is in progress)
- .NET: the `OTEL_DOTNET_AUTO_HOME` env var
You get traces for HTTP/DB/Kafka without a single line of code. From there you can add manual spans for business logic.
Manual instrumentation uses an explicit API:
from opentelemetry import trace, metrics

tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
request_counter = meter.create_counter("requests")
duration_histogram = meter.create_histogram("request_duration_ms")

# `app` is the web framework's application object, created elsewhere.
@app.get("/checkout")
def checkout():
    with tracer.start_as_current_span("checkout") as span:
        # `user_id` comes from the request or session (not shown).
        span.set_attribute("user.id", user_id)
        request_counter.add(1, {"endpoint": "/checkout"})
        # ... business logic
OTel Collector
A standalone service. Deploy it on every node (DaemonSet) or as a per-cluster gateway.
The config has three component sections (receivers, processors, exporters) plus a service section that wires them into pipelines:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: app
          static_configs:
            - targets: [app:8080]
processors:
  batch:                    # batches before exporting
    send_batch_size: 8192
    timeout: 200ms
  memory_limiter:           # backpressure
    check_interval: 1s      # must be set, or the processor fails validation
    limit_mib: 512
  tail_sampling:            # sample by trace condition
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-1pct
        type: probabilistic
        probabilistic: {sampling_percentage: 1}
exporters:
  prometheusremotewrite:
    endpoint: http://victoriametrics:8480/api/v1/write
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
A pipeline is an acyclic graph. One collector can serve all three signals independently.
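A common mistake is referencing a component in a pipeline that was never defined in its section. The sketch below checks a config, already loaded into a Python dict, for that and for two conventions: each pipeline needs at least one receiver and one exporter, and `memory_limiter` is usually placed first. The `validate_pipelines` helper is hypothetical and much simpler than the Collector's own validation.

```python
def validate_pipelines(config):
    """Return a list of problems found in a Collector-style config dict."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for name, pipeline in pipelines.items():
        # Every component named in a pipeline must be defined in its section.
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section, {})
            for component in pipeline.get(section, []):
                if component not in defined:
                    problems.append(f"{name}: {section[:-1]} '{component}' is not defined")
        # A pipeline without an input or an output cannot move data.
        for required in ("receivers", "exporters"):
            if not pipeline.get(required):
                problems.append(f"{name}: needs at least one of {required}")
        # Convention: the memory limiter should see data before anything buffers it.
        processors = pipeline.get("processors", [])
        if "memory_limiter" in processors and processors[0] != "memory_limiter":
            problems.append(f"{name}: memory_limiter should run first")
    return problems


config = {
    "receivers": {"otlp": {}, "prometheus": {}},
    "processors": {"memory_limiter": {}, "tail_sampling": {}, "batch": {}},
    "exporters": {"otlp/tempo": {}, "prometheusremotewrite": {}, "loki": {}},
    "service": {"pipelines": {
        "traces": {"receivers": ["otlp"],
                   "processors": ["memory_limiter", "tail_sampling", "batch"],
                   "exporters": ["otlp/tempo"]},
        "metrics": {"receivers": ["otlp", "prometheus"],
                    "processors": ["memory_limiter", "batch"],
                    "exporters": ["prometheusremotewrite"]},
        "logs": {"receivers": ["otlp"],
                 "processors": ["memory_limiter", "batch"],
                 "exporters": ["loki"]},
    }},
}
problems = validate_pipelines(config)
```

For the real thing, the Collector binary has a `validate` subcommand that checks a config file without starting the service.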
Resource: describing the source
Resource holds process and host attributes shared by every signal:
service.name=checkout
service.version=1.4.2
service.instance.id=checkout-7f8b9c
k8s.namespace.name=prod
k8s.pod.name=checkout-7f8b9c-q2lx9
host.name=node-12.us-east
cloud.provider=aws
cloud.region=us-east-1
Set it through env vars:
OTEL_SERVICE_NAME=checkout
OTEL_RESOURCE_ATTRIBUTES=service.version=1.4.2,deployment.environment=prod
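For intuition, this is roughly how an SDK could turn those two variables into Resource attributes. It is a sketch, not the SDK's actual code: `resource_from_env` is a hypothetical helper, and it assumes comma-separated `key=value` pairs with percent-encoded values, `OTEL_SERVICE_NAME` taking precedence, and `unknown_service` as the fallback.

```python
from urllib.parse import unquote


def resource_from_env(env):
    """Build Resource attributes from an environment mapping (e.g. os.environ)."""
    attrs = {}
    for pair in env.get("OTEL_RESOURCE_ATTRIBUTES", "").split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue  # skip empty or malformed entries
        attrs[key.strip()] = unquote(value.strip())
    # OTEL_SERVICE_NAME wins over a service.name set in OTEL_RESOURCE_ATTRIBUTES.
    if env.get("OTEL_SERVICE_NAME"):
        attrs["service.name"] = env["OTEL_SERVICE_NAME"]
    # Without any service name, backends show a placeholder instead.
    attrs.setdefault("service.name", "unknown_service")
    return attrs


resource = resource_from_env({
    "OTEL_SERVICE_NAME": "checkout",
    "OTEL_RESOURCE_ATTRIBUTES": "service.version=1.4.2,deployment.environment=prod",
})
```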
In k8s it is auto-injected through the [[opentelemetry-operator-k8s|OTel Operator]] (a sidecar/auto-instrumentation CRD).
OTel vs Prometheus client
| Aspect | Prom client | OTel SDK |
|---|---|---|
| Signals | metrics only | metrics+traces+logs |
| Transport | HTTP pull (/metrics) | OTLP push |
| Vendor neutrality | Prom-only | any backend |
| Auto-instrumentation | minimal | full |
| Adoption | widest | growing fast |
| Wire format | text/OpenMetrics | protobuf |
You can combine them: the OTel SDK for traces and logs plus the Prom client for metrics. Or the OTel SDK for everything, with the Collector exporting metrics in Prom format.
Sampling: head vs tail
Keeping 100% of traces is not feasible: 10K req/s × 5 spans × 5KB is about 250 MB/s. You need sampling.
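The estimate is easy to reproduce and to rerun with your own numbers. A small sketch (the function name is hypothetical; 1 MB is taken as 1000 KB, as in the figure above):

```python
def trace_volume_mb_per_s(requests_per_s, spans_per_request, span_size_kb, sample_ratio=1.0):
    """Estimated trace throughput in MB/s for a given traffic shape and sampling ratio."""
    return requests_per_s * spans_per_request * span_size_kb * sample_ratio / 1000


full = trace_volume_mb_per_s(10_000, 5, 5)                        # the unsampled case from the text
sampled = trace_volume_mb_per_s(10_000, 5, 5, sample_ratio=0.01)  # the same traffic at 1%
full_tb_per_day = full * 86_400 / 1_000_000                       # MB/s -> TB/day
```

At 250 MB/s the unsampled stream adds up to roughly 21.6 TB per day, which is why storing everything is rarely an option.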
- Head-based: the sample-or-drop decision happens at the start of the trace (at the edge), and every downstream span honors it. Simple and predictable. The downside is that the decision is made before the outcome is known, so error traces are dropped at the same rate as healthy ones.
- Tail-based: collect the whole trace in the Collector, then decide keep or drop from trace properties (status, latency, attributes). You see every error, but the Collector holds all spans in memory for 5 to 30s.
Tail-based is preferable. Use the Tail Sampling Processor in the Collector.
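The difference between the two strategies fits in a few lines. This is a simplified sketch, not the Collector's implementation: `head_sample` decides from the trace id alone, while `tail_sample` sees the finished spans and applies error, latency, and baseline rules. The span dict fields (`status`, `start_ms`, `end_ms`) and both function names are assumptions made for the example.

```python
import random


def head_sample(trace_id_hex, ratio):
    """Head-based: decide from the trace id, before any span has finished."""
    # Deterministic on the id, so every service reaches the same verdict.
    return int(trace_id_hex[-16:], 16) < ratio * 2**64


def tail_sample(spans, slow_ms=1000, baseline_ratio=0.01, rng=random.random):
    """Tail-based: decide once the whole trace has been buffered."""
    if any(span["status"] == "ERROR" for span in spans):
        return True  # keep every trace that contains an error
    duration = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if duration >= slow_ms:
        return True  # keep slow traces
    return rng() < baseline_ratio  # keep a small share of the healthy rest


failed = [{"status": "OK", "start_ms": 0, "end_ms": 40},
          {"status": "ERROR", "start_ms": 5, "end_ms": 30}]
slow = [{"status": "OK", "start_ms": 0, "end_ms": 1500}]
healthy = [{"status": "OK", "start_ms": 0, "end_ms": 20}]
```

`head_sample` never looks at `status`, so a failed request has the same chance of being kept as any other; `tail_sample` pays for its better decisions by needing all spans of a trace in one place.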
When things go wrong
- OTLP/gRPC connection refused: the Collector is not running, or it is on a different port. The default is 4317. Check the firewall.
- A trace is missing even though there was an error: head sampling at 1% dropped it. Use tail sampling with a `status_code: ERROR` policy.
- Collector OOM: the `memory_limiter` processor is missing, or the limit is above RAM. Add a limit below 80% of the container memory.
- Cardinality explosion: metric `attributes` carry a user-id or request-id. See cardinality-explosion.
- Auto-instrumentation broke the app: usually the Java agent conflicting with another bytecode-patching agent or library. Bump the OTel agent version, or disable a specific instrumentation: `OTEL_INSTRUMENTATION_<name>_ENABLED=false`.
- Span attributes go missing: a Collector processor is stripping them, or attributes were added after `end()`. Set them before `.end()`.
- Service.name = "unknown_service" in Tempo: the `OTEL_SERVICE_NAME` env var is missing. The Resource is not configured.
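For the first item, a quick TCP check tells you whether anything is listening on the OTLP port before you dig into SDK settings. A minimal sketch using only the standard library: `can_connect` is a hypothetical helper, and `collector.internal` is a placeholder host name.

```python
import socket


def can_connect(host, port=4317, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder host; replace with your Collector's address.
    print("gRPC :4317 reachable:", can_connect("collector.internal", 4317))
    print("HTTP :4318 reachable:", can_connect("collector.internal", 4318))
```

A successful connection only proves the port is open; it does not confirm that the listener speaks OTLP or that TLS settings match.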
OTel vs Datadog/New Relic
Vendor APMs (Datadog, New Relic, Splunk) give you an all-in-one with a UI and ML features. But there is vendor lock-in, and replacing one means rewriting.
With OTel you write instrumentation once and send it to Datadog through Datadog's OTLP ingestion or the Collector's Datadog exporter. A year later you point the exporter at Tempo/Loki/Mimir without touching the code.
Cost-wise: OTel plus self-hosted (Tempo, Loki, VictoriaMetrics) is 5 to 10x cheaper than Datadog at 100 GB/day or more, but it takes ops investment.