Service discovery in Prometheus: k8s, Consul, file_sd, relabel

Why service discovery

A static config works for 5 hosts. For 500 it does not. In Kubernetes the set of endpoints changes constantly (rollouts, autoscaling). You need a way to learn who to scrape automatically.

The answer is service discovery (SD): Prometheus says "give me all pods/services with these labels", the SD mechanism returns a list of endpoints, and Prom scrapes them.

Around 30 SD mechanisms are supported, including kubernetes, consul, dns, ec2, azure, gce, file_sd, and http_sd. The most common are kubernetes and consul.

Discovery → relabel → scrape

┌──────────────┐
│ SD mechanism │  returns targets with meta-labels
│ (k8s, etc)   │  __meta_kubernetes_pod_name, etc
└──────┬───────┘
       │ raw targets with __meta_* labels
       ▼
┌──────────────┐
│  relabel_    │  filter + transform labels
│  configs     │  action: keep/drop/replace/labelmap
└──────┬───────┘
       │ final targets
       ▼
┌──────────────┐
│  scrape      │  HTTP GET /metrics
└──────┬───────┘
       │ raw metrics
       ▼
┌──────────────┐
│  metric_     │  drop bad metrics, rewrite names
│  relabel_    │
│  configs     │
└──────┬───────┘
       │
       ▼
     TSDB

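Critical insight: every label whose name starts with __, including all __meta_* labels, is dropped once target relabeling finishes. If you want a value in the TSDB, copy it into a regular label with an explicit replace (or labelmap) rule.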
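A minimal sketch of such an explicit copy, assuming you want the container name on every series. The target label name `container` is an illustrative choice, not something the source prescribes.

```yaml
relabel_configs:
  # __meta_kubernetes_pod_container_name is discarded after relabeling,
  # so copy its value into a regular label that is stored with each series.
  - source_labels: [__meta_kubernetes_pod_container_name]
    action: replace
    target_label: container   # assumed label name
```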

Kubernetes SD

yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod              # pod | service | endpoints | endpointslice | node | ingress
    relabel_configs:
      # Only pods with the annotation prometheus.io/scrape=true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Take the port from the annotation prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Path from the annotation prometheus.io/path (default /metrics)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: '(.+)'
      # All pod labels → target labels, with the __meta_kubernetes_pod_label_ prefix stripped
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Convenience labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node

Result: every pod with the prometheus.io/scrape=true annotation is scraped. All its k8s labels are copied into metric labels.
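For reference, a pod that this job would pick up might look like the sketch below. The pod name, image, and port are made up for illustration. The prometheus.io/* annotations are only a convention that the relabel rules above implement; Prometheus itself attaches no meaning to them.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                      # hypothetical pod
  labels:
    app: demo-app
  annotations:
    prometheus.io/scrape: 'true'      # matched by the keep rule
    prometheus.io/port: '8080'        # rewritten into __address__
    prometheus.io/path: '/metrics'    # copied into __metrics_path__
spec:
  containers:
    - name: app
      image: example.com/demo-app:1.0 # hypothetical image
      ports:
        - containerPort: 8080
```

Annotation values must be strings, hence the quotes around `true` and `8080`.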

Roles in kubernetes_sd

Role	What it returns	When to use
`node`	one target per cluster node (kubelet address)	host metrics, kubelet
`pod`	every pod, one target per declared container port	application metrics
`service`	k8s Service objects, one target per service port	blackbox probes to services
`endpoints`	Endpoints objects (legacy)	scraping the pods behind a Service, e.g. kube-state-metrics
`endpointslice`	EndpointSlice objects (modern)	same use as `endpoints`; k8s 1.21+, scales better
`ingress`	Ingress objects	blackbox probes to ingresses

Modern setup: the endpointslice role (note the singular spelling in the config) instead of endpoints, for better performance on large clusters.
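A hedged sketch of an EndpointSlice-based job for kube-state-metrics. It assumes the Service is called `kube-state-metrics` and lives in the `kube-system` namespace; adjust both to your install. In kubernetes_sd_configs the role is spelled `endpointslice`.

```yaml
scrape_configs:
  - job_name: kube-state-metrics
    kubernetes_sd_configs:
      - role: endpointslice
        namespaces:
          names: [kube-system]          # assumed namespace
    relabel_configs:
      # Keep only the endpoints that belong to the kube-state-metrics Service.
      - source_labels: [__meta_kubernetes_service_name]
        regex: kube-state-metrics       # assumed Service name
        action: keep
```

Restricting `namespaces` also narrows what Prometheus has to watch on the API server.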

Consul SD

yaml
scrape_configs:
  - job_name: consul
    consul_sd_configs:
      - server: consul.example.com:8500
        tags: ['prometheus']      # only services with the tag
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_tags]
        target_label: tags

Consul is popular in non-k8s stacks (Nomad, classic VMs). A service registers itself in Consul, and Prom learns about it through SD.

file_sd: static with granularity

When there is no k8s or Consul, but you have a script that knows who to scrape:

yaml
scrape_configs:
  - job_name: file-discovery
    file_sd_configs:
      - files: ['/etc/prometheus/targets/*.json']
        refresh_interval: 30s

The file:

json
[
  {
    "targets": ["host1:9100", "host2:9100"],
    "labels": {"env": "prod", "team": "infra"}
  },
  {
    "targets": ["dbhost:9187"],
    "labels": {"env": "prod", "team": "db"}
  }
]

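An external tool (Ansible, Terraform, Chef) generates the JSON. Prom watches the files and applies changes as soon as they are written; refresh_interval: 30s is only the fallback re-read period. Flexible and simple.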
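file_sd also accepts YAML target files, which can be easier to template. The sketch below carries the same two target groups as the JSON file; the file name is hypothetical, and the `files` glob in the scrape config would have to match `*.yml` as well for it to be picked up.

```yaml
# /etc/prometheus/targets/hosts.yml (hypothetical file name)
- targets: ['host1:9100', 'host2:9100']
  labels:
    env: prod
    team: infra
- targets: ['dbhost:9187']
  labels:
    env: prod
    team: db
```

Whatever generates the file should write it to a temporary path and rename it into place, so Prometheus never reads a half-written file.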

relabel actions

Action	What it does
`replace`	if `regex` matches the source value, writes `replacement` (with `$1`-style groups expanded) into `target_label`
`keep`	drop the target if `source ~ regex` does NOT match
`drop`	drop the target if `source ~ regex` matches
`keepequal`	keep the target only if the source value equals the value of `target_label`
`dropequal`	drop the target if the source value equals the value of `target_label`
`hashmod`	`target_label = hash(source) % modulus` (for sharding)
`labelmap`	copies every label whose name matches the regex to the name given by `replacement`
`labeldrop`	removes labels whose names match the regex
`labelkeep`	keeps only labels whose names match the regex
`lowercase` / `uppercase`	writes the lower- or upper-cased source value into `target_label`

keep and drop are the most common for filtering. replace and labelmap are for shaping labels.
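As a sketch of one of the less common actions, `keepequal` can pin a pod target to the annotated port without any regex. This assumes a Prometheus version that supports `keepequal` and pods that declare their container ports and carry the prometheus.io/port annotation used earlier.

```yaml
relabel_configs:
  # The pod role emits one target per declared container port.
  # Keep only the target whose port number equals the annotated port.
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    target_label: __meta_kubernetes_pod_annotation_prometheus_io_port
    action: keepequal
```

Here `target_label` is only read for the comparison, not written to.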

Sharding with hashmod

Three Proms scrape 1000 targets, split roughly evenly by hash:

yaml
relabel_configs:
  - source_labels: [__address__]
    modulus: 3
    target_label: __tmp_hash
    action: hashmod
  - source_labels: [__tmp_hash]
    regex: '0'              # this Prom, shard 0
    action: keep

Each Prom holds about 330 targets. Federation aggregates upward.
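The other shards run the same rules with a different `regex`. A sketch of the second instance follows; the `shard` external label is an assumption added here so that a federating or remote-write layer can tell the instances apart, and the job definition is shortened.

```yaml
global:
  external_labels:
    shard: '1'                # assumed label name, unique per instance
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__address__]
        modulus: 3
        target_label: __tmp_hash
        action: hashmod
      - source_labels: [__tmp_hash]
        regex: '1'            # this Prom, shard 1
        action: keep
```

Because every instance hashes the same `__address__` with the same modulus, each target lands on exactly one shard.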

metric_relabel_configs: after scrape

Applies to already scraped metrics, before they are written to the TSDB.

yaml
scrape_configs:
  - job_name: ...
    metric_relabel_configs:
      # Drop high-cardinality metrics
      - source_labels: [__name__]
        regex: 'go_gc_pauses_seconds_bucket'
        action: drop
      # Drop a specific label with user_id (cardinality)
      - regex: 'user_id'
        action: labeldrop
      # Rewrite metric name
      - source_labels: [__name__]
        regex: 'old_metric_name'
        replacement: 'new_metric_name'
        target_label: __name__

Used to fight cardinality-explosion from ill-behaved exporters. Better to fix it in the code, but sometimes you have no access.
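Rules can also match on several labels at once: the source values are joined with `;` (the default separator) before the regex is applied. A small sketch, assuming a hypothetical exporter that exposes a `handler` label on its request histogram:

```yaml
metric_relabel_configs:
  # Drop the histogram buckets for the health-check handler only.
  # The matched string is "<metric name>;<handler value>".
  - source_labels: [__name__, handler]
    regex: 'http_request_duration_seconds_bucket;/healthz'
    action: drop
```

The regex is anchored at both ends, so it has to match the whole joined string.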

Best practices

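Filter at the SD stage, not the metric stage: a keep/drop in relabel_configs removes the target before it is ever scraped, which is cheaper than metric_relabel_configs and puts less load on the target.
Convenient labels (namespace, pod, service): use the same stable names across all jobs. __meta_kubernetes_* labels never reach the TSDB, so queries cannot rely on them.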
Do not copy every pod label with labelmap blindly. k8s attaches controller-revision-hash, pod-template-hash, and so on. That is cardinality. Whitelist with a regex in labelmap:
```yaml
- action: labelmap
  regex: __meta_kubernetes_pod_label_(app|version|component)
```
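CI-test your relabel rules: promtool check config validates the file, and promtool check service-discovery runs discovery plus relabeling for one job. The latter needs access to the SD backend, so its use in CI is limited.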

kube-state-metrics + node-exporter

The standard k8s monitoring stack:

node-exporter on every node → node_* metrics
kube-state-metrics, a single instance → kube_* metrics about the state of k8s objects
cAdvisor in the kubelet → container metrics
app metrics through annotation discovery

All through k8s SD with different relabel configs.
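As one example of those differing relabel configs, node-exporter can be discovered through the `node` role by swapping the kubelet port for the exporter's port. This is a sketch that assumes node-exporter runs on the host network and listens on port 9100 on every node.

```yaml
scrape_configs:
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # The node role points __address__ at the kubelet; keep the host
      # part and replace the port with the assumed node-exporter port.
      - source_labels: [__address__]
        regex: '([^:]+)(?::\d+)?'
        replacement: '$1:9100'
        target_label: __address__
      - source_labels: [__meta_kubernetes_node_name]
        target_label: node
```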

When things go wrong

No targets in /targets: a relabel keep is too strict and nothing is left. Remove one rule at a time and check the UI.
Targets exist, but scrape errors with "401 Unauthorized": a kubelet scrape needs a ServiceAccount and RBAC, or bearer_token_file: /var/run/secrets/.../token.
Cardinality explosion after a rollout: labelmap copied pod-template-hash. Whitelist the labels.
Targets are duplicated: the same endpoint appears in several roles. Deduplicate: one role plus the right selector.
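Slow SD updates (5+ minutes): often k8s API throttling. kubernetes_sd relies on API watches and has no refresh_interval, so narrow the discovery scope with namespaces or selectors, or use endpointslice instead of endpoints.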
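__address__ has the wrong port: the pod role creates one target per declared container port (and a port-less target when none is declared), so the default address may not be the metrics port. Override it from an annotation with replace.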
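Stale series after a k8s namespace delete: the targets leave /targets once SD sees the deletion, but their last samples can still show up in queries for up to --query.lookback-delta (default 5m) if no staleness marker was written. This is normal.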
