jq: JSON queries and transformation

Why jq

HTTP APIs increasingly return JSON, and core tools like kubectl, docker, gh, and aws all support JSON output. Parsing JSON with regular expressions is painful and fragile. jq is the standard for CLI JSON processing: a single binary with no dependencies, and a filter language similar to XPath/JSONPath but more expressive.

The alternative, python -c 'import json,sys;...', is harder to read in a pipe.

Basic syntax

jq [OPTIONS] FILTER [FILE...]

Without a file, jq reads stdin. The simplest filter is . (identity):

bash

curl -s api.example.com/users | jq .

This pretty-prints the response, with colors when the output is a terminal. Often that is enough.

Selectors

bash

jq '.name'                  # field name

jq '.users[0]'              # first element of the array

jq '.users[]'               # ALL elements as a stream (not an array)

jq '.users[].email'         # email field of each element

jq '.users | length'        # length of the array

jq '.users | keys'          # keys of an object (or indices of an array)

jq '.["weird-key"]'         # keys with hyphens/spaces via []

jq '..|.email? // empty'    # recursive search for email anywhere in the tree
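The difference between a stream and an array is easier to see on a concrete input. The following is a minimal sketch; the sample document and its field names are made up for illustration.

```shell
# Hypothetical sample document (field names are illustrative only)
doc='{"users":[{"name":"ann","email":"ann@example.com"},{"name":"bob","email":"bob@example.com"}],"weird-key":1}'

# .users[] emits a stream: with -c, one JSON value per output line
printf '%s' "$doc" | jq -c '.users[]'

# Wrapping the stream in [...] collects it back into a single array
emails=$(printf '%s' "$doc" | jq -c '[.users[].email]')
echo "$emails"    # ["ann@example.com","bob@example.com"]

# length applied to the array itself
count=$(printf '%s' "$doc" | jq '.users | length')
echo "$count"     # 2

# Keys that are not valid identifiers need the bracket form
weird=$(printf '%s' "$doc" | jq '.["weird-key"]')
echo "$weird"     # 1
```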

select(): filtering

bash

# Active users

jq '.users[] | select(.active == true)'

# IDs of users with >100 commits

jq '.users[] | select(.commits > 100) | .id'

# Pods in CrashLoopBackOff state (any() reports each pod once, even if several containers match)

kubectl get pods -o json | jq '

  .items[]

  | select(any(.status.containerStatuses[]?; .state.waiting.reason == "CrashLoopBackOff"))

  | .metadata.name'

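? after a path expression suppresses errors: if the value cannot be indexed or iterated (for example, .[] applied to null), that input is skipped instead of failing. A key that is merely missing from an object is not an error; it yields null.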
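A small sketch of how a missing key, an error, and ? interact. The inputs are invented for illustration; the behavior shown is that of the ? operator and plain field access.

```shell
# A key that is missing from an object is not an error: it yields null
missing=$(echo '{"a":1}' | jq -c '.b')
echo "$missing"      # null

# .a on a number is an error; ? turns that error into "no output"
quiet=$(echo '[1,{"a":2}]' | jq -c '.[] | .a?')
echo "$quiet"        # 2

# Iterating over a missing array: .[]? yields nothing instead of failing
empty_list=$(echo '{"status":{}}' | jq -c '[.status.containerStatuses[]?]')
echo "$empty_list"   # []
```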

Transformation

bash

# Only name and email of each user, as an array of objects

jq '.users | map({name, email})'

# From an array of objects to flat TSV

jq -r '.users[] | [.id, .name, .email] | @tsv'

# From an array to CSV with a header row

jq -r '(.users[0] | keys_unsorted), (.users[] | [.[]]) | @csv'

# Grouping

jq 'group_by(.team) | map({team: .[0].team, count: length})'

Formatters at the end of a filter: @tsv, @csv, @sh (escape for bash), @json, @uri, @base64, @base64d.
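To see what the formatters do with awkward values, here is a short sketch on a made-up document (the names and fields are assumptions). @csv quotes strings and doubles embedded quotes; @sh wraps a string in single quotes for the shell.

```shell
# Hypothetical sample: the second name contains double quotes
doc='{"users":[{"id":1,"name":"Ann Lee","email":"ann@example.com"},{"id":2,"name":"Bob \"B\" Ray","email":"bob@example.com"}]}'

# @csv: numbers stay bare, strings are quoted, inner quotes are doubled
csv=$(printf '%s' "$doc" | jq -r '.users[] | [.id, .name, .email] | @csv')
printf '%s\n' "$csv"
# 1,"Ann Lee","ann@example.com"
# 2,"Bob ""B"" Ray","bob@example.com"

# @sh: produces a single-quoted word that is safe to paste into a shell command
quoted=$(printf '%s' "$doc" | jq -r '.users[0].name | @sh')
echo "$quoted"    # 'Ann Lee'
```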

-r and -c

-r (raw output) strips JSON quotes from strings. Without -r you get "foo"; with -r you get foo. Use it when passing output to the shell.
-c (compact) puts one object per line with no newlines inside. Useful for NDJSON logs and xargs.
-s (slurp) reads all stdin as a single array. By default jq reads a stream of JSON documents.

bash

# NDJSON: one object per line

jq -c 'select(.severity == "ERROR")' events.jsonl

# Extract IPs for xargs

jq -r '.hits[].ip' alerts.json | sort -u | xargs -I{} whois {}
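The effect of -s is easiest to see by running the same NDJSON input both ways. This is a sketch with invented log records: without -s the filter runs once per document, with -s it runs once on an array holding all of them.

```shell
# Hypothetical NDJSON input: three documents, one per line
ndjson='{"severity":"ERROR","ms":5}
{"severity":"INFO","ms":7}
{"severity":"ERROR","ms":11}'

# Default mode: the filter sees each document separately
errors=$(printf '%s\n' "$ndjson" | jq -c 'select(.severity == "ERROR") | .ms')
printf '%s\n' "$errors"
# 5
# 11

# Slurp mode: the filter sees one array, so cross-document aggregation works
total=$(printf '%s\n' "$ndjson" | jq -s 'map(.ms) | add')
echo "$total"    # 23
```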

Variables and parameters

bash

# Pass a value from the shell

jq --arg user "$USER" '.users[] | select(.login == $user)' data.json

# Numeric (not a string)

jq --argjson min 100 '.events[] | select(.duration_ms > $min)' data.json

Without --arg, shell substitutions inside a filter are a common source of injection bugs. Do not write jq ".x == \"$VAR\"". Write jq --arg v "$VAR" '.x == $v' instead.
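The injection risk can be demonstrated in a few lines. In this sketch the document and the hostile value are invented; the point is that an interpolated value becomes part of the jq program, while an --arg value stays data.

```shell
doc='{"users":[{"login":"ann"},{"login":"bob"}]}'
evil='nobody" or "1"=="1'   # hypothetical attacker-controlled input

# Unsafe: the shell pastes $evil into the program text, so the filter becomes
#   select(.login == "nobody" or "1"=="1")   which matches every user
unsafe=$(printf '%s' "$doc" | jq -r ".users[] | select(.login == \"$evil\") | .login")
printf '%s\n' "$unsafe"
# ann
# bob

# Safe: --arg binds the value as a jq string; it is never parsed as code
safe=$(printf '%s' "$doc" | jq -r --arg v "$evil" '.users[] | select(.login == $v) | .login')
echo "matches with --arg: [$safe]"    # matches with --arg: []
```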

Reduce, foreach, paths

For aggregation:

bash

# Sum the size field

jq '[.files[].size] | add'

jq '.files | reduce .[] as $f (0; . + $f.size)'

# All paths to leaf nodes (useful for exploring an unknown structure)

jq '[paths(scalars)]'
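foreach is named in the heading but not shown above. As a hedged sketch on made-up data: reduce returns only the final accumulator, while foreach emits every intermediate state, which gives a running total.

```shell
# Hypothetical sample document
doc='{"files":[{"name":"a","size":10},{"name":"b","size":20},{"name":"c","size":5}]}'

# reduce: only the final sum
total=$(printf '%s' "$doc" | jq '.files | reduce .[] as $f (0; . + $f.size)')
echo "$total"      # 35

# foreach: one output per step, collected here into an array
running=$(printf '%s' "$doc" | jq -c '[foreach .files[] as $f (0; . + $f.size)]')
echo "$running"    # [10,30,35]

# paths(scalars): each leaf is reported as an array of keys and indices
leaves=$(echo '{"a":{"b":1},"c":[2]}' | jq -c '[paths(scalars)]')
echo "$leaves"     # [["a","b"],["c",0]]
```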

jq with kubectl, docker, and aws

All three CLIs can emit JSON: kubectl with -o json, docker with --format '{{json .}}', and aws by default (or explicitly with --output json):

bash

# Nodes and their kubelet version

kubectl get nodes -o json | jq -r '.items[] | [.metadata.name, .status.nodeInfo.kubeletVersion] | @tsv'

# Containers by image

docker ps --format='{{json .}}' | jq -r 'select(.Image | contains("nginx")) | .Names'

# All S3 buckets tagged owner=team-x

aws s3api list-buckets | jq -r '.Buckets[].Name' |

  while read -r bucket; do

    aws s3api get-bucket-tagging --bucket "$bucket" 2>/dev/null |

      jq -r --arg b "$bucket" '.TagSet[] | select(.Key == "owner" and .Value == "team-x") | $b'

  done

When things go wrong

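jq: error: Cannot index ... with ... means you applied .field to a value that cannot be indexed that way (an array, a string, or a number). Use ? or select(type=="object").
null in output instead of an error usually means the key is missing or misspelled: indexing an object with an absent key, or indexing null, returns null. No output at all can mean ? is silencing an error; remove ? to see where the failure occurs.
Quotes in output are in the way means you forgot -r.
Newlines inside values are printed by -r as real line breaks, which can break a pipe into awk. Use @tsv (which escapes them as \n), or keep values JSON-encoded with -c and use a downstream parser.
Large file is slow often means jq is loading the entire input into memory because of -s. Without -s, jq handles one document at a time, though a single huge document is still parsed whole unless you use --stream. For gigabyte-scale files, look at gojq or jaq.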
JSON5/JSONC (with comments) is not parsed by jq. Strip comments first with yq -p json or a preprocessor.
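The newline problem is worth a quick check before piping into line-oriented tools. In this sketch (the record is invented), a value containing a newline becomes two output lines under plain -r, while @tsv keeps the record on one line by escaping the newline as a backslash followed by n.

```shell
# Hypothetical record whose msg field contains a newline
doc='{"msg":"line one\nline two","id":7}'

# Plain -r prints the newline as a real line break: one value, two lines
raw_lines=$(printf '%s' "$doc" | jq -r '.msg' | wc -l | tr -d ' ')
echo "$raw_lines"    # 2

# @tsv escapes the newline, so the record stays on a single line
tsv=$(printf '%s' "$doc" | jq -r '[.id, .msg] | @tsv')
tsv_lines=$(printf '%s\n' "$tsv" | wc -l | tr -d ' ')
echo "$tsv_lines"    # 1
```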

Alternatives

yq (Mike Farah): jq-like syntax (similar, not identical) for YAML, TOML, and XML
gojq: Go implementation, somewhat faster on large files
jaq: Rust implementation, faster still, not 100% compatible with all features
fx: interactive JSON explorer (TUI)
jless: a less-style pager and viewer for paging through large JSON


