Copyright © 2026 LinuxLab. All rights reserved.

kb/commands ── Commands ── intermediate

awk: field-oriented processing of structured text

awk splits each line into fields by FS (whitespace by default) and applies pattern { action } rules. The core tools are the fields `$1..$NF`, the record counter `NR`, and BEGIN/END blocks for a prologue and totals. It covers most "process the columns" tasks without reaching for Python.

aka: awk, gawk, mawk, awk-command

Why awk

When text is laid out in columns (logs, df, ps, CSV), awk splits it into fields and gives you the usual numeric operations. sed is strong line by line; awk is strong at fields, counters, and aggregation. A rough dividing line between awk and Python: up to about 30 lines of script or a table of about 5 columns, awk wins.

On Linux, awk is usually gawk (GNU awk) or mawk (faster, more minimal). The POSIX subset works everywhere. GNU extensions (gensub, true multidimensional arrays a[i][j], asort) are gawk only.
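
For scripts that must stay inside the POSIX subset, a comma subscript such as hits[$1, $2] is the portable stand-in for nested arrays: awk joins the indices with SUBSEP into one string key. A minimal sketch, with made-up "host method" input and illustrative variable names:

```shell
# Made-up input: host and HTTP method
out=$(printf 'web1 GET\nweb1 GET\nweb1 POST\n' | awk '
  { hits[$1, $2]++ }                 # key is $1 SUBSEP $2
  END {
    for (key in hits) {
      split(key, part, SUBSEP)       # recover the two indices
      print part[1], part[2], hits[key]
    }
  }' | sort)
printf '%s\n' "$out"
```

The output is piped through sort because the order of for (key in hits) is unspecified.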

Basic syntax

awk 'pattern { action }' file
awk -F: '{ print $1 }' /etc/passwd
awk -f script.awk file
  • pattern is a regex /foo/, a numeric condition $3 > 100, BEGIN, END, or a combination with &&/||
  • action is a { ... } block. A pattern with no action defaults to { print }. An action with no pattern runs on every line.
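
The rule shapes described above can be tried on a few lines of inline input. This is only a sketch: the sample data and the shell variable names are made up for illustration.

```shell
# Sample input: a name and a numeric score (made-up data)
data=$(printf 'alice 120\nbob 80\ncarol 150\n')

# Pattern only: the default action { print } prints matching lines
pattern_only=$(printf '%s\n' "$data" | awk '$2 > 100')

# Action only: runs on every line
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action, with a regex and a numeric condition joined by &&
both=$(printf '%s\n' "$data" | awk '/^c/ && $2 > 100 { print $1, "ok" }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```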

Fields and built-in variables

Variable | Meaning
--- | ---
$0 | the whole line
$1, $2, ... $NF | fields 1..N
NF | number of fields in the current line
NR | number of the current line (overall counter)
FNR | line number within the current file (for multi-file)
FS | field separator on read (default: spaces/tabs)
OFS | separator when printing with print a,b,c
RS | record separator (default \n)
ORS | record separator on output
FILENAME | name of the current file
bash
# Every account and its login shell
awk -F: '{ print $1, $7 }' /etc/passwd
# Top 10 IPs by request count in access.log
awk '{ print $1 }' access.log | sort | uniq -c | sort -rn | head
# Sum of file sizes
ls -l | awk '{ sum += $5 } END { print sum/1024/1024 " MiB" }'
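
FS applies when a record is read, while OFS is used only when awk prints comma-separated values or rebuilds $0. A small sketch on one made-up line in /etc/passwd style; assigning a field to itself ($1 = $1) is a common idiom to force that rebuild, and the variable names here are illustrative.

```shell
# Made-up input in /etc/passwd style
line='svc:x:1001:1001::/home/svc:/bin/sh'

# NF counts fields under the current FS; $NF is the last field
summary=$(printf '%s\n' "$line" | awk -F: '{ print NR, NF, $NF }')

# $1 = $1 makes awk rebuild $0 with OFS between the fields
rejoined=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":"; OFS = "|" } { $1 = $1; print }')

printf '%s\n%s\n' "$summary" "$rejoined"
```

The empty fifth field still counts, so NF is 7 here.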

BEGIN and END

  • BEGIN { ... } runs before the first line is read. Common uses: setting FS, initializing variables, a report header.
  • END { ... } runs after the last line. The final summary.
bash
awk 'BEGIN { FS=":"; print "user\tshell" }
     { print $1 "\t" $7 }
     END { print "total:", NR }' /etc/passwd

Conditions and arithmetic

bash
# 5xx requests from nginx
awk '$9 >= 500 && $9 < 600' access.log
# Failed SSH logins for today (syslog pads the day with a space, hence %e)
awk -v d="$(date '+%b %e')" '$0 ~ d && /Failed password/' /var/log/auth.log
# Convert bytes to MiB in `ls -l` (NR > 1 skips the "total" line)
ls -l | awk 'NR > 1 { printf "%-30s %.2f MiB\n", $9, $5/1024/1024 }'

awk is loosely typed: a value is treated as a number or as a string depending on context. $1 + 0 forces a number, $1 "" forces a string.
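
The difference shows up in comparisons. A minimal sketch with the made-up input "10 9": compared as numbers, 10 is not less than 9; compared as strings, "10" sorts before "9" because the comparison goes character by character.

```shell
out=$(printf '10 9\n' | awk '{
  as_number = ($1 + 0 < $2 + 0)    # numeric comparison: 10 < 9 is false
  as_string = ($1 "" < $2 "")      # string comparison: "10" sorts before "9"
  print "numeric:", as_number, "string:", as_string
}')
printf '%s\n' "$out"
```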

Associative arrays

This is the main thing awk does more easily than the shell:

bash
# Top 5 IP addresses by 5xx count
awk '$9 >= 500 { count[$1]++ }
     END { for (ip in count) print count[ip], ip }' access.log \
  | sort -rn | head -5
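# Sum of bytes per user-agent (combined log format split on quotes:
# status and bytes sit in $3, the user agent in $6)
awk -F'"' '{ split($3, s, " "); bytes[$6] += s[2] } END { for (k in bytes) print bytes[k], k }' access.log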
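
The counting pattern can be checked without a real log. A sketch on made-up "ip status" pairs standing in for access.log fields; the result goes through sort because the order of for (ip in count) is unspecified.

```shell
# Made-up "ip status" pairs standing in for an access log
counts=$(printf '10.0.0.1 500\n10.0.0.2 200\n10.0.0.1 502\n10.0.0.3 503\n10.0.0.1 200\n' |
  awk '$2 >= 500 { count[$1]++ }
       END { for (ip in count) print count[ip], ip }' |
  sort -rn)
printf '%s\n' "$counts"
```

An array element springs into existence on first use, so count[$1]++ needs no initialization.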

A simple report

awk
# report.awk
BEGIN { FS=","; OFS="\t"; print "host", "errors", "avg_ms" }
{ errs[$1] += $2; ms[$1] += $3; cnt[$1]++ }
END   { for (h in errs) print h, errs[h], ms[h]/cnt[h] }

To run it:

bash
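# Keep the header on top: read it first, then sort the data rows by errors
awk -f report.awk metrics.csv | { IFS= read -r header; printf '%s\n' "$header"; sort -k2,2 -rn; }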
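
To see what the report produces, here is a sketch that runs it on a hypothetical metrics.csv. The three data rows and the assumed column layout (host,errors,latency_ms, no header row) are invented for illustration; the script is the report.awk shown above.

```shell
dir=$(mktemp -d)

# report.awk as shown above
cat > "$dir/report.awk" <<'EOF'
BEGIN { FS=","; OFS="\t"; print "host", "errors", "avg_ms" }
{ errs[$1] += $2; ms[$1] += $3; cnt[$1]++ }
END   { for (h in errs) print h, errs[h], ms[h]/cnt[h] }
EOF

# Hypothetical input: host,errors,latency_ms
printf 'web1,2,100\nweb1,4,300\ndb1,1,50\n' > "$dir/metrics.csv"

report=$(awk -f "$dir/report.awk" "$dir/metrics.csv")
printf '%s\n' "$report"
rm -r "$dir"
```

With this input, web1 totals 6 errors at an average of 200 ms and db1 has 1 error at 50 ms. The order of the two data rows is unspecified until the output is sorted.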

awk vs sed vs jq

Task | Tool
--- | ---
Replace a pattern in a line | sed
One or two columns plus a filter | awk
Aggregation by key | awk
JSON | jq
Multiline structures | Python
XML | XSLT / Python

When something goes wrong

  • $10 is not "$1 followed by 0": in awk, $10 is the tenth field (in the shell it would mean ${1}0). To append a literal 0 to the first field, concatenate explicitly with print $1 "0"; for arithmetic, parenthesize: print $1, ($1 + 10).
  • Fields shift when values contain spaces, because the default FS splits on any run of spaces and tabs. Use -F'\t' for strict TSV, or awk -F'"' ... for CSV-with-quotes (but proper CSV parsing belongs to Python).
  • A pattern with \d does not match digits: awk uses POSIX extended regular expressions, not PCRE. Write [0-9] or [[:digit:]].
  • gawk vs mawk: gensub, asort, and the third argument of match() are GNU only. If a script breaks where awk is not gawk, such as Debian/Ubuntu (mawk by default) or Alpine (BusyBox awk), check this first.
  • stdin redirection is rarely the culprit: awk '{...}' < file, cat file | awk '{...}', and awk < file '{...}' all work, because the shell strips the redirection before awk starts. What does fail is naming the file before the program, as in awk file '{...}': awk takes the first non-option argument as the program text.
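
Three of these pitfalls can be reproduced on inline input. A sketch with made-up data; the shell variable names are illustrative only.

```shell
# [0-9] (or [[:digit:]]) is the portable way to match a digit
digits=$(printf 'id42\nnone\n' | awk '/[0-9]/')

# A value with a space: default FS sees 3 fields, -F'\t' sees 2
row=$(printf 'John Smith\tadmin')
default_nf=$(printf '%s\n' "$row" | awk '{ print NF }')
tab_nf=$(printf '%s\n' "$row" | awk -F'\t' '{ print NF }')

# $10 is the tenth field; $1 "0" appends a literal 0 to the first
tenth=$(echo 'a b c d e f g h i j' | awk '{ print $10, $1 "0" }')

printf '%s\n%s %s\n%s\n' "$digits" "$default_nf" "$tab_nf" "$tenth"
```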

§ commands

bash
awk -F: '$3 >= 1000 { print $1 }' /etc/passwd

Accounts with UID ≥ 1000 are usually real people rather than system accounts (nobody, UID 65534, also matches)

bash
awk '{ print $1 }' access.log | sort | uniq -c | sort -rn | head

Top 10 client IPs in an HTTP log, the classic one-liner

bash
awk 'NR==FNR { a[$1]=1; next } !($1 in a)' allow.txt all.txt

Lines in all.txt whose first field does not appear in allow.txt: a set difference on the first field

bash
awk '/start/,/end/' file

Extract the block between markers, an address range

bash
awk 'END { print NR }' file

Counts lines. No faster than `wc -l`, but it needs no extra tool when you are already inside awk

bash
ps -eo pid,user,rss,cmd | awk 'NR>1 { mem[$2] += $3 } END { for (u in mem) print mem[u]/1024 " MiB", u }' | sort -rn | head

How much RAM each user eats, straight from ps in one pipe

bash
awk -v t="$(date +%s)" '$1 > t-3600' events.tsv

Events from the last hour, assuming column 1 is a Unix timestamp. `-v` passes a shell value into awk
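
Two of the one-liners above are easier to trust after a dry run. A sketch that recreates allow.txt and all.txt with made-up contents in a temporary directory and feeds the range pattern a five-line input:

```shell
dir=$(mktemp -d)
printf 'alice\nbob\n' > "$dir/allow.txt"
printf 'alice 1\ncarol 2\nbob 3\ndave 4\n' > "$dir/all.txt"

# While the first file is read, NR equals FNR: its keys are stored and
# "next" skips the second rule. The second file is then filtered.
missing=$(awk 'NR==FNR { a[$1]=1; next } !($1 in a)' "$dir/allow.txt" "$dir/all.txt")

# A range pattern prints from the start marker through the end marker
block=$(printf 'x\nstart\ny\nend\nz\n' | awk '/start/,/end/')

printf '%s\n---\n%s\n' "$missing" "$block"
rm -r "$dir"
```

Both marker lines are part of the extracted block.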

§ see also

  • cmd-sed ── sed: stream editor ── sed is a stream editor: it applies commands (`s/a/b/`, `d`, `p`, ...) to each line. `-i` edits a file in place; `-E` enables ERE; the address range `/start/,/end/` filters a block. Hold space is a second buffer.
  • cmd-grep ── grep: search lines by pattern ── `grep` searches stdin or files for lines matching a regex. Key modes: `-E` (ERE), `-P` (PCRE), `-F` (fixed string), `-r` (recursive tree walk).
  • bash-scripting ── bash scripts: basics and idioms ── A bash script is a text file with shebang `#!/usr/bin/env bash` and `chmod +x`. Start every script with `set -euo pipefail` and run `shellcheck` to catch errors early.
  • cmd-jq ── jq: JSON queries and transformation ── jq is a query language for JSON in the shell. Use .field, .array[], select(...), map(...), and in-expression pipes via |. -r strips quotes, -c packs output into a single line. Works well in curl + jq + grep pipelines.
  • xargs-and-find-exec ── xargs and find -exec: bulk operations ── Two ways to apply a command to a set of files: `find ... -exec cmd {} +` (inside find) and `... | xargs cmd` (via pipe). For safety with spaces and special characters, use `find -print0 | xargs -0`.