linuxlab.io
Copyright © 2026 LinuxLab. All rights reserved.

kb/processes ── Processes & resources ── advanced

Linux capabilities: privilege bits

Capabilities split root's power into about 40 independent bits (41 on current kernels), such as NET_ADMIN and SYS_PTRACE. You can grant a process a slice of that power without making it full root.

aka: caps, linux-caps, cap-add, cap-drop

Why

Historically root could do everything and a plain user could do nothing privileged: an all-or-nothing model. To run ping you need a raw socket, a raw socket needs root, and then the program can do everything root can.

Capabilities split that "omnipotence" into about 40 bits, each one a specific right. A program needs only CAP_NET_RAW to create an ICMP socket, not the other bits. In containers this matters a lot: --cap-drop=ALL plus --cap-add=NET_BIND_SERVICE leaves a container root that can bind low ports and do very little else.

The most common capabilities

CAP                What it allows
NET_ADMIN          ip, tc, iptables, nft, network sysctls
NET_RAW            raw sockets (ping, tcpdump)
NET_BIND_SERVICE   bind to ports <1024 (80, 443, 22)
SYS_ADMIN          mount, umount, swapon, and a pile of other things, almost like root
SYS_PTRACE         ptrace processes you do not own (gdb, strace on other users' processes)
SYS_TIME           change the system clock
SYS_NICE           raise priority, set real-time scheduling
SYS_RESOURCE       raise resource limits (ulimit) past the hard limit
CHOWN              change file ownership
DAC_OVERRIDE       bypass file permission checks on read, write, and execute
DAC_READ_SEARCH    bypass permission checks for reading files and for reading and searching directories
SETUID / SETGID    change the process UID/GID
KILL               send signals to processes owned by any user
MKNOD              create device nodes
AUDIT_WRITE        write to the audit log (needed for login)
BPF                load eBPF programs (Linux 5.8+)
PERFMON            perf and other performance monitoring without full root (Linux 5.8+)

For the full list, read man 7 capabilities or run capsh --print.

Where a process keeps its capabilities

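A process (strictly, each thread) holds 5 sets of capabilities, stored as bitmasks in the credentials attached to its task_struct:

  • Permitted (P): the superset of what it MAY make effective
  • Effective (E): what is ACTIVE right now, the set the kernel actually checks
  • Inheritable (I): what can pass through exec, but only into a binary whose file inheritable set also lists the capability
  • Bounding (B): the ceiling. Not even escalation through exec can cross this set.
  • Ambient (A): kept across exec of a program that is neither setuid nor carries file capabilities (Linux 4.3+, meant for non-root users)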
bash
grep ^Cap /proc/self/status
# CapInh:	0000000000000000
# CapPrm:	0000003fffffffff   ← 38 bits set = full root on kernels before 5.8; Linux 5.9+ shows 000001ffffffffff (41 bits)
# CapEff:	0000003fffffffff
# CapBnd:	0000003fffffffff
# CapAmb:	0000000000000000
capsh --print                              # readable format
capsh --decode=0000003fffffffff

A plain user has all zeros in CapPrm and CapEff, although CapBnd usually stays full. Root has every bit the kernel defines. A container started with --cap-drop=ALL plus a few --cap-add shows a specific bitmask with only those bits set.
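
The hex values in /proc/<pid>/status are plain bit fields, so they can be decoded by hand when capsh is not installed. The sketch below is a minimal illustration, not a replacement for capsh --decode: the function name decode_caps is made up for this example, and the name table assumes the bit order of <linux/capability.h> up to CAP_CHECKPOINT_RESTORE (bit 40).

```shell
# Capability names indexed by bit number, bit 0 first
# (assumed order: <linux/capability.h>, through bit 40).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEXMASK  (hypothetical helper)
# Prints a comma-separated list of the capabilities whose bit is set.
decode_caps() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        # Test bit number $bit of the mask
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}cap_$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

decode_caps 0000000000002000   # prints: cap_net_raw
decode_caps 0000000000000400   # prints: cap_net_bind_service
```

Feeding it the CapEff value of a container is a quick way to confirm that --cap-drop and --cap-add had the intended effect.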

File capabilities

A binary can carry capabilities in an extended attribute (security.capability). At exec the process then gets them without the binary being setuid:

bash
# Give /usr/bin/ping the CAP_NET_RAW right (how many distros ship it)
sudo setcap cap_net_raw+ep /usr/bin/ping
getcap /usr/bin/ping
# /usr/bin/ping = cap_net_raw+ep      (newer libcap prints: /usr/bin/ping cap_net_raw=ep)
# Remove
sudo setcap -r /usr/bin/ping

This is safer than SUID-root: the binary gets only the bit it needs, not full root privileges.
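
What the +ep flags do at exec time can be worked through with the transformation rules from man 7 capabilities. The sketch below is a simplified model under stated assumptions: exec_caps is a hypothetical helper, it ignores the special rules for UID 0 and setuid binaries, and it treats any file that carries capability bits as "privileged" (which clears the ambient set).

```shell
# exec_caps P_INH P_BND P_AMB F_INH F_PRM F_EFF   (hypothetical helper)
#   P_* : process sets before exec, as hex masks
#   F_* : file sets, hex masks; F_EFF is the single file effective bit (0 or 1)
# Prints the process sets after exec, per the rules in man 7 capabilities:
#   P'(ambient)   = file is privileged ? 0 : P(ambient)
#   P'(permitted) = (P(inh) & F(inh)) | (F(prm) & P(bnd)) | P'(ambient)
#   P'(effective) = F(eff) ? P'(permitted) : P'(ambient)
exec_caps() {
    p_inh=$((0x$1)); p_bnd=$((0x$2)); p_amb=$((0x$3))
    f_inh=$((0x$4)); f_prm=$((0x$5)); f_eff=$6

    # Simplification: a file with any capability bits counts as privileged
    if [ $((f_inh | f_prm)) -ne 0 ] || [ "$f_eff" -eq 1 ]; then
        n_amb=0
    else
        n_amb=$p_amb
    fi

    n_prm=$(( (p_inh & f_inh) | (f_prm & p_bnd) | n_amb ))

    if [ "$f_eff" -eq 1 ]; then
        n_eff=$n_prm
    else
        n_eff=$n_amb
    fi

    printf 'CapPrm=%016x CapEff=%016x CapAmb=%016x\n' "$n_prm" "$n_eff" "$n_amb"
}

# Plain user (no inheritable, full bounding set, no ambient) runs a binary
# tagged cap_net_raw+ep: file permitted = bit 13 (0x2000), effective bit on.
exec_caps 0 000001ffffffffff 0 0 2000 1
# prints: CapPrm=0000000000002000 CapEff=0000000000002000 CapAmb=0000000000000000
```

The "p" in +ep puts the bit into the file permitted set and the "e" turns on the file effective bit, which is why the process ends up with CAP_NET_RAW both permitted and active, and nothing else.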

In Docker / containers

Docker gives a limited default set of caps out of the box:

CAP_AUDIT_WRITE, CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_FSETID,
CAP_KILL, CAP_MKNOD, CAP_NET_BIND_SERVICE, CAP_NET_RAW, CAP_SETFCAP,
CAP_SETGID, CAP_SETPCAP, CAP_SETUID, CAP_SYS_CHROOT

Control them like this:

bash
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE nginx     # minimum (the stock nginx image typically also needs CHOWN, SETGID, SETUID to start)
docker run --cap-add=NET_ADMIN ubuntu                           # for tc/iptables
docker run --cap-add=SYS_PTRACE ubuntu                          # for strace
docker run --privileged ubuntu                                  # ALL caps + more (NOT safe)
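
To predict the mask a container should report in CapEff, the capability names can be turned back into bits. This is a minimal sketch: caps_to_mask is a hypothetical helper, names are given in Docker's style (no CAP_ prefix, any case), and the table assumes the bit order of <linux/capability.h>.

```shell
# Capability names indexed by bit number, bit 0 first
# (assumed order: <linux/capability.h>, through bit 40).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# caps_to_mask NAME...   (hypothetical helper)
# Prints the 16-digit hex mask with the bit of every named capability set.
caps_to_mask() {
    mask=0
    for want in "$@"; do
        want=$(printf '%s' "$want" | tr '[:upper:]' '[:lower:]')
        bit=0
        found=0
        for name in $CAP_NAMES; do
            if [ "$name" = "$want" ]; then
                mask=$((mask | (1 << bit)))
                found=1
                break
            fi
            bit=$((bit + 1))
        done
        if [ "$found" -eq 0 ]; then
            echo "unknown capability: $want" >&2
            return 1
        fi
    done
    printf '%016x\n' "$mask"
}

# Docker's default set
caps_to_mask AUDIT_WRITE CHOWN DAC_OVERRIDE FOWNER FSETID KILL MKNOD \
    NET_BIND_SERVICE NET_RAW SETFCAP SETGID SETPCAP SETUID SYS_CHROOT
# prints: 00000000a80425fb

# --cap-drop=ALL --cap-add=NET_BIND_SERVICE
caps_to_mask NET_BIND_SERVICE
# prints: 0000000000000400
```

If grep ^CapEff /proc/1/status inside the container shows a different value, some flag did not take effect or the runtime applies a different default set.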

In Kubernetes, use securityContext.capabilities:

yaml
securityContext:
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]

Debugging "Operation not permitted"

When a program fails with EPERM:

  1. grep ^CapEff /proc/<pid>/status shows what the process has
  2. capsh --decode=<value> decodes it
  3. Compare against what the operation needs (man 7 capabilities)
  4. Inside a container, add only the missing capability with --cap-add
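
The first three steps can be scripted. The sketch below assumes you already know the bit number of the capability in question (for example 12 for NET_ADMIN, 13 for NET_RAW, 21 for SYS_ADMIN); mask_has_bit and check_pid_cap are hypothetical helper names, not existing tools.

```shell
# mask_has_bit HEXMASK BIT   (hypothetical helper)
# Succeeds if bit number BIT is set in HEXMASK.
mask_has_bit() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# check_pid_cap PID BIT   (hypothetical helper)
# Reads CapEff from /proc/PID/status and reports whether BIT is set.
# Returns 0 if present, 1 if missing, 2 if the status file cannot be read.
check_pid_cap() {
    eff=$(awk '/^CapEff:/ {print $2}' "/proc/$1/status" 2>/dev/null) || return 2
    [ -n "$eff" ] || return 2
    if mask_has_bit "$eff" "$2"; then
        echo "pid $1: bit $2 present in CapEff ($eff)"
    else
        echo "pid $1: bit $2 MISSING from CapEff ($eff)"
        return 1
    fi
}

# Example: does the current shell hold CAP_NET_ADMIN (bit 12)?
check_pid_cap "$$" 12 || true
```

A "MISSING" line for the capability the failing operation needs points at step 4; a "present" line means the EPERM comes from somewhere other than the effective set.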

CAP_SYS_ADMIN: the "new root"

For legacy reasons a great many operations require exactly SYS_ADMIN (mount, namespaces, BPF, MAC labels). It is root in practice. A container with CAP_SYS_ADMIN is unsafe almost by design.

This is a known problem, which is why CAP_BPF and CAP_PERFMON appeared in Linux 5.8. They carve specific rights out of SYS_ADMIN into their own bits.

§ commands

bash
grep ^Cap /proc/self/status

Current capabilities in hex: Inheritable/Permitted/Effective/Bounding/Ambient

bash
capsh --print

Readable dump of the current process capabilities

bash
sudo getcap -r / 2>/dev/null

Lists every binary on the system that carries file capabilities; useful as a security audit

bash
sudo setcap cap_net_bind_service+ep /usr/local/bin/myapp

Give your own binary the right to listen on a port below 1024 without root

bash
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE nginx

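Least-privilege principle: drop everything, then add back only what the workload needs (the stock nginx image typically also needs CHOWN, SETGID, and SETUID to start)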

§ see also

  • process-and-pid: Process and PID. A process is a running program with its own PID, memory, open descriptors, and UID. Processes form a tree rooted at init (PID 1).
  • namespaces: Linux namespaces. Namespaces are a kernel mechanism that gives a process its own isolated view of a resource (network, mount points, PID, UID, IPC, hostname, time). Every container is built on them.
  • file-permissions: File permissions: rwx and chmod. Every file has three permission sets: for the owner, the group, and others. Each set is three bits: read (r), write (w), execute (x). You change them with `chmod`.
  • extended-attributes: Extended attributes (xattr): arbitrary file metadata. xattrs are key-value metadata on an inode beyond what stat reports. There are 4 namespaces: user (open), trusted (root), system (ACL), security (SELinux, capabilities). getfattr reads, setfattr writes.