What a cgroup is
A control group (cgroup) is a set of processes that share resource limits (CPU, memory, I/O, number of PIDs) and whose resource consumption is accounted for together.
In cgroups v2 (the standard on modern Ubuntu/Debian/Fedora) there is one
directory hierarchy under /sys/fs/cgroup/. Each directory is a cgroup.
Subdirectories are child cgroups, and a child can never use more than its parent's limits allow.
Each process belongs to exactly one cgroup. To find yours:
cat /proc/self/cgroup
# 0::/system.slice/docker-abc123.scope
The cgroup's directory is /sys/fs/cgroup followed by the path after 0:: in that line.
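As a minimal sketch of that path rule, the helper below turns one line of /proc/self/cgroup into a directory path. The function name cgroup_dir_from_line is hypothetical, and the sketch assumes the v2 line format 0::<path> and the usual mount point /sys/fs/cgroup.

```shell
#!/bin/sh
# Hypothetical helper: map a cgroup v2 line ("0::<path>") to its directory.
# Assumes cgroup2 is mounted at /sys/fs/cgroup (the usual default).
cgroup_dir_from_line() {
    line=$1
    case $line in
        0::*) path=${line#0::} ;;
        *) echo "not a cgroup v2 entry: $line" >&2; return 1 ;;
    esac
    # The root cgroup is reported as "/", which maps to the mount point itself.
    if [ "$path" = "/" ]; then
        printf '%s\n' "/sys/fs/cgroup"
    else
        printf '%s\n' "/sys/fs/cgroup$path"
    fi
}

# Example with the sample line shown above.
cgroup_dir_from_line "0::/system.slice/docker-abc123.scope"
# prints: /sys/fs/cgroup/system.slice/docker-abc123.scope
```

On a host that runs only cgroups v2, the file has a single line, so cgroup_dir_from_line "$(cat /proc/self/cgroup)" should be enough. Inside a container the reported path may be relative to the container's own cgroup namespace, so the result is only meaningful from that container's point of view.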
Controllers
Within a single hierarchy you enable "controllers", the modules that do accounting and apply limits:
- cpu: proportional weight (cpu.weight) and quota; cpu.max = quota period (e.g. 50000 100000 = half a CPU)
- memory: memory.max = hard limit, memory.high = soft limit (throttling plus reclaim)
- io: bandwidth and IOPS per device (io.max)
- pids: pids.max (how many processes are allowed)
- cpuset: pinning to specific cores and NUMA nodes
You enable them through cgroup.subtree_control:
cat /sys/fs/cgroup/cgroup.controllers # available
cat /sys/fs/cgroup/cgroup.subtree_control # enabled on children
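Enabling a controller for child cgroups means writing its name with a + prefix to cgroup.subtree_control (a - prefix disables it). The sketch below builds that string only from controllers the cgroup lists as available. The function name and the temporary directory standing in for a cgroup are hypothetical; on a real host the write needs sufficient privileges and can still be rejected by the kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the "+ctrl ..." string for the requested
# controllers, keeping only those listed in <cgroup_dir>/cgroup.controllers.
subtree_control_request() {
    cgroup_dir=$1
    shift
    available=" $(cat "$cgroup_dir/cgroup.controllers") "
    request=""
    for ctrl in "$@"; do
        case $available in
            *" $ctrl "*) request="$request +$ctrl" ;;
            *) echo "controller not available: $ctrl" >&2 ;;
        esac
    done
    # Strip the leading space left by the loop.
    printf '%s\n' "${request# }"
}

# Demo against a stand-in directory so the sketch runs without root.
demo=$(mktemp -d)
echo "cpuset cpu io memory pids" > "$demo/cgroup.controllers"
subtree_control_request "$demo" cpu memory rdma   # prints: +cpu +memory
# On a real cgroup the result would then be written with:
#   subtree_control_request "$dir" cpu memory > "$dir/cgroup.subtree_control"
rm -r "$demo"
```

The rdma request is only there to show the warning path: it goes to stderr and is left out of the printed string.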
What Docker / k8s / systemd write here
When you run docker run --cpus=0.5 --memory=256m:
- Docker creates /sys/fs/cgroup/system.slice/docker-<id>.scope/
- Writes cpu.max = "50000 100000" (50 ms out of every 100 ms)
- Writes memory.max = 268435456
- Places the container init PID into cgroup.procs
A k8s pod with resources: limits: { cpu: 500m, memory: 256Mi } does the same
through kubelet → cri-o/containerd → kernel.
A systemd unit with MemoryMax=512M does the same thing, only through
slice/scope units.
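The arithmetic behind the cpu.max and memory.max values above can be sketched as follows. The helper names are hypothetical, the 100000 microsecond period is taken from the example, and the k/m/g suffixes are treated as powers of 1024. This is an illustration of the conversion, not the actual code of Docker, kubelet, or systemd.

```shell
#!/bin/sh
# Hypothetical helper: turn a CPU count such as 0.5 into a cpu.max value.
# Assumes the 100000 microsecond period used in the example above.
cpu_max_for() {
    awk -v cpus="$1" -v period=100000 \
        'BEGIN { printf "%d %d\n", cpus * period + 0.5, period }'
}

# Hypothetical helper: turn a size such as 256m into bytes for memory.max.
# Suffixes k, m, g are treated as powers of 1024; no suffix means bytes.
memory_max_for() {
    size=$1
    number=${size%[kKmMgG]}
    case $size in
        *[kK]) echo $((number * 1024)) ;;
        *[mM]) echo $((number * 1024 * 1024)) ;;
        *[gG]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;
    esac
}

cpu_max_for 0.5       # prints: 50000 100000
memory_max_for 256m   # prints: 268435456
```

The same conversion covers the other two cases: a Kubernetes CPU limit of 500m is 0.5 CPU, and 256Mi is 256 × 1024 × 1024 bytes.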
PSI: Pressure Stall Information
The most useful addition in v2 is the set of files cpu.pressure, memory.pressure,
and io.pressure. They show the percentage of wall-clock time during which tasks
in the cgroup were stalled waiting for a resource. PSI tells you more than load
average because it is a normalized percentage and is available per cgroup, which
matters inside containers.
some avg10=12.34 avg60=8.90 avg300=4.50 total=...
full avg10=2.10 avg60=1.80 avg300=0.90 total=...
some = at least one task was stalled on the resource; full = all non-idle tasks were stalled at the same time.
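To pull a single number out of such a file, a small parser is enough. In the sketch below the function name psi_value is hypothetical; it reads the pressure file's content on stdin, so it can be tried on the sample lines above without a real cgroup.

```shell
#!/bin/sh
# Hypothetical helper: print one field (avg10, avg60, avg300 or total)
# from the "some" or "full" line of PSI output read on stdin.
# Exits non-zero if the line or field is not present.
psi_value() {
    awk -v kind="$1" -v field="$2" '
        $1 == kind {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == field) { print kv[2]; found = 1 }
            }
        }
        END { exit found ? 0 : 1 }
    '
}

# Sample content copied from the text above; total is left elided.
sample='some avg10=12.34 avg60=8.90 avg300=4.50 total=...
full avg10=2.10 avg60=1.80 avg300=0.90 total=...'

printf '%s\n' "$sample" | psi_value some avg10   # prints: 12.34
printf '%s\n' "$sample" | psi_value full avg60   # prints: 1.80
```

On a live system, assuming the kernel has PSI enabled, the same function can read a real file, for example psi_value some avg10 < memory.pressure from inside the cgroup's directory.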
OOM in a cgroup
When a cgroup's memory usage reaches memory.max and the kernel cannot reclaim
enough to stay under it, the OOM killer fires, but it picks its victim only
within that cgroup. Processes outside the cgroup are not killed.
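One way to check whether this has happened is the cgroup's memory.events file, whose oom_kill line counts processes in that cgroup killed by the OOM killer. The sketch below reads that counter. The function name and the stand-in file are hypothetical, and the counts in the demo are invented.

```shell
#!/bin/sh
# Hypothetical helper: print the oom_kill counter from a memory.events file.
# Exits non-zero if the file has no oom_kill line.
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2; found = 1 } END { exit found ? 0 : 1 }' "$1"
}

# Stand-in file with invented counts, in the "key value" layout of memory.events.
events=$(mktemp)
printf '%s\n' 'low 0' 'high 12' 'max 3' 'oom 1' 'oom_kill 1' > "$events"

oom_kill_count "$events"   # prints: 1
rm "$events"
```

A counter that grows between two reads means at least one process in the cgroup was killed in the meantime.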