cgroups (v2)

What a cgroup is

A control group (cgroup) is a set of processes that share resource limits (CPU, memory, I/O, PIDs) and whose resource consumption is accounted for together.

In cgroups v2 (the default on modern Ubuntu, Debian, and Fedora) there is a single directory hierarchy under /sys/fs/cgroup/. Each directory is a cgroup. Subdirectories are child cgroups, and a parent's limits apply to its whole subtree, so a child can never use more than its ancestors allow.

Each process belongs to exactly one cgroup. To find yours:

bash

cat /proc/self/cgroup

# 0::/system.slice/docker-abc123.scope

The full path is /sys/fs/cgroup followed by the path after 0:: in that file, here /sys/fs/cgroup/system.slice/docker-abc123.scope.
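The path lookup above can be wrapped in a small helper. This is a minimal sketch: the function name cgroup_dir is hypothetical, and it assumes the v2 hierarchy is mounted at /sys/fs/cgroup.

```bash
# cgroup_dir [FILE] [ROOT]
# Print the cgroup v2 directory for the process described by FILE.
# FILE defaults to /proc/self/cgroup, ROOT to /sys/fs/cgroup (assumed mount point).
cgroup_dir() {
    local file="${1:-/proc/self/cgroup}"
    local root="${2:-/sys/fs/cgroup}"
    local rel
    # The v2 entry is the line that starts with "0::"; strip that prefix.
    rel=$(sed -n 's/^0:://p' "$file" | head -n 1)
    # No "0::" line means there is no v2 entry to resolve.
    [ -n "$rel" ] || return 1
    # A bare "/" is the top of the visible hierarchy, so nothing is appended.
    if [ "$rel" = "/" ]; then rel=""; fi
    printf '%s%s\n' "$root" "$rel"
}

# Usage: ls "$(cgroup_dir)"
```

Passing the file as an argument keeps the helper usable for other processes as well, for example cgroup_dir /proc/1/cgroup.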

Controllers

Within a single hierarchy you enable "controllers", the modules that do accounting and apply limits:

cpu: weight and quota; cpu.weight = relative share, cpu.max = "<quota> <period>" in µs (e.g. 50000 100000 = half a CPU)
memory: memory.max = hard limit, memory.high = soft limit (above it the cgroup is throttled and its memory is reclaimed)
io: bandwidth and IOPS limits per device (io.max)
pids: pids.max (maximum number of processes and threads)
cpuset: pinning to specific cores and NUMA nodes (cpuset.cpus, cpuset.mems)
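The cpu.max arithmetic from the list above is quota divided by period. A small sketch of that conversion follows; the function name cpu_max_to_cpus is hypothetical and the input is the raw content of a cpu.max file.

```bash
# cpu_max_to_cpus "QUOTA PERIOD"
# Convert the content of a cpu.max file into a number of CPUs.
cpu_max_to_cpus() {
    printf '%s\n' "$1" | awk '{
        # A quota of "max" means no CPU limit is set.
        if ($1 == "max") print "unlimited"
        # Otherwise the limit is quota / period, both in microseconds.
        else printf "%.2f\n", $1 / $2
    }'
}

# Usage: cpu_max_to_cpus "50000 100000"
```

With the example from the list, 50000 / 100000 gives 0.50, that is, half a CPU.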

You enable them through cgroup.subtree_control:

bash

cat /sys/fs/cgroup/cgroup.controllers       # available

cat /sys/fs/cgroup/cgroup.subtree_control   # enabled on children
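The two commands above only read the current state. Enabling a controller is a write of "+name" into the same file. The helper below is a sketch with a hypothetical name (enable_controllers); on a real system it needs root, and the controller must already be listed in that cgroup's cgroup.controllers.

```bash
# enable_controllers DIR CONTROLLER...
# Enable the named controllers for the children of the cgroup DIR.
enable_controllers() {
    local dir="$1"
    shift
    local spec=""
    local c
    for c in "$@"; do
        # Build a space-separated list such as "+cpu +memory".
        spec="$spec${spec:+ }+$c"
    done
    # A "+" prefix enables a controller; a "-" prefix would disable it.
    printf '%s\n' "$spec" > "$dir/cgroup.subtree_control"
}

# Usage (as root): enable_controllers /sys/fs/cgroup cpu memory
```

After the write, the children of DIR gain the matching interface files (cpu.max, memory.max, and so on).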

What Docker / k8s / systemd write here

When you run docker run --cpus=0.5 --memory=256m:

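Docker creates /sys/fs/cgroup/system.slice/docker-<id>.scope/ (the path when Docker uses the systemd cgroup driver)
Writes cpu.max = "50000 100000" (50 ms of CPU time per 100 ms period)
Writes memory.max = 268435456 (256 × 1024 × 1024 bytes)
Places the container's init PID into cgroup.procs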
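The same steps can be reproduced by hand, which is a useful way to see that there is nothing Docker-specific in the files themselves. This is a sketch, not Docker's actual code path: the function name limit_cgroup and the fixed 100000 µs period are assumptions made for the illustration, and on a real system it needs root and an enabled cpu and memory controller in the parent.

```bash
# limit_cgroup DIR CPUS MEM_BYTES PID
# Create the cgroup DIR, set a CPU and memory limit, and move PID into it.
limit_cgroup() {
    local dir="$1"
    local cpus="$2"
    local mem="$3"
    local pid="$4"
    local period=100000   # assumed period in microseconds (100 ms)
    local quota
    # quota = cpus * period, rounded to a whole number of microseconds.
    quota=$(awk -v c="$cpus" -v p="$period" 'BEGIN { printf "%.0f", c * p }')
    mkdir -p "$dir"
    printf '%s %s\n' "$quota" "$period" > "$dir/cpu.max"
    printf '%s\n' "$mem" > "$dir/memory.max"
    # Writing a PID to cgroup.procs moves that process into the cgroup.
    printf '%s\n' "$pid" > "$dir/cgroup.procs"
}

# Usage (as root): limit_cgroup /sys/fs/cgroup/demo 0.5 268435456 "$$"
```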

A k8s pod with resources: limits: { cpu: 500m, memory: 256Mi } does the same through kubelet → cri-o/containerd → kernel.
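The unit conversion behind those pod limits is plain arithmetic. The sketch below uses hypothetical function names, handles only the m and Mi suffixes shown above, and assumes a 100000 µs period.

```bash
# millicpu_to_cpu_max VALUE
# Convert a CPU limit such as "500m" into a cpu.max line "QUOTA PERIOD".
millicpu_to_cpu_max() {
    local m="${1%m}"      # strip the "m" (millicpu) suffix
    local period=100000   # assumed period in microseconds
    # 1000m is one full CPU, so quota = millicpu * period / 1000.
    echo "$(( m * period / 1000 )) $period"
}

# mib_to_bytes VALUE
# Convert a memory limit such as "256Mi" into bytes for memory.max.
mib_to_bytes() {
    local n="${1%Mi}"     # strip the "Mi" (mebibyte) suffix
    echo "$(( n * 1024 * 1024 ))"
}

# Usage: millicpu_to_cpu_max 500m; mib_to_bytes 256Mi
```

For the pod above this gives the same two values as the docker run example: 50000 100000 and 268435456.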

A systemd unit with MemoryMax=512M does the same thing, only through slice/scope units.

PSI: Pressure Stall Information

The most useful addition in v2 is the files cpu.pressure, memory.pressure, and io.pressure. They show the percentage of wall-clock time during which tasks in the cgroup were stalled waiting for a resource. PSI is more informative than load average because it is normalized to a percentage and is available per cgroup, which matters inside containers.

some avg10=12.34 avg60=8.90 avg300=4.50 total=...

full avg10=2.10  avg60=1.80 avg300=0.90 total=...

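some = at least one task was stalled; full = all non-idle tasks were stalled at the same time. avg10, avg60, and avg300 are averages over the last 10, 60, and 300 seconds; total is the cumulative stall time in microseconds.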
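For monitoring scripts it helps to pull a single number out of a pressure file. A minimal sketch, with a hypothetical function name (psi_avg) and the cpu.pressure path from this page as the default:

```bash
# psi_avg KIND WINDOW [FILE]
# Print one average from a PSI file.
# KIND is "some" or "full"; WINDOW is 10, 60 or 300.
psi_avg() {
    local kind="$1"
    local window="$2"
    local file="${3:-/sys/fs/cgroup/cpu.pressure}"
    awk -v kind="$kind" -v key="avg$window" '
        # Pick the line whose first field matches KIND.
        $1 == kind {
            # Each remaining field has the form name=value.
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == key) print kv[2]
            }
        }' "$file"
}

# Usage: psi_avg some 10 /sys/fs/cgroup/memory.pressure
```

A check such as "alert when psi_avg some 60 exceeds a threshold" can then be built on top of the printed value.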

OOM in a cgroup

When memory usage in a cgroup reaches memory.max and the kernel cannot reclaim enough, the OOM killer fires, but it only picks victims inside that cgroup. Processes in the rest of the system are not killed.
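Whether this has already happened can be read from the cgroup's memory.events file, which keeps an oom_kill counter. A small sketch, with a hypothetical function name (oom_kills); the path to memory.events is passed in explicitly.

```bash
# oom_kills FILE
# Print how many processes in a cgroup were killed by the OOM killer.
# FILE is the memory.events file of that cgroup.
oom_kills() {
    # memory.events holds "key value" lines; oom_kill is the kill counter.
    awk '$1 == "oom_kill" { print $2 }' "$1"
}

# Usage: oom_kills /sys/fs/cgroup/system.slice/docker-<id>.scope/memory.events
```

A counter that grows between two reads means the limit was hit and something in that cgroup was killed in the meantime.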

§ Commands

bash

cat /proc/self/cgroup

Which cgroup the current process is in

bash

MY=$(awk -F: '$1 == 0 {print $3}' /proc/self/cgroup); ls "/sys/fs/cgroup$MY"

Interface files of our cgroup; the file prefixes (cpu.*, memory.*, io.*, ...) show which controllers are enabled for it

bash

cat /sys/fs/cgroup/cpu.max

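CPU limit of the cgroup mounted at /sys/fs/cgroup: `<quota> <period>` in µs, or `max` for no limit. Inside a container this is the container's own cgroup; the host's real root cgroup has no cpu.max file.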

bash

cat /sys/fs/cgroup/memory.current

How much memory the cgroup uses right now, in bytes (this file is not present in the host's root cgroup)

bash

cat /sys/fs/cgroup/cpu.pressure

PSI for CPU: how much of the time tasks in this cgroup were stalled waiting for a CPU
