Swap: when RAM runs out | LinuxLab

What it is

When RAM runs out, the kernel has a choice:

Drop clean page cache pages (they can be reread from disk)
Write anonymous pages (process heap and stack, which have no file behind them) to disk and free the RAM. This is a swap-out
If neither frees enough memory, invoke the OOM killer

Swap does not make the system faster. It keeps the system working in the situation "RAM is gone, but the work has to continue".

Swap partition vs swap file

Swap used to live on a separate partition. On modern distros it is often a file:

bash

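# Create a 4 GB swap file
# (if swapon rejects a fallocate-created file on your filesystem, fill the file with dd from /dev/zero instead)

sudo fallocate -l 4G /swapfile

sudo chmod 600 /swapfile                # root-only access; swapon warns about looser permissions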

sudo mkswap /swapfile

sudo swapon /swapfile

# Make it permanent in /etc/fstab

echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
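
Running the `tee -a` line twice leaves two identical entries in /etc/fstab. A minimal sketch of an idempotent variant follows; `ensure_swap_entry` is a hypothetical helper name, not part of util-linux, and it assumes the same `none swap sw 0 0` fields used above.

```shell
#!/bin/sh
# ensure_swap_entry FSTAB SWAP_PATH
# Hypothetical helper: appends a swap line for SWAP_PATH to FSTAB
# only if no uncommented swap entry for that path exists yet.
ensure_swap_entry() {
    fstab=$1
    swap_path=$2
    # An entry counts as present when field 1 is the path and field 3 is "swap".
    # Commented lines do not match because their first field starts with "#".
    if awk -v p="$swap_path" '$1 == p && $3 == "swap" { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    printf '%s none swap sw 0 0\n' "$swap_path" >> "$fstab"
}
```

It would be called as root, for example `ensure_swap_entry /etc/fstab /swapfile`; trying it on a copy of fstab first is the safer route.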

A partition:

bash

sudo mkswap /dev/sdb2

sudo swapon /dev/sdb2

# fstab: UUID=... none swap sw 0 0

Viewing it

bash

swapon                                  # active swap devices/files

cat /proc/swaps                         # the same, from the kernel

free -h                                 # Swap row: total/used/free

vmstat 1                                # si/so = swap-in/swap-out per second, in KiB by default

grep -i swap /proc/meminfo              # SwapTotal, SwapFree, SwapCached

The swapon format:

NAME       TYPE       SIZE  USED  PRIO
/swapfile  file       4G    300M  -2
/dev/sdb2  partition  8G    1.2G  -3

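PRIO is the priority. The area with the highest number is used first, and a lower-priority area is touched only once the higher ones are full. Negative values are the ones the kernel assigns automatically when no priority is given. Several swap areas with the same PRIO are used round-robin, so pages are striped across them much like RAID-0.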
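
For scripting, /proc/swaps is easier to parse than the human-readable table, because it reports Size and Used as plain numbers in KiB. The sketch below prints the fill level of each swap area; `swap_usage` is a hypothetical function name and the optional file argument exists only so it can be tried on sample data.

```shell
#!/bin/sh
# swap_usage [FILE]
# Prints "<name> <percent used>" for every swap area listed in FILE
# (default /proc/swaps). Columns: Filename Type Size Used Priority.
swap_usage() {
    # NR > 1 skips the header row; $3 > 0 guards against division by zero.
    awk 'NR > 1 && $3 > 0 { printf "%s %d%%\n", $1, ($4 * 100) / $3 }' "${1:-/proc/swaps}"
}
```

If a high-priority area sits near 100% while a lower-priority one is almost empty, that matches the priority order described above.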

vm.swappiness

bash

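cat /proc/sys/vm/swappiness     # kernel default is 60; some distros and cloud images ship a different value

0 means avoid swapping anonymous pages until free memory is nearly exhausted
60 is the kernel default; it favors dropping page cache over swapping anonymous pages
100 means anonymous pages and page cache are treated as equally costly to reclaim
Values up to 200 are accepted on kernel 5.8 and later; above 100 the kernel prefers swapping, which suits fast swap such as zram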

On production database servers people often set 1 to 10 so that the database's memory is not pushed into swap (the database knows which pages in its buffer pool are hot; the kernel does not).

bash

echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/99-swappiness.conf

sudo sysctl --system
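
A typo in the sysctl.d file only surfaces when the setting is applied, so it can help to validate the number before writing it. This is a sketch under one assumption: the upper bound of 200 applies to kernel 5.8 and later (older kernels stop at 100). `swappiness_line` is a hypothetical helper name.

```shell
#!/bin/sh
# swappiness_line VALUE
# Prints a sysctl.d line for VALUE, or returns 1 if VALUE is not an
# integer in the range 0..200 (assumed range: kernel 5.8 and later).
swappiness_line() {
    case $1 in
        ''|*[!0-9]*)
            echo "swappiness must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if [ "$1" -gt 200 ]; then
        echo "swappiness above 200 is out of range" >&2
        return 1
    fi
    printf 'vm.swappiness = %s\n' "$1"
}
```

The output can be piped into the same `sudo tee /etc/sysctl.d/99-swappiness.conf` command, for example `swappiness_line 10 | sudo tee /etc/sysctl.d/99-swappiness.conf`.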

When swap helps

Rare memory bursts. Once a week a 10 GB import job runs on a machine with 8 GB of RAM; the job is rare and not time-critical, so let it swap
Hibernate (suspend-to-disk). The memory image is written to swap, so swap >= the size of RAM is the safe sizing rule
Protection from the OOM killer. A slow process is better than a killed critical process
VMs. A guest does not know how much physical memory the host actually gives it; swap is insurance
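
The hibernation sizing rule from the list above can be checked from /proc/meminfo, where MemTotal and SwapTotal are both reported in kB. A minimal sketch; `hibernate_swap_check` is a hypothetical name, and the strict swap >= RAM comparison is the conservative rule of thumb, not a kernel requirement.

```shell
#!/bin/sh
# hibernate_swap_check [FILE]
# Compares SwapTotal with MemTotal in FILE (default /proc/meminfo).
# Returns 0 if swap is at least as large as RAM, 1 otherwise.
hibernate_swap_check() {
    awk '
        $1 == "MemTotal:"  { mem = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        END {
            if (swap >= mem) { print "ok: swap covers RAM"; exit 0 }
            printf "short by %d kB\n", mem - swap
            exit 1
        }
    ' "${1:-/proc/meminfo}"
}
```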

When swap HURTS

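Latency-sensitive services. A page fault that has to read from swap costs anywhere from tens of microseconds on NVMe to milliseconds on a spinning disk, and the SLA breaks. Better to let the process be OOM-killed and restarted.
Databases. They manage their own buffer pool; the kernel does not know which pages matter
Kubernetes nodes. k8s assumes "no swap"; with swap enabled, memory metrics and QoS guarantees break. Before k8s 1.22 the kubelet by default refuses to start while swap is on, so swap must be disabled; 1.22 added opt-in swap support as an alpha feature.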

Thrashing

If the set of pages that processes actively touch is larger than RAM, the kernel keeps swapping the same pages out and back in. The signs:

vmstat 1 shows sustained high si/so (tens of thousands of KB/s)
top shows processes in state D (uninterruptible sleep), and the load average climbs far above the CPU count
iowait is 90%+
The system is effectively unresponsive

The fix: kill the offending process, cap it with a cgroup memory limit, add RAM, or disable swap so that the OOM killer steps in instead of the system grinding.
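
The si/so symptom can be watched for automatically. The sketch below reads `vmstat` output on stdin, finds the si, so and b columns from the header row, and prints every sample where si + so exceeds a threshold. `flag_swap_storm` and the default threshold of 10000 KiB/s are assumptions for illustration, not established cut-offs.

```shell
#!/bin/sh
# flag_swap_storm [THRESHOLD]
# Reads vmstat output on stdin and prints the samples whose si + so
# exceeds THRESHOLD (default 10000). Returns 0 if any sample matched.
flag_swap_storm() {
    awk -v limit="${1:-10000}" '
        # The column header row starts with "r"; remember each column position.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number; the dashed banner row is skipped.
        ("si" in col) && $1 ~ /^[0-9]+$/ {
            si = $(col["si"]); so = $(col["so"])
            if (si + so > limit) {
                printf "si=%d so=%d blocked=%d\n", si, so, $(col["b"])
                hits++
            }
        }
        END { exit (hits > 0 ? 0 : 1) }
    '
}
```

A possible invocation is `vmstat 1 10 | flag_swap_storm 10000`. The first sample vmstat prints is an average since boot, so one isolated hit on that row says little; several consecutive hits are the pattern to worry about.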

Disabling

bash

sudo swapoff -a                          # all swaps

sudo swapoff /swapfile                    # a specific one

sudo rm /swapfile                         # if it was a swap file, delete it once swapoff has finished

# And remove the line from /etc/fstab

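swapoff can take minutes, because it first has to pull every swapped-out page back into RAM. If those pages do not fit, it can fail with an out-of-memory error or trigger the OOM killer.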
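
Before running swapoff on a busy machine, it is worth comparing how much swap is in use with how much RAM is available. A rough sketch based on /proc/meminfo follows; `swapoff_headroom` is a hypothetical name, and MemAvailable is only the kernel's estimate, so a pass here is an indication rather than a guarantee.

```shell
#!/bin/sh
# swapoff_headroom [FILE]
# Reads FILE (default /proc/meminfo) and reports used swap versus
# MemAvailable, both in kB. Returns 0 if used swap is smaller.
swapoff_headroom() {
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { unused = $2 }
        END {
            used = total - unused
            printf "swap used: %d kB, MemAvailable: %d kB\n", used, avail
            exit (used < avail ? 0 : 1)
        }
    ' "${1:-/proc/meminfo}"
}
```

A cautious pattern is `swapoff_headroom && sudo swapoff -a`, which skips the swapoff when the numbers do not add up.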

zram: swap in compressed memory

An alternative: instead of swapping to disk, swap to a compressed block device that lives in RAM. There is no disk I/O and latency is much lower, but the compressed pages still occupy RAM and compression costs CPU:

bash

sudo modprobe zram

echo lz4 | sudo tee /sys/block/zram0/comp_algorithm    # must be set before disksize

echo 2G | sudo tee /sys/block/zram0/disksize

sudo mkswap /dev/zram0

sudo swapon -p 100 /dev/zram0           # high priority, used first

zram swap is the default on Fedora (since release 33), ChromeOS, and many embedded distros.
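
Whether zram pays off depends on how well the swapped pages compress. The kernel exposes this in /sys/block/zram0/mm_stat, whose first two fields are the original and the compressed data size in bytes. A minimal sketch that turns them into a ratio; `zram_ratio` is a hypothetical name and the device path assumes the zram0 device set up above.

```shell
#!/bin/sh
# zram_ratio [FILE]
# Prints the compression ratio (orig_data_size / compr_data_size) from
# a zram mm_stat file. Prints nothing while the device holds no data.
zram_ratio() {
    awk '$2 > 0 { printf "%.2f\n", $1 / $2 }' "${1:-/sys/block/zram0/mm_stat}"
}
```

A ratio around 1 would mean the swapped data is close to incompressible, in which case zram mostly spends CPU for little gain.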

