What it is
When RAM runs out, the kernel has a choice:
- Drop clean page cache pages (they can be reread from disk)
- Write anonymous pages (process heap and stack, which have no file behind them) to disk and free the RAM. This is a swap-out
- If neither frees enough memory, invoke the OOM killer, which kills a process to reclaim its memory
Swap does not make the system faster. It lets the system keep running when RAM is gone but the work has to continue.
Swap partition vs swap file
It used to be a separate partition. On modern distros it is usually a file:
# Create a 4 GB swap file
sudo fallocate -l 4G /swapfile # if swapon later rejects the file as having holes, create it with dd instead
sudo chmod 600 /swapfile # tight permissions are required
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it permanent in /etc/fstab
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
A partition:
sudo mkswap /dev/sdb2
sudo swapon /dev/sdb2
# fstab: UUID=... none swap sw 0 0
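Running the `tee -a` command above twice leaves a duplicate fstab entry. A minimal sketch of an idempotent variant follows; the function name `add_swap_to_fstab` and its optional second argument (the fstab path, handy for trying it on a copy) are illustrative and not part of any tool. Writing to the real /etc/fstab requires root.

```shell
#!/bin/sh
# Append a swap entry to fstab only if the same path is not already listed.
# Usage: add_swap_to_fstab /swapfile [fstab-path]
add_swap_to_fstab() {
    swap_path="$1"
    fstab="${2:-/etc/fstab}"
    # Match on the first field (the device or file) and the third (type "swap").
    if awk -v p="$swap_path" \
        '$1 == p && $3 == "swap" { found = 1 } END { exit !found }' "$fstab"
    then
        echo "already present: $swap_path"
    else
        printf '%s none swap sw 0 0\n' "$swap_path" >> "$fstab"
        echo "added: $swap_path"
    fi
}
```

Try it against a copy of fstab first, for example: add_swap_to_fstab /swapfile /tmp/fstab.copy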
Viewing it
swapon # active swap devices/files
cat /proc/swaps # the same, from the kernel
free -h # used/free/total columns for swap
vmstat 1 # si/so = swap-in/out in KB/s
grep -i swap /proc/meminfo # SwapTotal, SwapFree, SwapCached
The swapon format:
NAME      TYPE      SIZE USED PRIO
/swapfile file        4G 300M   -2
/dev/sdb2 partition   8G 1.2G   -3
PRIO is the priority. Areas with a higher number are used first. Areas with the same PRIO are filled round-robin, which spreads pages across them much like RAID-0 striping.
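To see the fill order at a glance, the priority column can be sorted. This is a small sketch that reads a file in /proc/swaps format (header line, then one area per line with the priority in the fifth column); the function name `swap_by_priority` is made up for this example.

```shell
#!/bin/sh
# List swap areas in the order the kernel fills them (highest priority first).
# Reads a file in /proc/swaps format; defaults to the live /proc/swaps.
swap_by_priority() {
    src="${1:-/proc/swaps}"
    # Skip the header, print "priority name", sort numerically in descending order.
    awk 'NR > 1 { print $5, $1 }' "$src" | sort -k1,1nr
}
```

A priority can be set explicitly with swapon -p NUMBER or with the pri=NUMBER option in the fstab options field; areas without one get negative priorities assigned by the kernel, as in the listing above.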
vm.swappiness
cat /proc/sys/vm/swappiness # kernel default is 60; a distro or tuning profile may override it
- 0 means almost never swap anonymous pages; wait until RAM is exhausted
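- 60 is the kernel default, a balance between dropping page cache and swapping
- 100 means anonymous pages and page cache are reclaimed with equal weight (kernels 5.8 and later accept values up to 200, which favor swapping even more)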
On production database servers people often set 1-10 so the database heap does not go into swap (a database manages its own buffer pool better than the kernel does).
echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system
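A typo in the sysctl.d file is easy to miss, so it can help to validate the value before writing it. The sketch below assumes the 0-200 range accepted by kernels 5.8 and later (older kernels stop at 100); the function name `swappiness_conf_line` is hypothetical.

```shell
#!/bin/sh
# Print a sysctl.d line for the given swappiness, or fail on an invalid value.
# Assumes the 0-200 range of kernels 5.8+; older kernels accept only 0-100.
swappiness_conf_line() {
    v="$1"
    case "$v" in
        ''|*[!0-9]*)
            echo "not a non-negative integer: $v" >&2
            return 1
            ;;
    esac
    if [ "$v" -gt 200 ]; then
        echo "out of range (0-200): $v" >&2
        return 1
    fi
    echo "vm.swappiness = $v"
}
```

Usage: swappiness_conf_line 10 | sudo tee /etc/sysctl.d/99-swappiness.conf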
When swap helps
- Burst memory during imports. Once a week a 10GB job lands, RAM is 8GB, but the job is slow and rare, so let it swap
- Hibernate (suspend-to-disk). The memory image is written to swap, so the safe rule is swap >= the size of RAM
- Protection from OOM-kill. Slow is better than killing a critical process
- VMs. A guest does not know what it actually gets; swap is insurance
When swap HURTS
- Latency-sensitive services. A page fault that has to read from swap on disk costs milliseconds, and the SLA breaks. Better to OOM-kill and redeploy.
- Databases. They manage their own buffer pool; the kernel does not know which pages matter
- Kubernetes nodes. k8s assumes "no swap"; with swap the metrics and QoS guarantees break. Before k8s 1.22 the kubelet by default refused to start on a node with swap enabled, so swap had to be disabled.
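Swap does not have to be all-or-nothing for the whole machine. On a cgroup v2 system, each cgroup exposes memory.swap.current (bytes of swap in use) and memory.swap.max (the limit, where "max" means unlimited and 0 means no swap for that group). The sketch below only reads those two files; the function name `cgroup_swap_status` and the example path are illustrative.

```shell
#!/bin/sh
# Report swap usage and the swap limit of one cgroup v2 directory.
# Usage: cgroup_swap_status /sys/fs/cgroup/system.slice/myservice.service
# (the path above is a hypothetical example)
cgroup_swap_status() {
    cg="$1"
    current=$(cat "$cg/memory.swap.current")
    limit=$(cat "$cg/memory.swap.max")
    echo "swap used: $current bytes, limit: $limit"
}
```

With systemd, the limit is usually set through the MemorySwapMax= unit property rather than by writing the file directly.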
Thrashing
If a process actively uses more memory than RAM, the kernel swaps it back and forth endlessly. The signs:
- vmstat 1 shows high si/so (tens of thousands of KB/s)
- top shows processes in state D, and load-average goes off the chart
- iowait is 90%+
- The system is effectively unresponsive
The fix: kill the process, disable swap, set a cgroups limit, or add RAM.
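The vmstat sign can be watched for automatically. This is a rough sketch that filters vmstat output and prints a line whenever both si and so exceed a threshold in the same sample; it assumes the usual vmstat layout with si and so in columns 7 and 8, and the 10000 KB/s default is an arbitrary illustration, not a recommended value.

```shell
#!/bin/sh
# Flag vmstat samples where swap-in and swap-out are both above a limit.
# Usage: vmstat 1 | flag_swap_storm 10000
flag_swap_storm() {
    limit="${1:-10000}"   # KB/s; illustrative threshold, tune for your system
    awk -v limit="$limit" '
        # Data rows start with a number (the r column); header rows do not.
        $1 ~ /^[0-9]+$/ {
            si = $7 + 0
            so = $8 + 0
            if (si > limit && so > limit)
                printf "swap storm: si=%d so=%d KB/s\n", si, so
        }'
}
```

High si together with high so is the telling pattern: pages are being written out and read back at the same time.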
Disabling
sudo swapoff -a # all swaps
sudo swapoff /swapfile # a specific one
sudo rm /swapfile # if it is a file, delete it
# And remove the line from /etc/fstab
swapoff can take minutes, because it first has to pull everything back into RAM.
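Because everything in swap has to fit back into RAM, it is worth comparing used swap with available memory before running swapoff. A minimal sketch, reading MemAvailable, SwapTotal and SwapFree from a file in /proc/meminfo format (the function name `swapoff_fits_in_ram` is invented for this example):

```shell
#!/bin/sh
# Succeed only if used swap is smaller than MemAvailable.
# Reads a file in /proc/meminfo format; defaults to the live /proc/meminfo.
swapoff_fits_in_ram() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemAvailable:/ { avail = $2 }
        /^SwapTotal:/    { total = $2 }
        /^SwapFree:/     { free  = $2 }
        END {
            used = total - free
            printf "swap used: %d kB, RAM available: %d kB\n", used, avail
            if (used < avail) exit 0
            exit 1
        }' "$meminfo"
}
```

Usage: swapoff_fits_in_ram && sudo swapoff -a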
zram: swap in compressed memory
An alternative: instead of swap on disk, use a compressed block device in RAM. There is no disk I/O and latency is far lower, but compression costs CPU and the compressed pages still occupy RAM:
sudo modprobe zram
echo lz4 | sudo tee /sys/block/zram0/comp_algorithm
echo 2G | sudo tee /sys/block/zram0/disksize
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0 # high priority, used first
zram swap is the default on Fedora, ChromeOS, and many embedded distros.
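Whether zram pays off depends on how well the data compresses. The kernel reports this in /sys/block/zram0/mm_stat, where the first two fields are the original and the compressed data size in bytes. A small sketch that turns them into a ratio (the function name `zram_ratio` is made up; the zram0 device name follows the setup above):

```shell
#!/bin/sh
# Print the zram compression ratio (original size / compressed size).
# Reads a file in mm_stat format; defaults to the zram0 device.
zram_ratio() {
    stat="${1:-/sys/block/zram0/mm_stat}"
    awk '{
        if ($2 > 0)
            printf "%.2f\n", $1 / $2
        else
            print "n/a"   # nothing stored yet
    }' "$stat"
}
```

The zramctl utility from util-linux shows the same numbers in a table.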