Why there are different drivers
Image layers are read-only. A container writes, so it needs a writable layer on top. Older filesystems could not do this efficiently, so Docker supported several storage drivers, each with its own approach.
Today overlay2 covers 95% of cases, but the rest are worth knowing. You will meet them in legacy systems and in production scenarios with special requirements.
Where all of this lives:
/var/lib/docker/
├── overlay2/ ← if driver = overlay2
│ ├── <layer-hash>/
│ │ ├── diff/ ← files of a specific layer
│ │ ├── lower ← list of lower layers
│ │ └── work/ ← overlayfs working directory
├── containers/
└── image/
Which driver is active now:
docker info | grep -i storage
# Storage Driver: overlay2
# Backing Filesystem: extfs
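For scripts, the driver name can be extracted rather than read by eye. The following is a minimal sketch, not a canonical tool: `storage_driver_from_info` is a hypothetical helper name, and it assumes the `Storage Driver:` line format shown above.

```shell
# Hypothetical helper: extract the storage driver name from `docker info`
# text supplied on stdin. Prints nothing if the line is absent.
storage_driver_from_info() {
    awk -F': *' '/^[[:space:]]*Storage Driver:/ { print $2; exit }'
}

# Usage against a live daemon (not run here):
#   docker info 2>/dev/null | storage_driver_from_info
# The daemon can also print the value directly:
#   docker info --format '{{.Driver}}'
```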
overlay2, the default
Uses the kernel's [[tmpfs-overlayfs|overlayfs]]. Image layers are the
read-only lower, container changes are the upper.
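To make the lower/upper split concrete, here is a sketch of how the option string for a plain overlay mount could be assembled. `overlay_mount_opts` and all paths are hypothetical; Docker builds its own mounts internally, so this only illustrates the kernel-level idea.

```shell
# Hypothetical helper: build the option string for an overlay mount.
# Arguments: upperdir, workdir, then one or more lowerdirs (topmost first).
overlay_mount_opts() {
    upper=$1
    work=$2
    shift 2
    # Join the remaining arguments with ':' as overlayfs expects.
    lower=$(IFS=:; printf '%s' "$*")
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}

# Usage (needs root, paths are placeholders, not run here):
#   mount -t overlay overlay \
#     -o "$(overlay_mount_opts /tmp/upper /tmp/work /tmp/layer2 /tmp/layer1)" \
#     /tmp/merged
```

Reads in the merged directory fall through to the lower directories; writes land in the upper directory.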
Pros:
- In the kernel, no userspace overhead
- Page cache is shared between containers with the same base image
- Fast startup, the overlay mount is cheap
Cons:
- Inode usage grows fast, and copy-up on write copies whole files: a large file with small edits takes double the space
- It does not cope well with many layers; overlay2 supports at most 128 lower layers
Backing filesystem: ext4, xfs (with ftype=1), btrfs,
[[tmpfs-overlayfs|tmpfs]] (for rootless through a user namespace).
For XFS this is mandatory at mkfs time:
mkfs.xfs -f -n ftype=1 /dev/sdb1
Otherwise overlay2 refuses to work (d_type is missing).
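On an existing XFS filesystem, the setting can be checked before pointing Docker at it. This is a sketch under the assumption that `xfs_info` prints an `ftype=N` field on its `naming` line; `xfs_ftype_from_info` is a hypothetical helper name.

```shell
# Hypothetical helper: read `xfs_info` output on stdin and print the
# ftype value (0 or 1). Prints nothing if no ftype field is present.
xfs_ftype_from_info() {
    sed -n 's/.*ftype=\([01]\).*/\1/p' | head -n 1
}

# Usage (not run here):
#   xfs_info /var/lib/docker | xfs_ftype_from_info
# A result of 0 means the filesystem has to be recreated with ftype=1.
```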
btrfs, native CoW
Uses [[btrfs|btrfs]] subvolumes and snapshots:
- Each layer is a subvolume
- The container layer is a snapshot of the last image subvolume
- Writes inside the container are copy-on-write at the filesystem level
Pros:
- Snapshot in O(1)
- Deduplication at the level of FS blocks
- `btrfs scrub` finds bit rot
Cons:
- All of `/var/lib/docker` must be on btrfs
- Btrfs does not like a near-full filesystem
- On a high-write workload, fragmentation hurts performance
- In 2026 it is not the default anywhere, a niche choice
# daemon.json
{ "storage-driver": "btrfs" }
zfs, enterprise CoW
Similar to btrfs, but through ZFS:
- Each layer is a zfs filesystem
- A container is a clone of the last layer
- Internally, `zfs send/receive` handles migration
Pros:
- ZFS-grade reliability (ARC, scrubs, raid-z)
- Deduplication (if enabled)
- Snapshots are cheap
Cons:
- ZFS is not in the mainline kernel; you need the out-of-tree OpenZFS module (formerly ZoL, ZFS on Linux), with the CDDL+GPL licensing friction
- RAM-hungry (ARC eats free memory)
- The production stack is rare, mostly TrueNAS and Solaris descendants
For containers this is usually overkill, except in special cases (a storage server).
devicemapper, legacy
Creates a thin pool on block devices, each layer is a separate thin LV.
It used to be the default on RHEL 7. Deprecated since Docker 18.09, disabled by default in 23.0 (2023), and removed in 25.0 (2024). Do not use it.
vfs, fallback
It simply copies files recursively between layers. No CoW.
- Huge disk usage
- Very slow pull
- Works on any filesystem
The default for rootless Docker without user-namespace overlay permissions. For production, never.
fuse-overlayfs, rootless
In rootless Docker/Podman on older kernels, plain overlayfs does not work (it needs CAP_SYS_ADMIN to mount). The FUSE version does the same thing in userspace:
# Podman rootless by default
podman info | grep -i graphdriver
# graphDriverName: overlay
# ... mountopt: nodev,fsync=0
Modern kernels 5.11+ have rootless overlayfs through user namespaces,
and overlay works without FUSE, faster. But FUSE is needed for older kernels.
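The kernel-version rule can be written down as a small check. This is only a sketch: `rootless_driver_for_kernel` is a hypothetical helper, the 5.11 threshold is taken from the text above, and real setups may differ (distribution backports, security modules), so treat the result as a hint.

```shell
# Hypothetical helper: pick a rootless driver name from a kernel release
# string such as "5.15.0-91-generic". Threshold 5.11 is taken from the text.
rootless_driver_for_kernel() {
    major=${1%%.*}            # text before the first dot
    rest=${1#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}    # leading digits of the remainder
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 11 ]; }; then
        echo overlay
    else
        echo fuse-overlayfs
    fi
}

# Usage (not run here):
#   rootless_driver_for_kernel "$(uname -r)"
```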
Comparison
| Driver | Backing | CoW level | Recommendation |
|---|---|---|---|
| overlay2 | ext4/xfs/btrfs | overlayfs (file) | default, 95% of cases |
| btrfs | btrfs | btrfs subvol/snapshot (block) | if you already run btrfs |
| zfs | zfs | zfs clone (block) | enterprise NAS |
| devicemapper | thin pool | LVM thin (block) | deprecated |
| vfs | any | none (full copy) | testing/fallback |
| fuse-overlayfs | any | userspace overlay | rootless legacy |
Performance: the large-write problem of overlay2
Overlay2 does a copy-up on write: on the first write to a file from a read-only layer, the whole file is copied into the writable layer. If a file is 10 GB and one byte changes, the copy is 10 GB.
For databases and VM images this is fatal. The fixes:
- A volume or bind mount for the data directory:
docker run -v /var/lib/postgres-data:/var/lib/postgresql/data postgres
The volume sits outside the overlay, with no copy-up.
- chattr +C on the host directory if the backing is btrfs ([[btrfs|see this]])
- tmpfs for ephemeral directories:
docker run --tmpfs /tmp:size=100M ...
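To find out whether copy-up is already costing space, one option is to look for large files in a container's writable layer. The sketch below assumes the overlay2 driver, where `docker inspect` exposes the upper directory as `GraphDriver.Data.UpperDir`; `largest_files` and the container name `my-container` are hypothetical.

```shell
# Hypothetical helper: list the N largest regular files (size in KiB, path)
# under a directory, largest first. Intended for a container's upper dir.
largest_files() {
    dir=$1
    n=${2:-10}
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$n"
}

# Usage (not run here; reading /var/lib/docker needs root):
#   upper=$(docker inspect --format '{{.GraphDriver.Data.UpperDir}}' my-container)
#   largest_files "$upper" 5
```

A multi-gigabyte database file showing up here suggests its directory belongs on a volume.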
Storage size limits
By default a container can take up the whole disk (until the FS is full). The limit:
docker run --storage-opt size=10G ...
Works only on storage drivers with a native quota:
- overlay2 + xfs+pquota, yes
- btrfs, yes through subvolume quota
- devicemapper, yes (thin pool)
- overlay2 + ext4, no
An alternative: docker-compose with storage_opt:, or k8s
ephemeral-storage requests/limits.
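The support list above can be encoded as a pre-flight check before passing `--storage-opt size=`. This is a sketch that covers only the combinations listed in the text; `supports_size_opt` is a hypothetical helper, and drivers not listed are reported as unsupported, which may be too conservative.

```shell
# Hypothetical helper encoding the list above.
# Arguments: storage driver, backing filesystem, and "pquota" if the
# XFS filesystem is mounted with project quotas.
supports_size_opt() {
    case "$1" in
        btrfs|devicemapper) return 0 ;;
        overlay2) [ "$2" = xfs ] && [ "$3" = pquota ] ;;
        *) return 1 ;;
    esac
}

# Usage:
#   if supports_size_opt overlay2 xfs pquota; then
#       echo "size limit available"
#   fi
```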
Cleaning up disk usage
docker system df # how much is used
docker image prune # unused images
docker container prune # stopped ones
docker volume prune # unused volumes (since Docker 23.0 only anonymous ones; add -a for named)
docker builder prune # build cache
docker system prune -a --volumes # EVERYTHING (dangerous, removes images)
# Targeted: which layers and their size
du -sh /var/lib/docker/overlay2/* | sort -h | tail
A large /var/lib/docker/overlay2/ is normal for CI runners.
Pruning once a day is a must there.
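A daily cleanup could look roughly like the following. It is a sketch, not a recommended policy: `daily_prune` is a hypothetical name, and both the `-a` flag and the 24-hour `until` filter are assumptions that should be tuned per runner. The `DOCKER` variable allows a dry run.

```shell
# Hypothetical daily cleanup. DOCKER can be overridden (e.g. DOCKER=echo)
# to print the commands instead of running them.
daily_prune() {
    docker_cmd=${DOCKER:-docker}
    # Stopped containers first, so their images become unused.
    "$docker_cmd" container prune -f
    # Unused images and build cache older than 24 hours (assumed window).
    "$docker_cmd" image prune -a -f --filter "until=24h"
    "$docker_cmd" builder prune -f --filter "until=24h"
}

# Dry run:
#   DOCKER=echo daily_prune
```

Scheduling it from cron or a systemd timer is left to the host setup.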
When things go wrong
- `failed to register layer: ApplyLayer exit status 1`, usually the backing FS does not support overlay (vfat) or space ran out.
- `device or resource busy` on `docker rm`, the overlay mount is still held. Run `docker stop` first, or kill the container's processes through `docker top`.
- `d_type is not supported` on XFS made without `ftype=1`. The only fix is to recreate the FS with the correct mkfs.
- `No space left on device` with free GB available means inodes ran out on ext4 (`df -i`). Overlay creates many small files. Recreate with `mkfs.ext4 -i 4096`.
- A very slow pull, many small layers (`docker history` will show 50+). When you build, combine RUN commands.
- `/var/lib/docker` ballooned out of proportion, old dangling images, containers that exited but were not removed, build cache. Run `docker system df`, then prune.
- A container write to a shared volume "disappears", another container on the same volume wrote over it. A volume is not an overlay, the last writer wins.
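The inode check from the list above can be scripted so that monitoring catches exhaustion before writes fail. This sketch assumes the GNU `df -i` column layout, with `IUse%` in the fifth column; `inode_use_percent` is a hypothetical helper name.

```shell
# Hypothetical helper: read `df -i` output on stdin and print the IUse%
# value (without the percent sign) of the first filesystem row.
inode_use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage (not run here):
#   used=$(df -i /var/lib/docker | inode_use_percent)
#   [ "$used" -ge 90 ] && echo "inodes nearly exhausted on /var/lib/docker"
```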
Alternative runtimes and their storage
- podman, similar drivers (overlay, vfs); rootless = `overlay` through a user namespace
- containerd has its own "snapshotter" subsystem: native, overlayfs, btrfs, devmapper. The standard for k8s.
- CRI-O, overlay (default) or vfs