linuxlab.io
Copyright © 2026 LinuxLab. All rights reserved.

kb/filesystem ── File system ── advanced

btrfs: copy-on-write, subvolumes, and snapshots

btrfs is a copy-on-write filesystem with subvolumes, O(1) snapshots, native RAID 0/1/10, and data checksums. RAID 5/6 is still affected by the write hole and is best avoided. COW fragmentation hurts databases and VM images, so disable COW for those files with chattr +C.

aka: btrfs-fs, btrfs-subvolume, btrfs-cow, btrfs-snapshot

Why btrfs

btrfs was merged into the mainline kernel in 2009 (Linux 2.6.29) as an answer to ZFS. It gave Linux:

  • Copy-on-write (COW): a write does not modify the block in place; it writes to a new location and updates the pointer
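  • Subvolume: a separately rooted tree that looks like a directory and can be mounted, snapshotted, and sent on its own
  • O(1) snapshots: a snapshot is a new subvolume that shares every block with its source, so creating one copies no data
  • Data + metadata checksums: bit rot is detected on read instead of being returned silently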
  • Native RAID 0/1/10: without mdraid on top
  • Send/receive: incremental replication

It is the default on openSUSE Tumbleweed and Fedora Workstation 33+. On production servers it shows up less often because of the historical instability of RAID 5/6 and COW fragmentation.

Copy-on-write, the idea

When a block changes:

  • Non-COW (ext4, xfs): write to the same block
  • COW (btrfs, ZFS): write to a new block, update the pointer in the tree

Upsides:

  • O(1) snapshot: just a new name that points at the same blocks
  • Crash-safe without a journal: there is always a consistent point
  • Checksums are verified automatically on read

Downsides:

  • Fragmentation: frequent edits to one file scatter it across the disk. This hits databases (page-level edits) and VM images especially hard.
  • Space use is hard to predict: overwriting a block allocates a new one, and the old one stays allocated for as long as a snapshot still references it.

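The fix for databases and VMs is chattr +C, which disables COW for specific files or directories:

bash
mkdir -p /data/postgres
chattr +C /data/postgres                 # new files created inside inherit No_COW
lsattr -d /data/postgres                 # the flag string should now contain "C"
cp -a /old/postgres/. /data/postgres/    # recopy; the trailing /. includes dotfiles

chattr +C only takes effect on empty or newly created files, so existing files must be recopied into a +C directory. No_COW files also lose checksums and compression, and after a snapshot the first write to each shared block is still copied once.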
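
To confirm that the attribute is in place, the flag string printed by lsattr can be inspected: an uppercase C marks No_COW, while a lowercase c is the unrelated compression flag. The following is a minimal sketch; has_nocow_flag and nocow_report are hypothetical helper names, not part of e2fsprogs or btrfs-progs.

```bash
#!/usr/bin/env bash
# has_nocow_flag FLAGS
# Hypothetical helper: FLAGS is the first field of lsattr output,
# for example "---------------C------". Succeeds if "C" (No_COW) is present.
has_nocow_flag() {
    case "$1" in
        *C*) return 0 ;;   # case-sensitive: lowercase "c" does not match
        *)   return 1 ;;
    esac
}

# nocow_report DIR
# Prints whether DIR itself carries the No_COW flag.
nocow_report() {
    local dir="$1" flags
    # lsattr -d lists the directory itself rather than its contents.
    flags=$(lsattr -d -- "$dir" 2>/dev/null | awk '{print $1}') || flags=""
    if has_nocow_flag "$flags"; then
        echo "$dir: NOCOW set, new files will skip copy-on-write"
    else
        echo "$dir: NOCOW not set (or lsattr unavailable)"
    fi
}
```

Calling nocow_report /data/postgres before loading the database gives a quick check that the directory was prepared in time.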

Subvolume

A subvolume is a separate tree inside one filesystem. It is not a partition; it is a logical unit of btrfs:

bash
mkfs.btrfs /dev/sdb1
mount /dev/sdb1 /mnt
btrfs subvolume create /mnt/@home
btrfs subvolume create /mnt/@var
btrfs subvolume list /mnt

Advantages:

  • You can mount a subvolume on its own (mount -o subvol=@home)
  • Its own quotas, its own snapshots
  • Per-subvolume space limits (optional, through quota groups), so one subvolume cannot fill the whole filesystem
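
Scripts that mount or snapshot subvolumes usually start from the output of btrfs subvolume list, whose lines look like "ID 256 gen 42 top level 5 path @home". A small sketch of a parser follows; subvol_paths is a hypothetical helper name, and the line format is assumed to match the one just quoted.

```bash
#!/usr/bin/env bash
# subvol_paths
# Hypothetical filter: reads `btrfs subvolume list` output on stdin and
# prints "ID<TAB>path" for every subvolume line it recognises.
subvol_paths() {
    local line id
    while IFS= read -r line; do
        case "$line" in
            "ID "*" path "*)
                id=${line#ID }     # drop the leading "ID "
                id=${id%% *}       # keep the number up to the next space
                # Cut at the first " path " so paths with spaces survive.
                printf '%s\t%s\n' "$id" "${line#* path }"
                ;;
        esac
    done
}
```

Typical use would be sudo btrfs subvolume list /mnt | subvol_paths, with the ID column feeding mount -o subvolid=... and the path column feeding mount -o subvol=....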

A common layout:

/mnt/             ← top-level
 ├─ @            ← root /
 ├─ @home        ← /home
 ├─ @var         ← /var
 └─ @snapshots   ← /.snapshots

In fstab:

fstab
UUID=...  /     btrfs  defaults,subvol=@,compress=zstd:1     0 0
UUID=...  /home btrfs  defaults,subvol=@home,compress=zstd:1 0 0
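
All subvolumes of one filesystem share the same UUID, so the fstab entries differ only in mount point and subvol= value. A hedged sketch that generates them for the layout above; fstab_line is a hypothetical helper, and the UUID is a placeholder rather than a real value.

```bash
#!/usr/bin/env bash
# fstab_line UUID MOUNTPOINT SUBVOL [EXTRA_OPTS]
# Hypothetical helper that prints one fstab entry for a btrfs subvolume.
# EXTRA_OPTS defaults to the compression option used in this article.
fstab_line() {
    local uuid="$1" mountpoint="$2" subvol="$3" extra="${4:-compress=zstd:1}"
    printf 'UUID=%s  %s  btrfs  defaults,subvol=%s,%s  0 0\n' \
        "$uuid" "$mountpoint" "$subvol" "$extra"
}

# Placeholder UUID (assumption); `blkid -s UUID -o value /dev/sdb1` prints the real one.
uuid="00000000-0000-0000-0000-000000000000"

# One "subvolume mountpoint" pair per line.
while read -r subvol mountpoint; do
    fstab_line "$uuid" "$mountpoint" "$subvol"
done <<'EOF'
@ /
@home /home
@var /var
@snapshots /.snapshots
EOF
```

The script only prints the lines; review them before appending anything to /etc/fstab.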

Snapshots

bash
# Snapshot of a subvolume
btrfs subvolume snapshot /mnt/@home /mnt/@snapshots/home-$(date +%F)
# Read-only snapshot (for backups through send)
btrfs subvolume snapshot -r /mnt/@home /mnt/@snapshots/home-ro
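# Rollback: move the broken subvolume aside, then recreate @home from a snapshot
mv /mnt/@home /mnt/@home_broken
btrfs subvolume snapshot /mnt/@snapshots/home-2026-05-01 /mnt/@home
# Once the restored @home has been remounted and verified, drop the old one
btrfs subvolume delete /mnt/@home_broken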

A snapshot is instant and takes almost no space at first (only a little metadata). As edits accumulate, the original and the snapshot diverge, and space is spent on the difference.
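
Because every snapshot pins the blocks it references, dated snapshots need a retention policy. A minimal sketch, assuming the home-YYYY-MM-DD naming used above (where lexical order equals chronological order); snapshots_to_prune is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# snapshots_to_prune KEEP
# Hypothetical helper: reads snapshot names (one per line) on stdin and
# prints every name except the KEEP newest ones.
snapshots_to_prune() {
    local keep="$1"
    # sort -r puts the newest name first; tail then skips the first KEEP lines.
    sort -r | tail -n +"$((keep + 1))"
}

# Example wiring (commented out because it deletes data):
# ls /mnt/@snapshots \
#   | grep -E '^home-[0-9]{4}-[0-9]{2}-[0-9]{2}$' \
#   | snapshots_to_prune 7 \
#   | while read -r name; do
#         btrfs subvolume delete "/mnt/@snapshots/$name"
#     done
```

Keeping the selection logic separate from the delete loop makes it easy to dry-run the policy by just printing the names first.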

Send / receive

Incremental replication to another btrfs filesystem:

bash
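# btrfs send only accepts read-only snapshots (created with snapshot -r)
# Full send of the first snapshot
btrfs send /mnt/@snapshots/home-2026-05-01 | ssh remote 'btrfs receive /backup/'
# Incremental: -p names a parent that already exists on the receiving side
btrfs send -p /mnt/@snapshots/home-2026-05-01 /mnt/@snapshots/home-2026-05-02 \
  | ssh remote 'btrfs receive /backup/'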

Only the delta between the two snapshots is sent, so backups are fast and compact. For a large dataset this usually beats rsync, which has to walk and compare every file to find the changes.
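
An incremental send only works if the parent snapshot exists on both sides, so a backup script has to pick the newest snapshot the two sides have in common. A sketch under that assumption follows; newest_common_snapshot and send_plan are hypothetical helpers that work on plain lists of names and only print the command they would run.

```bash
#!/usr/bin/env bash
# newest_common_snapshot LOCAL_LIST REMOTE_LIST
# Hypothetical helper: both arguments are files with one snapshot name per
# line. Prints the newest name present in both lists, or nothing if the
# lists share no snapshot. Assumes names sort chronologically.
newest_common_snapshot() {
    # grep -Fx -f keeps the local names that appear verbatim in the remote list.
    sort "$1" | grep -Fx -f "$2" | tail -n 1
}

# send_plan SNAPDIR NEW [PARENT]
# Prints the btrfs send command for NEW: incremental if PARENT is given,
# a full send otherwise.
send_plan() {
    local dir="$1" new="$2" parent="$3"
    if [ -n "$parent" ]; then
        echo "btrfs send -p $dir/$parent $dir/$new"
    else
        echo "btrfs send $dir/$new"
    fi
}
```

The remote list could come from something like ssh remote 'ls /backup'; falling back to a full send when no common parent exists avoids a failed receive.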

RAID

btrfs applies RAID profiles per chunk (block group), not per whole disk, so data and metadata can use different profiles:

bash
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc      # data + metadata mirror
mkfs.btrfs -d raid10 -m raid1 /dev/sd{b,c,d,e}      # data RAID10, meta mirror
mkfs.btrfs -d single -m raid1 /dev/sdb /dev/sdc     # data not mirrored, metadata is

Levels:

  • single: no redundancy
  • dup: two copies on one disk (the default for metadata on a single disk)
  • raid0: stripe with no redundancy
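  • raid1: mirror with exactly 2 copies of every chunk, no matter how many disks are in the pool (mdraid RAID 1 instead keeps one copy per disk)
  • raid1c3, raid1c4: 3 and 4 copies of every chunk (kernel 5.5+)
  • raid10: stripe of mirrors
  • raid5, raid6: historically unstable because of the write hole, where a crash during a partial-stripe write can leave parity inconsistent. Newer 6.x kernels have improved the situation, but the upstream documentation still advises against them for production, and multi-disk production setups more often run mdraid + ext4/xfs.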
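
Because raid1 keeps two copies of each chunk on two different devices, usable capacity with mixed disk sizes is not simply the smallest disk. A rough estimate is min(total / 2, total - largest). The sketch below applies that rule; raid1_usable is a hypothetical helper, and it ignores metadata overhead and chunk granularity.

```bash
#!/usr/bin/env bash
# raid1_usable SIZE...
# Rough usable data capacity for the raid1 profile (2 copies per chunk).
# Capacity is capped by half of the total and by how much the other
# devices can mirror for the largest one:
#   usable = min(total / 2, total - largest)
# All sizes must use the same unit (GB here).
raid1_usable() {
    local total=0 largest=0 size half rest
    for size in "$@"; do
        total=$((total + size))
        if [ "$size" -gt "$largest" ]; then
            largest=$size
        fi
    done
    half=$((total / 2))
    rest=$((total - largest))
    if [ "$half" -lt "$rest" ]; then
        echo "$half"
    else
        echo "$rest"
    fi
}

raid1_usable 4000 4000        # two equal disks
raid1_usable 8000 2000 2000   # one large disk: its extra space cannot be mirrored
raid1_usable 4000 3000 1000   # uneven disks that still pair up fully
```

All three calls print 4000, which shows that adding one oversized disk to a raid1 pool does not add capacity unless the remaining disks can mirror it.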

Add or remove a disk online:

bash
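btrfs device add /dev/sdd /mnt                             # grow the pool
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt   # rewrite chunks into the target profile
btrfs device remove /dev/sdc /mnt                          # migrates its chunks away before detaching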

Checksums and scrub

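btrfs stores a checksum per block: crc32c by default, with xxhash, sha256, or blake2b selectable at mkfs time. Every read is verified. On a mismatch, if a redundant copy exists (RAID 1/10 or dup), btrfs reads the good copy and repairs the bad one on the fly; without redundancy the read fails with an I/O error.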

A preventive check:

bash
btrfs scrub start /mnt
btrfs scrub status /mnt

Scrub reads the whole filesystem and verifies every block against its checksum. It always runs in the background: on SSD/NVMe the impact is small, on HDDs it is noticeable. Running it weekly or monthly from cron is normal.
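
Errors found by scrub or by normal reads are accumulated in per-device counters that btrfs device stats prints, one per line, in the form "[/dev/sdb].corruption_errs 0". A small sketch of a filter suitable for a cron job; nonzero_device_stats is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# nonzero_device_stats
# Hypothetical filter for `btrfs device stats` output on stdin.
# Prints only the counters that are above zero and exits with status 1
# if any were printed, so cron or a monitoring wrapper can react.
nonzero_device_stats() {
    awk '$2 + 0 > 0 { print; bad = 1 } END { exit bad }'
}
```

A scheduled check could run btrfs device stats /mnt | nonzero_device_stats after each scrub and send an alert when the exit status is non-zero.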

Compression

fstab
... compress=zstd:1 ...

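Compression is transparent to applications. Algorithms: zlib (levels 1-9), lzo, and zstd (levels 1-15). zstd:1 is fast and saves roughly 30% on typical text data.

Files that are already written can be recompressed. Defragmenting rewrites extents, which breaks sharing with snapshots and reflinked copies and can therefore increase space use: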

bash
btrfs filesystem defragment -r -czstd /mnt/data

When something goes wrong

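  • "No space left" while df shows free GB: btrfs allocates chunks separately for data and metadata. Typically all unallocated space has gone to data chunks, so no new metadata chunk can be allocated. btrfs balance start -dusage=50 /mnt repacks half-empty data chunks and returns the freed space to the unallocated pool. On a completely full filesystem even balance may not start, so delete at least something first.
  • The database is slow and its files fragment: COW was not turned off. Run chattr +C on the directory BEFORE the database files are created; existing files must be recopied.
  • Snapshots ate all the space: delete old ones with btrfs subvolume delete /mnt/@snapshots/old-*. Space is released in the background by the cleaner thread; btrfs subvolume sync /mnt waits until it has finished. Blocks still shared with another snapshot are freed only when the last reference goes.
  • parent transid verify failed: metadata corruption. Run btrfs check first (it is read-only by default). Prefer mounting with -o ro,rescue=usebackuproot (spelled usebackuproot or recovery on older kernels) and copying the data off. Treat btrfs check --repair as a last resort (RISKY!), because it can make the damage worse.
  • RAID 5/6 lost data after a crash: the known write-hole problem. Use RAID 1/10 for multi-disk setups.
  • The disk is 90% full and slow: the allocator struggles to find free space when btrfs is nearly full. Keep usage under 80%.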
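
Most of these space problems announce themselves through a shrinking pool of unallocated space, which btrfs filesystem usage reports in its "Overall" section. The sketch below computes that share from the raw-byte output (the -b option); unallocated_percent is a hypothetical helper, and the "Device size:" and "Device unallocated:" labels are assumed to match the installed btrfs-progs version.

```bash
#!/usr/bin/env bash
# unallocated_percent
# Hypothetical filter: reads `btrfs filesystem usage -b` output on stdin and
# prints unallocated space as a whole-number percentage of the device size.
# Exits with status 1 if no device size was found in the input.
unallocated_percent() {
    awk -F: '
        /Device size:/        { size = $2 + 0 }
        /Device unallocated:/ { unalloc = $2 + 0 }
        END {
            if (size <= 0) exit 1
            printf "%d\n", unalloc * 100 / size
        }'
}
```

A monitoring job could run btrfs filesystem usage -b /mnt | unallocated_percent and schedule a balance with -dusage once the value drops into single digits, well before ENOSPC appears.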

When to choose btrfs

Use case                              Btrfs?
Workstation with auto-snapshots       ✓ (openSUSE/Snapper integration)
NAS that needs inline deduplication   ✗ (take ZFS instead)
NAS with snapshots and RAID 1/10      ✓
Database server                       ✗ (ext4/xfs + LVM snapshot)
VM host with qcow2                    ✗, or ✓ with chattr +C on the image directory
Container host                        ✓ (overlayfs on top, or native btrfs snapshots)
RAID 5/6 needed in production         ✗ (mdraid + ext4/xfs)

§ commands

bash
sudo mkfs.btrfs -L data -f /dev/sdb

Create btrfs on a single disk; -f forces it if a signature exists

bash
sudo btrfs subvolume snapshot -r /mnt/@home /mnt/@snap-$(date +%F)

Read-only snapshot, the basis for send/receive replication

bash
sudo btrfs scrub start /mnt && sudo btrfs scrub status /mnt

Run a full checksum check that finds bit rot before it does harm

bash
sudo btrfs filesystem usage /mnt

Real data/metadata usage per profile; plain df cannot show how space is split between data and metadata chunks

bash
sudo btrfs balance start -dusage=50 /mnt

Repack underfilled chunks to free up metadata allocation

bash
sudo btrfs send -p old.snap new.snap | ssh remote 'btrfs receive /backup'

Incremental send: back up only the delta between snapshots

bash
sudo chattr +C /data/postgres

Disable COW for a directory; do this before the database files are created!

§ see also

  • Filesystems: ext4, xfs, btrfs, zfs (filesystems)
    ext4 is the reliable default. xfs handles large files and parallel I/O. btrfs and zfs give you snapshots, checksums, and built-in RAID, but they are more complex.
  • ext4: the Linux filesystem workhorse (ext4)
    ext4 is the default filesystem on most distributions: journaling, extents, a fixed inode count set at mkfs time. The main tunes are the data mode, noatime, and lazy init. Stable for 15+ years. Does not scale like XFS.
  • XFS: extents and parallel I/O (xfs)
    XFS is the RHEL 7+ default: allocation groups (parallel I/O), extent-based allocation, online grow. It cannot shrink, grow only. Ideal for big files, databases, and parallel workloads.
  • LVM: Logical Volume Manager (lvm)
    LVM is a layer between block devices and the filesystem: it pools disks and carves out logical volumes of any size that you can grow, snapshot, and migrate live.
  • fsck and recovery: checking and repairing a filesystem (fsck-and-recovery)
    fsck checks an unmounted filesystem: e2fsck (ext), xfs_repair (XFS), btrfs check (btrfs). Journal replay at mount handles 90% of problems after a crash.