Copyright © 2026 LinuxLab. All rights reserved.

kb/filesystem ── File system ── beginner

Sparse files: holes and apparent size

A sparse file has "holes": ranges for which the filesystem never allocated blocks. Holes read back as zeros but take no disk space. ls shows the apparent size, du shows the allocated one. Sparse files are used for qcow2 and raw VM images, backups, and loop-device images.

aka: sparse-file, file-holes, apparent-size

Why sparse

When you need a file "sized" at 100 GB that will fill up gradually, you don't have to allocate the full 100 GB on disk up front. You can create an empty file with a logical size of 100 GB but physically 0 bytes. As writes come in, the filesystem allocates blocks.

Where this is used:

  • qcow2/vmdk for VMs, a "thin" virtual disk
  • loop image for a filesystem, a 10 GB file with ext4 inside that really uses 1 GB
  • databases: Oracle tempfiles, SQL Server database snapshots
  • disk backups with empty regions, ddrescue
  • sparse logfile, a rewindable ring buffer

How holes are made

Three ways:

1. Seek past the end of the file

bash
dd if=/dev/zero of=big.img bs=1 count=0 seek=10G

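Creates a file with a logical size of 10 GB that occupies 0 blocks. With count=0, dd copies nothing: it only extends the output file to the seek offset. The filesystem does not write zeros; it just records that everything up to that offset is a hole. Seeking past the end and then writing data works the same way: the skipped range becomes a hole.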
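The seek-then-write variant at the syscall level, as a minimal Python sketch. It assumes Linux and a filesystem with sparse support, and make_hole_by_seek is a hypothetical helper name. Unlike the dd command above, it writes one byte after the skipped range, so the file ends with a single allocated block behind a large hole.

```python
import os
import tempfile


def make_hole_by_seek(path: str, offset: int) -> tuple[int, int]:
    """Hypothetical helper: create `path`, seek `offset` bytes in, write one byte.

    Returns (apparent_size, allocated_bytes).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)  # move past EOF; nothing is written yet
        os.write(fd, b"x")                 # on a sparse-capable FS only the block(s) around this byte are allocated
        os.fsync(fd)                       # flush so st_blocks reflects the allocation
    finally:
        os.close(fd)
    st = os.stat(path)
    # st_size is the apparent size; st_blocks counts 512-byte units
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        apparent, allocated = make_hole_by_seek(os.path.join(d, "big.img"), 1 << 30)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```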

2. truncate / ftruncate

bash
truncate -s 10G big.img

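The same thing in one command: truncate grows the file length recorded in the inode without allocating any blocks. Programs get the same effect from the truncate()/ftruncate() syscalls.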
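A Python sketch of the same operation, assuming Linux and a filesystem with sparse support; make_sparse and read_at are hypothetical helpers. os.truncate() calls truncate(2), and reading from the middle of the new range returns zeros even though nothing was ever written there.

```python
import os
import tempfile


def make_sparse(path: str, size: int) -> None:
    """Hypothetical helper with the same effect as `truncate -s SIZE path` on a new file."""
    with open(path, "wb"):
        pass                  # create (or empty) the file
    os.truncate(path, size)   # set the length; the new range is a hole, nothing is allocated


def read_at(path: str, offset: int, n: int) -> bytes:
    """Hypothetical helper: read n bytes starting at offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(n)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "big.img")
        make_sparse(p, 1024**3)
        st = os.stat(p)
        print("apparent:", st.st_size, "allocated:", st.st_blocks * 512)
        print(read_at(p, 512 * 1024**2, 8))   # a hole reads back as zero bytes
```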

3. Removing blocks from an existing file (FALLOC_FL_PUNCH_HOLE)

bash
fallocate -p -o 1G -l 1G existing.dat

This deallocates bytes 1G to 2G (the second gigabyte of the file), leaving a hole there that reads back as zeros. -p implies --keep-size, so the logical size stays the same while physical usage drops.
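Python's os module has no wrapper for fallocate(2) with mode flags, so this sketch calls the C library's fallocate() through ctypes. Assumptions: 64-bit Linux (off_t is a C long) and a filesystem that supports punching holes (ext4, xfs, btrfs, tmpfs). punch_hole is a hypothetical helper name, and the flag values are copied from linux/falloc.h.

```python
import ctypes
import os
import tempfile

# Values from linux/falloc.h. The kernel requires PUNCH_HOLE to be ORed with
# KEEP_SIZE, which is why `fallocate -p` never changes the logical size.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Assumption: 64-bit Linux, where off_t is a C long.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_long, ctypes.c_long]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(path: str, offset: int, length: int) -> None:
    """Hypothetical helper, roughly `fallocate -p -o OFFSET -l LENGTH path`."""
    fd = os.open(path, os.O_WRONLY)
    try:
        mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
        if _libc.fallocate(fd, mode, offset, length) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)


if __name__ == "__main__":
    mib = 1024 * 1024
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "existing.dat")
        with open(p, "wb") as f:
            f.write(b"\xab" * (3 * mib))      # 3 MiB of real data
        before = os.stat(p).st_blocks
        punch_hole(p, mib, mib)               # drop the middle MiB
        after = os.stat(p)
        print("size:", after.st_size, "blocks:", before, "->", after.st_blocks)
```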

ls / stat / du: who shows what

bash
$ truncate -s 10G big.img
$ ls -lh big.img
-rw-r--r-- 1 user user 10G May  2 15:00 big.img       ← apparent (logical)
$ du -h big.img
0       big.img                                       ← actual (allocated)
$ stat big.img
  Size: 10737418240   Blocks: 0      IO Block: 4096  regular file

  • ls -l shows the apparent size, the value lseek(fd, 0, SEEK_END) returns
  • du shows the allocated size (by default in 1 KiB units; -h makes it human-readable)
  • du --apparent-size (or du -b, short for --apparent-size --block-size=1) reports the apparent size instead
  • stat shows both: Size: (apparent, in bytes) and Blocks: (allocated, in 512-byte units, so ×512 = bytes)

If the disk has only 5 GB free but the sparse files on it add up to 20 GB of apparent size, that is normal for sparse files, but dangerous: as the holes fill in, write() can fail with ENOSPC.
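One rough way to see how close a filesystem is to that situation is to add up apparent minus allocated size over the files in question and compare the total with the free space. A minimal Python sketch; hole_debt and may_hit_enospc are hypothetical names, and the estimate ignores the extra metadata that filling the holes would also need.

```python
import os
import shutil


def hole_debt(paths: list[str]) -> int:
    """Hypothetical helper: bytes these files could still claim if every hole
    were written. Floored at 0 per file, because allocation can exceed the
    apparent size (metadata, preallocation past EOF)."""
    total = 0
    for p in paths:
        st = os.stat(p)
        total += max(0, st.st_size - st.st_blocks * 512)
    return total


def may_hit_enospc(paths: list[str], mount_point: str) -> bool:
    """Hypothetical helper. Assumes every path lives on the filesystem
    mounted at mount_point."""
    return hole_debt(paths) > shutil.disk_usage(mount_point).free
```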

fallocate vs sparse

Sparse means unallocated blocks.

fallocate (without -p) does the opposite. It reserves blocks without writing zeros:

bash
fallocate -l 10G allocated.dat

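The file occupies 10 GB on disk, but nothing is written: the blocks are marked as allocated but unwritten, so reads from them return zeros (the kernel never exposes old data left on the disk). This pays off when you are about to write those 10 GB anyway, for example sequentially: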

  • less fragmentation: the filesystem can lay the blocks out contiguously
  • write() into the reserved range will not fail with ENOSPC

If the filesystem supports it, the allocation is instant because no data is written. ext4 and xfs support it; FAT does not, so there the new space is filled with zeros.
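The same reservation from Python, as a sketch assuming Linux: os.posix_fallocate() wraps posix_fallocate(3), which allocates the range without any of the extra flags the fallocate command offers. reserve is a hypothetical helper. On Linux, preallocated blocks read back as zeros, which is what the final read shows.

```python
import os
import tempfile


def reserve(path: str, size: int) -> tuple[int, int]:
    """Hypothetical helper, roughly `fallocate -l SIZE path`.

    Where the filesystem has no fallocate support, glibc emulates
    posix_fallocate by touching every block, so it is only instant on
    ext4, xfs and similar filesystems.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)   # allocate [0, size) and extend the file to size
    finally:
        os.close(fd)
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512   # (apparent, allocated)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "allocated.dat")
        print(reserve(p, 64 * 1024 * 1024))
        with open(p, "rb") as f:
            print(f.read(16))   # preallocated blocks read back as zeros
```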

fallocate options:

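| Option | What it does |
|---|---|
| -l SIZE | length of the range, in bytes (suffixes such as K, M, G are accepted) |
| -o OFFSET | start of the range (default 0) |
| -p | FALLOC_FL_PUNCH_HOLE: deallocate the range, leaving a hole; implies --keep-size |
| -z | FALLOC_FL_ZERO_RANGE: make the range read as zeros; blocks stay allocated, and holes inside the range get preallocated |
| -d | dig holes: find all-zero blocks and punch them out; there is no separate kernel flag, the scan runs in userspace and uses FALLOC_FL_PUNCH_HOLE |
| -c | FALLOC_FL_COLLAPSE_RANGE: remove the range and shift the rest of the file down, so the file shrinks |
| -i | FALLOC_FL_INSERT_RANGE: insert a hole at the offset and shift the rest of the file up, so the file grows |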

fallocate -d compacts an existing file, turning zero regions into holes:

bash
fallocate -d disk.img
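What -d does can be approximated in userspace: read the file in aligned chunks and punch a hole wherever a chunk is entirely zero. This Python sketch is not util-linux's implementation. It assumes 64-bit Linux, calls fallocate() through ctypes with flag values from linux/falloc.h, and uses a fixed 4 KiB chunk where the real tool works from the filesystem's block size. dig_holes is a hypothetical name.

```python
import ctypes
import os

# Values from linux/falloc.h; PUNCH_HOLE must be ORed with KEEP_SIZE.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Assumption: 64-bit Linux, where off_t is a C long.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_long, ctypes.c_long]
_libc.fallocate.restype = ctypes.c_int


def dig_holes(path: str, chunk: int = 4096) -> int:
    """Hypothetical helper: punch a hole over every all-zero, chunk-aligned
    piece of `path`. Returns the number of bytes covered by such pieces
    (pieces that were already holes are counted too)."""
    zero = bytes(chunk)
    covered = 0
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        for offset in range(0, size, chunk):
            data = os.pread(fd, chunk, offset)
            if data != zero:          # has data, or is a short tail: leave it alone
                continue
            mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
            if _libc.fallocate(fd, mode, offset, chunk) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err), path)
            covered += chunk
    finally:
        os.close(fd)
    return covered
```

A real tool would merge adjacent zero chunks into a single fallocate() call; the sketch issues one call per chunk to keep the logic visible.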

SEEK_HOLE / SEEK_DATA

Modern filesystems (ext4, xfs, btrfs, tmpfs) support these seeks in lseek():

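  • SEEK_HOLE moves to the start of the next hole at or after the given offset (the end of the file counts as a hole)
  • SEEK_DATA moves to the start of the next region that contains data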
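A Python sketch that lists a file's data segments with lseek() and these two whence values (os.SEEK_DATA and os.SEEK_HOLE, available on Linux); data_segments is a hypothetical helper. On a filesystem without native support, the kernel reports the whole file as a single data segment, which is still correct, just not sparse.

```python
import errno
import os


def data_segments(path: str) -> list[tuple[int, int]]:
    """Hypothetical helper: return [(start, end), ...] byte ranges holding data.

    Everything between the returned ranges is a hole and reads as zeros.
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)   # next data at or after pos
            except OSError as e:
                if e.errno == errno.ENXIO:                # only a hole remains until EOF
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)       # EOF counts as a hole
            segments.append((start, end))
            pos = end
    finally:
        os.close(fd)
    return segments
```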

With cp --sparse=auto (the default), copying preserves holes:

bash
cp --sparse=auto big.img copy.img         # carries sparse over, if the FS can
cp --sparse=always big.img copy.img       # scans for zero regions and makes holes
cp --sparse=never big.img copy.img        # copies "dense", fills holes with zeros

tar, rsync and dd preserve holes only with the right flags:

bash
rsync --sparse src.img dst.img            # -S for short; holes stay holes
tar --sparse -cf backup.tar big.img
dd conv=sparse if=src of=dst              # seek over all-zero output blocks instead of writing them

Without these flags, a sparse 100 GB file expands during the copy into 100 GB of real, allocated zeros.
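The core of a hole-preserving copy fits in a short loop: jump from data segment to data segment with SEEK_DATA/SEEK_HOLE, copy only those, then set the final length with ftruncate() so a trailing hole survives. This Python sketch (sparse_copy is a hypothetical name, Linux assumed) is roughly what cp --sparse=auto achieves on filesystems that support these seeks, not coreutils' actual code.

```python
import errno
import os


def sparse_copy(src: str, dst: str, chunk: int = 1 << 20) -> None:
    """Hypothetical helper: copy src to dst, writing only the data segments of
    src so that its holes stay holes in dst."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        sfd, dfd = fsrc.fileno(), fdst.fileno()   # only raw fd calls are used below
        size = os.fstat(sfd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(sfd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:    # the rest of src is a hole
                    break
                raise
            end = os.lseek(sfd, start, os.SEEK_HOLE)
            off = start
            while off < end:                  # copy one data segment in chunks
                buf = os.pread(sfd, min(chunk, end - off), off)
                if not buf:                   # src shrank underneath us
                    break
                # pwrite at an offset past dst's EOF leaves a hole in between
                off += os.pwrite(dfd, buf, off)
            pos = end
        os.ftruncate(dfd, size)               # set the apparent size; keeps a trailing hole
```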

Real uses in production

qcow2 for KVM

bash
qemu-img create -f qcow2 disk.qcow2 100G

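qcow2 is a format with its own on-demand allocation, copy-on-write, and a chain of snapshots: the image file starts small and grows as the guest writes. A raw image (qemu-img create -f raw) gets the same effect from the host filesystem, because it is simply a sparse file.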

Loop device with a filesystem inside

bash
truncate -s 10G ext4.img
mkfs.ext4 ext4.img
sudo mount -o loop ext4.img /mnt/loop

The file starts with 0 allocated bytes. mkfs writes only the filesystem metadata (on the order of 1% of the size, depending on options), and the rest is allocated as you write into the mounted filesystem.

Backup with holes

bash
# Direct disk copy that skips zero blocks
dd if=/dev/sda of=backup.img conv=sparse status=progress
# Or with GNU ddrescue; -S writes the output file sparse
ddrescue -S /dev/sda backup.img backup.log

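To restore, run dd if=backup.img of=/dev/sda without conv=sparse, so the holes reach the disk as real zeros. With conv=sparse, dd would skip those blocks and leave whatever old data the disk already held there.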
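The conv=sparse trick itself is simple enough to sketch in Python: read fixed-size blocks, and for an all-zero block seek forward in the output instead of writing. copy_skip_zero_blocks is a hypothetical helper. It truncates the output file first, so the skipped blocks are guaranteed to be holes, and the final truncate() sets the length when the input ends in zeros.

```python
import os


def copy_skip_zero_blocks(src: str, dst: str, bs: int = 64 * 1024) -> int:
    """Hypothetical helper, roughly `dd if=SRC of=DST bs=BS conv=sparse` for a
    new regular output file. Returns the number of bytes actually written."""
    zero = bytes(bs)
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(bs)
            if not block:
                break
            if block == zero:
                fout.seek(len(block), os.SEEK_CUR)   # skip it: this range stays a hole
            else:
                fout.write(block)
                written += len(block)
        fout.truncate()   # extend to the current offset if the input ended in zeros
    return written
```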

When something goes wrong

  • ENOSPC on write into an "empty" hole: there was no physical space to materialize the block. Sparse saves space only while the holes are empty.
  • du shows huge numbers after a backup restore: the copy was made without sparse support (cp --sparse=never, rsync without --sparse, tar without --sparse), so the holes were written out as zeros and became real blocks.
  • tar extracted the sparse file "fat": the archive was created without --sparse. GNU tar needs --sparse only when creating or updating an archive; on extraction it restores the holes of sparse members automatically.
  • a VM disk only ever grows: when the guest deletes files, the freed blocks stay allocated in the image, because the host cannot tell they are free. To reclaim the space, run fstrim in the guest periodically (this needs discard enabled on the virtual disk, e.g. discard=unmap in QEMU), or zero the guest's free space (zerofree, on an unmounted or read-only ext2/3/4 filesystem) and then run fallocate -d on the image.
  • fallocate fails with ENOTSUP on NFS: not every NFS version supports punch_hole. NFSv4.2 does.
  • rsync expands holes into zeros: pass --sparse (short form -S).

Checking that a file is sparse

bash
# Ratio of allocated to apparent
python3 -c "
import os
s = os.stat('big.img')
print(f'apparent: {s.st_size}, allocated: {s.st_blocks * 512}, ratio: {s.st_blocks * 512 / s.st_size if s.st_size else 0:.2%}')
"
# Map of allocated regions
filefrag -v big.img
xfs_io -c 'fiemap -v' big.img         # not XFS-only: uses the generic FIEMAP ioctl

§ commands

bash
truncate -s 10G big.img

Create a sparse file with a logical size of 10 GB, physically 0

bash
du -h --apparent-size big.img

Apparent size (what applications see); plain du shows the allocated size

bash
fallocate -d big.img

Find zero regions and turn them into holes, compaction

bash
cp --sparse=auto big.img copy.img

Copy a sparse file while preserving holes

bash
rsync --sparse src.img dst.img

rsync that detects holes instead of expanding them into zeros

bash
qemu-img info disk.qcow2

Shows the image's virtual size and its actual disk size

bash
filefrag -v file.img

Map of extents; the gaps between extents are holes, which shows the file's sparse structure

§ see also

  • Block devices: disks in Linux. A block device is read and written in fixed-size blocks (usually 512B or 4K). Disks, SSDs, and NVMe drives are all block devices in `/dev/`.
  • Inode. An inode is a filesystem record that holds metadata and pointers to a file's data blocks. The filename lives separately, in a directory, and simply points to the inode.
  • mount and /etc/fstab: attaching filesystems. `mount` attaches a block device or filesystem to a mount point in the tree. `/etc/fstab` is the list of what to mount at boot.
  • Filesystems: ext4, xfs, btrfs, zfs. ext4 is the reliable default. xfs handles large files and parallel I/O. btrfs and zfs give you snapshots, checksums, and built-in RAID, but they are more complex.
  • LVM: Logical Volume Manager. LVM is a layer between block devices and the filesystem: it pools disks and carves out logical volumes of any size that you can grow, snapshot, and migrate live.