ext4: the Linux filesystem workhorse

Why ext4 specifically

ext4 is the default on Debian, Ubuntu, Linux Mint, Arch-based distributions, and many others. It is well understood, its tooling has matured over more than 15 years, and it has the largest body of documented recovery cases. Compared to its predecessors:

Version	What it added
ext2 (1993)	the base filesystem
ext3 (2001)	journaling
ext4 (2008)	extents, 1 EiB filesystem, 16 TiB files, online defrag, multi-block alloc

If the goal is "put any filesystem on it and forget about it", use ext4. If you have millions of small files or a single file in the terabyte range, look at xfs or btrfs.

Journaling

The main difference from ext2 is the journal. When metadata changes (inodes, directories, bitmaps), the intended changes are first written to a circular journal and only then applied in place. After a crash that happens before the journal commit, the incomplete transaction is discarded; after a crash that happens once the commit is on disk, the journal is replayed and the changes are applied.

Three modes, selected with mount -o data=...:

Mode	What is journaled	Trade-off
`data=writeback`	metadata only; data can hit the disk BEFORE or AFTER the metadata	maximum speed, with the risk that "a file points at someone else's blocks" after a crash
`data=ordered` (default)	metadata only, committed after the data is flushed	a compromise: metadata stays consistent with the data
`data=journal`	both metadata and data go through the journal	maximum safety, roughly 2x slower because data is written twice

For a database it is sometimes worth using data=writeback, where the application WAL takes on crash-safety. For a container host, use the default.
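To see which mode a mount actually uses, the option string can be inspected. The sketch below is a small helper (the name `journal_mode` is made up for this example) that pulls `data=` out of a comma-separated option string and falls back to `ordered`, on the assumption that a missing `data=` entry means the default is in effect.

```shell
#!/bin/sh
# Print the data journaling mode found in a comma-separated mount option
# string. Falls back to "ordered", the default mode, when no data= option
# is present (assumption: options left at their default are not listed).
journal_mode() {
    opts=$1
    mode=ordered
    old_ifs=$IFS
    IFS=,
    for opt in $opts; do
        case $opt in
            data=*) mode=${opt#data=} ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$mode"
}

journal_mode "rw,noatime,data=writeback"   # prints: writeback
journal_mode "rw,relatime"                 # prints: ordered

# On a live system the option string could come from, for example:
#   journal_mode "$(findmnt -n -o OPTIONS /mnt/data)"
```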

The journal lives on the same filesystem in a special inode (8). You can move it to a separate fast disk:

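bash

# Format the fast device as a dedicated external journal
mke2fs -O journal_dev /dev/nvme0n1

# Create the filesystem and point it at that journal (block sizes must match)
mke2fs -t ext4 -J device=/dev/nvme0n1 /dev/sda1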

Inode density is a fixed characteristic

At mkfs.ext4 time, the inode count is set to filesystem_size / bytes-per-inode. The default is 1 inode per 16 KiB. On a 1 TiB filesystem that is about 67M inodes by default.

What matters: you cannot add inodes after mkfs. df -i shows the usage. If you hit 100% inodes while gigabytes are still free, the only option is to recreate the filesystem.
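Because inode exhaustion cannot be fixed after the fact, it is worth watching for it early. A minimal sketch, assuming the column layout of `df -Pi` (filesystem, inodes, used, free, use%, mount point); the function name `inode_alert` and the 90% threshold are illustrative.

```shell
#!/bin/sh
# Read `df -Pi` style output on stdin and print every mount point whose
# inode usage is at or above the given percentage threshold.
inode_alert() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%$/, "", use)          # "97%" -> "97"; "-" stays "-"
            if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0)
                print $6 " " use "%"
        }'
}

# Typical call on a live system (mount points containing spaces are not
# handled by this sketch):
#   df -Pi | inode_alert 90
```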

For systems with millions of small files (a mail spool, a cache), raise the density:

bash

# 1 inode per 4 KiB - four times as many

mkfs.ext4 -i 4096 /dev/sdb1

# Or via a profile from /etc/mke2fs.conf

mkfs.ext4 -T news /dev/sdb1

For huge files (video, backups), lower it (-i 65536). You save space and speed up fsck.
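The effect of the ratio is easy to estimate with shell arithmetic. The sketch below applies the filesystem_size / bytes-per-inode formula to a 1 TiB filesystem and also estimates the inode table size, assuming 256-byte inodes. The real mkfs rounds to per-group multiples, so treat the numbers as estimates; the function names are made up for this example.

```shell
#!/bin/sh
# Estimated inode count: filesystem size in bytes divided by the
# bytes-per-inode ratio passed to mkfs.ext4 -i.
inode_count() {
    echo $(( $1 / $2 ))
}

# Estimated space taken by the inode tables, assuming 256-byte inodes.
inode_table_bytes() {
    echo $(( $1 / $2 * 256 ))
}

tib=$(( 1 << 40 ))                  # 1 TiB in bytes

inode_count "$tib" 16384            # default ratio -> 67108864
inode_count "$tib" 4096             # -i 4096       -> 268435456
inode_count "$tib" 65536            # -i 65536      -> 16777216

inode_table_bytes "$tib" 16384      # 17179869184 bytes (16 GiB)
inode_table_bytes "$tib" 65536      # 4294967296 bytes (4 GiB)
```

Under these assumptions, going from the default ratio to -i 65536 frees about 12 GiB on a 1 TiB filesystem.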

Block size

The default is 4 KiB on x86. Sizes of 1, 2, and 4 KiB are supported. Do not change it without a reason:

4K matches the kernel page size on x86, which makes it the optimum
<4K adds more metadata overhead
>4K is not supported on x86, because the block size cannot exceed the page size (on ARM/POWER kernels with larger pages you can use 16K or 64K)
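The constraint can be written down as a quick check: a usable block size is a power of two, at least 1 KiB, and no larger than the kernel page size. The helper below is a sketch of that rule only; `check_block_size` is a made-up name, and the page size defaults to whatever `getconf PAGESIZE` reports on the current machine.

```shell
#!/bin/sh
# Print "ok" if the block size (bytes) is a power of two between 1 KiB
# and the page size, otherwise "unsupported". The second argument lets
# you test against a page size other than the local one.
check_block_size() {
    bs=$1
    page=${2:-$(getconf PAGESIZE)}
    if [ "$bs" -ge 1024 ] && [ "$bs" -le "$page" ] \
        && [ $(( bs & (bs - 1) )) -eq 0 ]; then
        echo ok
    else
        echo unsupported
    fi
}

check_block_size 4096 4096     # ok
check_block_size 16384 4096    # unsupported: larger than a 4 KiB page
check_block_size 16384 65536   # ok on a kernel with 64 KiB pages
check_block_size 3000 4096     # unsupported: not a power of two
```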

noatime, relatime, lazytime

POSIX requires every read to update the inode's atime. That turns every read into a metadata write, which is ruinous for performance.

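Option	What it does
`strictatime`	atime on every read (the strict POSIX behavior)
`relatime` (kernel default)	atime updates only if the previous value is not newer than mtime/ctime, or is older than a day
`noatime`	never touch atime
`lazytime`	timestamp updates stay in memory and reach the disk when the inode is written for another reason, on sync, or after a day at the latest

For production, use noatime or lazytime. relatime has been the kernel default since Linux 2.6.30, so that is what you get when fstab says nothing.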
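The relatime rule is compact enough to model directly. The sketch below is a simplified model rather than the kernel code: it takes timestamps as epoch seconds and reports whether a read would update atime. The function name is made up for this example.

```shell
#!/bin/sh
# Decide whether a read under relatime would update atime.
# Arguments are epoch seconds: atime mtime ctime now.
# Update when atime is not newer than mtime or ctime, or when the
# stored atime is at least 24 hours old.
relatime_updates() {
    atime=$1 mtime=$2 ctime=$3 now=$4
    if [ "$atime" -le "$mtime" ] || [ "$atime" -le "$ctime" ] \
        || [ $(( now - atime )) -ge 86400 ]; then
        echo update
    else
        echo skip
    fi
}

relatime_updates 1000 2000 2000 2500      # modified after last read -> update
relatime_updates 3000 2000 2000 3600      # read 10 minutes ago      -> skip
relatime_updates 3000 2000 2000 100000    # last read over a day ago -> update
```

The middle case is the one that saves writes: repeated reads of an unchanged file cost at most one atime update per day.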

fstab

UUID=...  /  ext4  defaults,noatime,lazytime,errors=remount-ro  0 1

Useful mkfs/tune2fs options

bash

# Creation

mkfs.ext4 -L data -m 1 -E lazy_itable_init=1,lazy_journal_init=1 /dev/sdb1

-L LABEL sets the label
-m N sets the reserve for root in percent (default 5%, which is 500 GB on a 10 TB disk!)
-E lazy_itable_init=1 does not zero the inode table at creation (much faster on large disks; a background process zeroes it later); lazy_journal_init=1 does the same for the journal
-O ^has_journal means no journal (only if you know why, for example: the partition is temporary or its contents can be rebuilt)
-T usage_type selects a profile from /etc/mke2fs.conf, such as news, largefile, largefile4
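How much space the -m reserve takes is simple arithmetic, shown here as a sketch with a made-up helper name. Sizes are in bytes and decimal units (1 TB = 10^12 bytes) to match the figure quoted in the list.

```shell
#!/bin/sh
# Bytes held back for root at a given reserve percentage (-m).
reserved_bytes() {
    echo $(( $1 * $2 / 100 ))
}

tb=1000000000000                       # 1 TB in bytes

reserved_bytes $(( 10 * tb )) 5        # default 5% -> 500000000000 (500 GB)
reserved_bytes $(( 10 * tb )) 1        # -m 1       -> 100000000000 (100 GB)
```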

Tuning an existing filesystem:

bash

tune2fs -l /dev/sda1                  # filesystem parameters

tune2fs -m 1 /dev/sda1                # lower reserved to 1%

tune2fs -L data /dev/sda1             # change the label

tune2fs -O ^has_journal /dev/sda1     # disable the journal (dangerous)

tune2fs -c 0 -i 0 /dev/sda1           # disable mount-count and time-based fsck

Online resize and shrink

ext4 can grow while mounted or unmounted, but shrinks only on an unmounted filesystem:

bash

# Grow (online or offline)

resize2fs /dev/sda1                   # to the full partition size

resize2fs /dev/sda1 100G              # to 100 GiB

# Shrink (offline only)

umount /dev/sda1

e2fsck -f /dev/sda1                   # a mandatory check

resize2fs /dev/sda1 50G

xfs can only grow, so the ability to shrink is a point in favor of ext4.

fsck

Only on an unmounted filesystem:

bash

umount /mnt/data

e2fsck -f /dev/sda1                   # -f forces it even on a "clean" filesystem

e2fsck -y /dev/sda1                   # -y answers "yes to everything" (for scripts)

For the root partition there is errors=remount-ro in fstab. On a filesystem error it remounts the volume read-only automatically. More in fsck-and-recovery.
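When e2fsck runs from a script with -y, the exit status is the only signal of what happened. It is a bitmask, documented in the e2fsck man page. The decoder below is a sketch; the function name and the wording of the messages are my own.

```shell
#!/bin/sh
# Translate an e2fsck exit status (a bitmask) into readable text.
explain_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=
    for pair in "1:errors corrected" "2:errors corrected, reboot needed" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:cancelled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}                 # number before the first colon
        text=${pair#*:}                 # message after the first colon
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out; }$text"   # join messages with "; "
        fi
    done
    echo "$out"
}

# Typical use in a script (needs root and an unmounted filesystem):
#   e2fsck -f -y /dev/sda1; explain_e2fsck_status $?
```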

When something goes wrong

No space left with free GB: you ran out of inodes (df -i). Delete small files or recreate the filesystem with -i 4096.
Read-only file system: errors=remount-ro triggered. Check dmesg | grep EXT4-fs for the cause. Often a bad sector.
Files gone after a crash: data=writeback without an [[mount-and-fstab|fsync]] from the application. Lesson: call fsync() or open with O_DSYNC for critical data.
Very slow after mkfs on a large disk: with lazy_itable_init=1 the inode tables are still being zeroed in the background. Look for the ext4lazyinit kernel thread in ps or iotop.
tune2fs: Filesystem has unsupported feature(s): an old distribution does not know the feature. Check dumpe2fs -h /dev/sdX | grep features and update e2fsprogs.
5% reserved blocks: on large data disks use -m 1 or -m 0. The reserve matters mainly on a root filesystem, where it lets root-owned processes keep running once the disk fills up.

Checking the state

bash

dumpe2fs -h /dev/sda1                 # the superblock without group details

debugfs -R 'stat <inode>' /dev/sda1   # details for a specific inode

filefrag -v /path/to/file             # fragmentation of a specific file

e4defrag /path                        # online defrag (rarely needed)
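The superblock dump is easy to post-process. The sketch below reads `dumpe2fs -h` (or `tune2fs -l`) output on stdin and reports inode usage and the reserved-block share, using the "Inode count", "Free inodes", "Block count" and "Reserved block count" fields. The function name `fs_summary` is made up for this example.

```shell
#!/bin/sh
# Summarize inode usage and the root reserve from superblock output.
fs_summary() {
    awk -F': *' '
        $1 == "Inode count"          { inodes = $2 }
        $1 == "Free inodes"          { free_inodes = $2 }
        $1 == "Block count"          { blocks = $2 }
        $1 == "Reserved block count" { reserved = $2 }
        END {
            if (inodes > 0)
                printf "inodes used: %d%%\n", (inodes - free_inodes) * 100 / inodes
            if (blocks > 0)
                printf "reserved blocks: %d%%\n", reserved * 100 / blocks
        }'
}

# Typical use (needs read access to the device):
#   dumpe2fs -h /dev/sda1 2>/dev/null | fs_summary
```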
