linuxlab.io
Copyright © 2026 LinuxLab. All rights reserved.

kb/filesystem ── File system ── intermediate

XFS: extents and parallel I/O

XFS is the default filesystem on RHEL 7 and later. It offers allocation groups (parallel I/O), extent-based allocation, and online grow. **It cannot shrink**; it can only grow. It suits big files, databases, and parallel workloads.

aka: xfs-fs, xfs-extents, xfs-allocgroups

Why XFS

XFS was born at SGI in 1993 for IRIX and large storage. Its main design traits, some of which ext4 shares today (extents, delayed allocation):

  • Allocation groups: the filesystem is split into N independent regions, and different processes write to different AGs in parallel
  • Extent-based allocation: a contiguous run of blocks is described by a single extent (start, length) rather than a list of blocks
  • Dynamic inode allocation: inodes are not fixed at mkfs time
  • Delayed allocation: blocks are allocated when the data is flushed, not at write() time
  • Metadata-only journaling: file contents are not journaled
  • Online grow is available, shrink is not
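
To make the extent idea concrete, here is a small illustrative sketch. It is not XFS code: the function name `blocks_to_extents` and the block numbers are made up for the example. It collapses a sorted list of block numbers into (start, length) pairs, which is the saving an extent gives over a per-block list.

```bash
# Collapse a sorted list of block numbers (one per line on stdin)
# into "start length" pairs, one extent per contiguous run.
# Hypothetical helper for illustration only.
blocks_to_extents() {
    awk '
        NR == 1        { start = $1; len = 1; prev = $1; next }
        $1 == prev + 1 { len++; prev = $1; next }   # the run continues
                       {
                           print start, len          # the run ended: emit it
                           start = $1; len = 1; prev = $1
                       }
        END            { if (NR > 0) print start, len }   # emit the last run
    '
}

# Six blocks in two contiguous runs become two extents.
printf '%s\n' 100 101 102 103 200 201 | blocks_to_extents
```

The sample prints `100 4` and `200 2`: two records instead of six block numbers. For a multi-gigabyte contiguous file the difference is far larger.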

It has been the default since RHEL 7 (2014). It fits databases (PostgreSQL, MySQL), file servers, KVM/QEMU disk images, and mail spools with large mboxes.

Allocation groups: the key to parallelism

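The filesystem is divided into N equal regions. mkfs.xfs picks N from the device size and geometry, not from the CPU count: 4 is typical for a single disk of up to a few TiB, and striped (RAID) volumes get more. Each AG is almost an independent mini-filesystem: its own free blocks, its own inodes. This lets you: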

  • Have several processes write to different AGs at once without contending for the same locks
  • Keep allocation decisions local (nearby inodes lead to nearby blocks)
  • Run a fast mkfs (only small per-AG headers are written; no inode tables are initialized up front)
bash
$ mkfs.xfs /dev/sdb1
meta-data=/dev/sdb1   isize=512   agcount=4, agsize=...

You can override it:

bash
mkfs.xfs -d agcount=16 /dev/sdb1

More AGs means more parallelism, but also more overhead. The default is usually fine.
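
To see what a given agcount means in practice, you can turn the numbers from `xfs_info` into sizes. The sketch below assumes the usual `key=value` layout of `xfs_info` output. The helper name `ag_summary`, the device `/dev/sdX1`, and the sample figures are all made up for illustration.

```bash
# Summarise allocation-group geometry from xfs_info-style text on stdin.
# Prints: agcount, size of one AG in MiB, total data size in MiB.
# Hypothetical helper; assumes the usual key=value layout.
ag_summary() {
    awk '
        {
            line = $0
            gsub(/,/, " ", line)              # "agcount=4," -> "agcount=4 "
            n = split(line, f, /[ \t]+/)
            for (i = 1; i <= n; i++) {
                if (f[i] ~ /^agcount=/) agcount = substr(f[i], 9)
                if (f[i] ~ /^agsize=/)  agsize  = substr(f[i], 8)
                # the first bsize= seen is the data block size
                if (f[i] ~ /^bsize=/ && bsize == "") bsize = substr(f[i], 7)
            }
        }
        END {
            mib = 1024 * 1024
            printf "agcount=%d ag_mib=%d total_mib=%d\n",
                   agcount, agsize * bsize / mib, agcount * agsize * bsize / mib
        }
    '
}

# Made-up sample for a hypothetical 100 GiB device; on a real system
# you would run:  xfs_info /mnt/data | ag_summary
ag_summary <<'EOF'
meta-data=/dev/sdX1   isize=512    agcount=4, agsize=6553600 blks
data     =            bsize=4096   blocks=26214400, imaxpct=25
EOF
```

For the sample this prints `agcount=4 ag_mib=25600 total_mib=102400`, that is, four AGs of 25 GiB each.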

Inodes are dynamic

Unlike ext4 (inode count fixed at mkfs time), XFS allocates inodes as needed. In practice you will not hit "No space left on device" with gigabytes still free just because the inode table ran out.

The downside: inodes are scattered across the filesystem, so a directory walk is slower on an HDD. On SSD this does not matter.

The inode size is 512 bytes by default (it used to be 256). A larger inode leaves room to store extended attributes and ACLs inline, without allocating separate blocks.

bash
mkfs.xfs -i size=1024 /dev/sdb1   # even more room for xattr
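
Dynamic allocation still has a ceiling: mkfs.xfs records `imaxpct`, the maximum share of the filesystem that inodes may occupy (it appears in `xfs_info` output). A rough upper bound on the inode count follows from it and the inode size. The sketch below is plain arithmetic with made-up sample values; `max_inodes` is a hypothetical helper, not an xfsprogs tool.

```bash
# Rough upper bound on the number of inodes:
#   (filesystem bytes * imaxpct / 100) / inode size
# Arguments: data blocks, block size (bytes), imaxpct (%), inode size (bytes)
max_inodes() {
    echo $(( $1 * $2 / 100 * $3 / $4 ))
}

# Sample: 26214400 blocks of 4 KiB (100 GiB), imaxpct=25, 512-byte inodes
max_inodes 26214400 4096 25 512
```

The sample prints 52428800. Doubling the inode size to 1024 bytes halves that bound, which is the trade-off behind a larger `-i size`.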

Journaling

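XFS journals metadata only. File data is not journaled: a crash leaves the filesystem structurally consistent, but data that had not yet been flushed (fsync) can be lost. By default the journal lives inside the filesystem (an internal log); you can move it to a separate device: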

bash
mkfs.xfs -l logdev=/dev/nvme0n1p1,size=128m /dev/sdb1
mount -o logdev=/dev/nvme0n1p1 /dev/sdb1 /mnt/data

Putting the log on NVMe usually speeds up a heavy metadata load (creating and deleting millions of files).

Mount options

fstab
UUID=...  /data  xfs  defaults,noatime,nodiratime,inode64,logbsize=256k  0 0
| Option | What it does |
|---|---|
| noatime | as everywhere, turns off atime updates |
| inode64 (default since RHEL 7) | inodes can live in any AG; without it they sit only in the lowest 1 TiB |
| logbsize=256k | log buffer size; larger is faster on metadata-heavy loads |
| largeio | reports the optimal I/O size in st_blksize as a hint to applications |
| nobarrier | disables write barriers (cache flushes); faster but dangerous without a battery-backed cache; removed from XFS in kernel 4.19 |
| pquota, uquota, gquota | quotas by project/user/group |

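Barriers are cache-flush requests that guarantee journal writes reach stable storage before the metadata they describe is written in place. XFS spelled the options barrier and nobarrier (barrier=1 is ext4 syntax). On a HW RAID with a battery-backed cache you could run with nobarrier, but then you own the correctness of the storage stack. Both options were deprecated and then removed in kernel 4.19, so current kernels always issue the flushes.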
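
What fstab asks for and what the kernel applied can differ, so it helps to check the live option list. A minimal sketch, assuming the standard `/proc/mounts` field order (device, mountpoint, type, options). The helper names `mount_opts` and `has_opt` and the sample line are made up for illustration.

```bash
# Print the option list for mountpoint $1 from /proc/mounts-style text on stdin.
# Hypothetical helper; fields: device mountpoint fstype options dump pass
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4; exit }'
}

# Succeed if the comma-separated option list $1 contains option $2 exactly.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Made-up sample line; on a real system use:  mount_opts /data < /proc/mounts
sample='/dev/sdb1 /data xfs rw,noatime,inode64,logbsize=256k,noquota 0 0'
opts=$(printf '%s\n' "$sample" | mount_opts /data)
if has_opt "$opts" inode64; then
    echo "inode64 is active on /data"
fi
```

Wrapping the list in commas before matching keeps `atime` from matching inside `noatime`.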

Online grow

bash
# First resize the partition or LV
parted /dev/sdb resizepart 1 100%
# or
lvextend -L+100G /dev/vg/lv-data
# Then XFS picks it up
xfs_growfs /mnt/data

xfs_growfs takes a mountpoint, not a device. The filesystem must be mounted.

Shrink is impossible. If you need to make it smaller: back up, mkfs, restore. This is the sorest spot of XFS.
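
Before running xfs_growfs it is worth checking that the device really is larger than the filesystem. The sketch below is plain arithmetic on three numbers. `grow_headroom_mib` is a hypothetical helper and the sample sizes are made up.

```bash
# How many MiB the filesystem could gain from xfs_growfs.
# Arguments: device size in bytes, filesystem data blocks, block size in bytes
grow_headroom_mib() {
    echo $(( ($1 - $2 * $3) / 1024 / 1024 ))
}

# On a real system the inputs would come from, for example:
#   blockdev --getsize64 /dev/vg/lv-data    (device size in bytes)
#   xfs_info /mnt/data                      (blocks= and bsize= on the data line)
# Sample: LV already extended to 200 GiB, filesystem still 100 GiB
grow_headroom_mib 214748364800 26214400 4096
```

The sample prints 102400 (100 GiB of headroom). A result of 0 means the partition or LV has not been resized yet, and xfs_growfs will have nothing to do.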

xfs_info, xfs_db

bash
xfs_info /mnt/data       # structure: AG count, block size, log
xfs_db -r /dev/sdb1 -c "version"   # internal details (read-only is safe)
xfs_io -r -c "stat" /path/to/file  # inode number, size, extent count (-r opens read-only)
xfs_io -r -c "fiemap" file         # extent map
xfs_bmap -v file                   # alternative to fiemap

xfs_db is a low-level debugger; in RW mode you can break the filesystem by accident.

xfs_repair vs e2fsck

XFS runs a journal replay automatically at mount. If that is not enough (corruption after a controller crash, bad sectors):

bash
umount /mnt/data
xfs_repair /dev/sdb1                  # must be unmounted
xfs_repair -L /dev/sdb1               # force, zeroing the journal (RISK!)

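-L is the last resort: it zeroes the log, losing whatever metadata changes had not finished committing. First try to mount the filesystem so the kernel can replay the log. Use -L only when that mount fails and xfs_repair refuses to run because the log is dirty or corrupt.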

Unlike e2fsck (ext4), xfs_repair does not patch small things on a healthy filesystem. You either need it or you do not.
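
The order of escalation matters more than any single command. The sketch below only prints a cautious sequence for a given device and mountpoint and executes nothing. `repair_plan` is a hypothetical helper, and the ordering is a suggestion rather than an official procedure.

```bash
# Print a cautious escalation ladder for device $1 and mountpoint $2.
# Nothing is executed; run the steps by hand, one at a time.
repair_plan() {
    dev=$1
    mnt=$2
    cat <<EOF
1. umount $mnt
2. mount $dev $mnt && umount $mnt    # let the kernel replay the log
3. xfs_repair -n $dev    # dry run, modifies nothing
4. xfs_repair $dev    # real repair, only if step 3 reports damage
5. xfs_repair -L $dev    # last resort, only if the log cannot be replayed
EOF
}

repair_plan /dev/sdb1 /mnt/data
```

Taking a block-level image of the device before steps 4 and 5 is a sensible extra precaution when the data matters.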

Quota

XFS quota has three dimensions:

  • uquota: per user
  • gquota: per group
  • pquota: per project (a group of inodes tagged with one id)

Project quota originated in XFS (ext4 gained it much later): you can put a quota on an arbitrary subtree of directories that is not tied to a user or group:

bash
mount -o pquota /dev/sdb1 /data
echo '42:/data/projects/foo' >> /etc/projects
echo 'foo:42' >> /etc/projid
xfs_quota -x -c 'project -s foo' /data
xfs_quota -x -c 'limit -p bhard=10g foo' /data
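
Appending with `>>` adds a duplicate line every time the commands are re-run. A minimal sketch of an idempotent variant follows. `add_project` is a hypothetical helper, and the `PROJECTS` and `PROJID` variables are assumptions introduced so the example can run against scratch files instead of /etc.

```bash
# Register project id $1 with name $2 for directory $3, adding each line
# only if it is not already present.
# PROJECTS and PROJID default to the real files; override them to experiment.
add_project() {
    projects=${PROJECTS:-/etc/projects}
    projid=${PROJID:-/etc/projid}
    # grep -q: quiet, -x: match the whole line, -F: fixed string
    grep -qxF "$1:$3" "$projects" 2>/dev/null || echo "$1:$3" >> "$projects"
    grep -qxF "$2:$1" "$projid"   2>/dev/null || echo "$2:$1" >> "$projid"
}

# Try it against scratch files; calling it twice still leaves one line each.
tmp=$(mktemp -d)
PROJECTS=$tmp/projects
PROJID=$tmp/projid
add_project 42 foo /data/projects/foo
add_project 42 foo /data/projects/foo
cat "$PROJECTS" "$PROJID"
```

The sample prints `42:/data/projects/foo` followed by `foo:42`, each exactly once.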

XFS vs ext4

| Trait | XFS | ext4 |
|---|---|---|
| Inodes | dynamic | fixed at mkfs |
| Parallel I/O | strong (AG) | weaker |
| Huge filesystems (>16 TiB) | good | worse due to structure |
| Small files | good | good (denser packing) |
| Resize | grow only (online) | grow online, shrink offline |
| Crash recovery | journal replay | journal replay + extensive fsck |
| RAM for metadata | more | less |
| Default on | RHEL 7+, CentOS, Oracle Linux | Debian/Ubuntu/Mint |

For a root filesystem on a host with no special requirements, both are fine. For a data partition with databases, VMs, or parallel load, pick XFS.

When something goes wrong

  • xfs_growfs: data size unchanged: the partition or LV is not resized yet. Run parted resizepart or lvextend first, then xfs_growfs.
  • Structure needs cleaning at mount after a crash: unmount and run xfs_repair. If it refuses because of a dirty log and a mount cannot replay it, use -L (knowing the risk).
  • Fragmentation: XFS usually does not fragment on its own, but it starts to on a near-full filesystem. Measure it with xfs_db -r -c frag /dev/sdb1 (always read-only) and defragment online with xfs_fsr.
  • mkfs on NVMe balks about block size 4K: old xfsprogs could not handle it; update xfsprogs >= 5.0.
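  • Quota does not count a project: you forgot mount -o pquota (prjquota is an accepted synonym), or you tried to enable it with a remount. XFS quota options only take effect on a fresh mount, so unmount and mount again.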
  • Cannot allocate memory at mount after a crash: a very large journal; you need more RAM or you have to move the log to a separate disk.
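
On the fragmentation item: `xfs_db -r -c frag` reports an actual and an ideal extent count and derives a percentage from them. Assuming the usual formula, (actual - ideal) / actual, the number can be reproduced as below. `frag_factor` is a hypothetical helper and the sample counts are made up.

```bash
# Fragmentation factor in percent from the actual and ideal extent counts:
#   (actual - ideal) * 100 / actual
frag_factor() {
    awk -v a="$1" -v i="$2" 'BEGIN {
        if (a <= 0) { print "0.00"; exit }   # avoid dividing by zero
        printf "%.2f\n", (a - i) * 100 / a
    }'
}

# Made-up sample: 1000 extents in use where 800 would be ideal
frag_factor 1000 800
```

The sample prints 20.00. Under this formula a single file split into 2 extents instead of 1 already scores 50%, so the percentage overstates the problem on filesystems with few, large files.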

§ commands

bash
sudo mkfs.xfs -L data -f /dev/sdb1

Create XFS with a label; -f is needed if the partition already holds something

bash
sudo xfs_info /mnt/data

Parameters of an existing filesystem: AG count, block size, log location

bash
sudo xfs_growfs /mnt/data

Grow XFS to the full size of the partition or LV; takes a mountpoint

bash
sudo mount -o noatime,nodiratime,inode64 /dev/sdb1 /mnt/data

Mount with a typical set of tunables for prod

bash
sudo xfs_repair -n /dev/sdb1

Dry-run check; shows problems without fixing them (-n)

bash
sudo xfs_io -c 'fiemap' /path/to/largefile

Extent map; use it to see the fragmentation of a large file

bash
sudo xfs_quota -x -c 'report -h' /data

Quota report; who uses how much

§ see also

  • Filesystems: ext4, xfs, btrfs, zfs (filesystems). ext4 is the reliable default. xfs handles large files and parallel I/O. btrfs and zfs give you snapshots, checksums, and built-in RAID, but they are more complex.
  • ext4: the Linux filesystem workhorse (ext4). ext4 is the default filesystem on most distributions: journaling, extents, a fixed inode count set at mkfs time. The main tunes are the data mode, noatime, and lazy init. Stable for 15+ years. Does not scale like XFS.
  • btrfs: copy-on-write, subvolumes, and snapshots (btrfs). btrfs is a copy-on-write filesystem with subvolumes, O(1) snapshots, native RAID 0/1/10, and data checksums. RAID 5/6 is problematic. COW fragmentation hurts databases and VM images, so turn it off for them.
  • mount and /etc/fstab: attaching filesystems (mount-and-fstab). `mount` attaches a block device or filesystem to a mount point in the tree. `/etc/fstab` is the list of what to mount at boot.
  • fsck and recovery: checking and repairing a filesystem (fsck-and-recovery). fsck is a check of an unmounted filesystem: e2fsck (ext), xfs_repair (XFS), btrfs check (btrfs). Journal replay at mount handles 90% of problems after a crash.