
XFS is a 64-bit journaling file system originally developed by Silicon Graphics (SGI) for IRIX. Its focus is not on “features” such as snapshots or compression (although there are approaches to both), but on extreme scalability and parallelism.

While ext4 has historically often been limited by lock contention when many CPU cores wanted to write at the same time, XFS was designed to push hardware limits. It assumes that you have a lot of cores and a lot of disks.

Core Mechanics: The “Why” Behind the Technology

Allocation Groups (AGs) – The Parallelism Engine

The most important architectural feature of XFS is the division of the storage device into Allocation Groups (AGs).

Think of ext4 as a supermarket with only one checkout. If 10 processes want to write, they have to queue (locking).

XFS divides the partition into several independent regions (AGs), for example 4, 8, or more. Each AG manages its own inodes and free space.

Cause and effect:

If you write data in parallel (e.g. a database or a mail server), the kernel assigns the writes to different AGs.

  • Result: Process A writes in AG 1, Process B in AG 2. They do not block each other. This makes XFS the king of I/O throughput on multi-core servers.
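The effect of per-AG locking can be sketched with a toy model. This is not how the XFS allocator is implemented; the class names, the round-robin placement policy, and the block counts are all assumptions made for illustration. The point is only that each writer takes the lock of its own AG, so writers in different AGs never wait for each other.

```python
import threading


class AllocationGroup:
    """Toy model of one AG: its own lock and its own free-space counter."""

    def __init__(self, ag_id, free_blocks):
        self.ag_id = ag_id
        self.free_blocks = free_blocks
        self.lock = threading.Lock()

    def allocate(self, count):
        # Only writers that target this AG contend for this lock.
        with self.lock:
            if count > self.free_blocks:
                return False
            self.free_blocks -= count
            return True


class ToyFilesystem:
    def __init__(self, ag_count, blocks_per_ag):
        self.groups = [AllocationGroup(i, blocks_per_ag) for i in range(ag_count)]

    def pick_group(self, writer_id):
        # Hypothetical placement policy: spread writers round-robin over AGs.
        return self.groups[writer_id % len(self.groups)]


def writer(fs, writer_id, allocations, blocks_each):
    ag = fs.pick_group(writer_id)
    for _ in range(allocations):
        ag.allocate(blocks_each)


fs = ToyFilesystem(ag_count=4, blocks_per_ag=10_000)
threads = [threading.Thread(target=writer, args=(fs, i, 100, 8)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each of the 4 writers took 100 * 8 blocks from its own AG.
print([ag.free_blocks for ag in fs.groups])
```

With a single global lock instead of one lock per group, all four writers would serialize on the same checkout, which is the supermarket picture from above.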

B+ Trees (Everything is indexed)

Many file systems track free space with bitmaps or simple lists. XFS uses B+ trees for almost everything:

  • Free space tracking.
  • Directory Indexes.
  • Dynamic inode allocation.

This means that finding free space or files does not get linearly slower as the file system fills up or grows to millions of files. Lookup cost grows only logarithmically (O(log n)), so access times stay very stable.
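To get a feeling for what O(log n) means here, the following sketch computes how many tree levels a lookup has to walk for a given number of records. The fanout of 100 entries per node is an assumed round number for illustration, not an XFS constant.

```python
def btree_levels(records, fanout):
    """Levels needed so that fanout**levels >= records (integer arithmetic)."""
    levels, capacity = 1, fanout
    while capacity < records:
        capacity *= fanout
        levels += 1
    return levels


# Assumed fanout of 100 entries per tree node.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} records -> {btree_levels(n, fanout=100)} levels")
```

Under this assumption, going from a thousand to a billion records only raises the depth from 2 to 5 levels, which is why lookups stay fast on very full or very large file systems.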

Reflink & CoW (The “Veeam Feature”)

Although XFS is not a classic copy-on-write file system like ZFS, it has supported reflinks for a few years now (enabled at mkfs time with reflink=1).

This allows files to be “copied” by simply creating new references to the same data blocks (similar to deduplication). Blocks are only duplicated when one of the copies is modified.

Practical impact (backup):

Backup software such as Veeam uses this for “Fast Clone”. A synthetic full backup (copying from old backups to a new full) happens almost instantaneously, without any physical data being moved. This saves hours of I/O time.
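A reflink copy can also be requested directly from application code. The sketch below is a minimal example, assuming Linux and Python's standard fcntl module; it uses the FICLONE ioctl and falls back to a normal byte copy when the file system does not support reflinks. The function name and the file names are hypothetical, and this is not a description of how Veeam implements Fast Clone.

```python
import fcntl
import os
import shutil
import tempfile

FICLONE = 0x40049409  # Linux ioctl request number for cloning a whole file


def clone_or_copy(src_path, dst_path):
    """Try a reflink clone; fall back to a byte copy. Returns 'reflink' or 'copy'."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # On a reflink-capable file system this shares the data blocks
            # instead of copying them.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "reflink"
        except OSError:
            pass  # not supported here (e.g. ext4, tmpfs, or reflink=0)
    shutil.copyfile(src_path, dst_path)
    return "copy"


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "full_backup.bin")
    dst = os.path.join(d, "synthetic_full.bin")
    with open(src, "wb") as f:
        f.write(b"backup data" * 1000)
    # Prints "reflink" on XFS with reflink=1, "copy" elsewhere.
    print(clone_or_copy(src, dst))
```

On the command line, cp --reflink=always requests the same kind of clone.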

The Great Architectural Trap (Shrinking)

This is the most important thing you need to know as an architect: XFS cannot be shrunk.

Unlike ext4, where you can use resize2fs to shrink a volume, XFS offers no such option.

  • The reason: The allocation groups are laid out across the disk with a fixed geometry when the file system is created (mkfs). To reduce the size of the file system, you would have to move data from the rear AGs to the front ones and recalculate the entire geometry. This is so complex and risky that the developers never implemented general-purpose shrinking.
  • Consequence: If you create a 10 TB LUN and realize that you only need 5 TB, you have to back up the data, delete the volume, create a new one and restore it. So plan your LVM sizes conservatively (growing is always possible: xfs_growfs).
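Because growing is the only direction available, the grow path is worth scripting. The following sketch only builds the two commands for an XFS file system on LVM and prints them; nothing is executed. The helper name, the volume path, and the mount point are hypothetical.

```python
import shlex


def grow_plan(logical_volume, mount_point, extra_gib):
    """Return the two commands for growing an XFS file system on LVM.

    Nothing is executed here; the caller decides whether to run the plan.
    """
    if extra_gib <= 0:
        # There is no shrink path, so refuse anything that is not a grow.
        raise ValueError("XFS can only be grown; extra_gib must be positive")
    return [
        # Step 1: enlarge the logical volume underneath the file system.
        ["lvextend", "-L", f"+{extra_gib}G", logical_volume],
        # Step 2: let XFS grow into the new space (takes the mount point).
        ["xfs_growfs", mount_point],
    ]


# Hypothetical volume and mount point, for illustration only.
for cmd in grow_plan("/dev/vg_data/lv_backup", "/srv/backup", 512):
    print(shlex.join(cmd))
```

Starting small and repeating this plan when needed is cheaper than a backup, delete, and restore cycle for a volume that was created too large.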

Metadata Journaling

XFS only writes metadata (file names, sizes, attributes) to its journal, not the payload itself.

  • Advantage: Extreme speed.
  • Risk: After a power failure, the file system structure is brought back to a consistent state by replaying the journal (no full fsck needed), but the contents of a file that had just been written may be incomplete or read back as zeros.
  • Solution: Modern applications (databases) know this and use fsync() to ensure that data really ends up on the disk.
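What "use fsync()" looks like in practice can be sketched as follows. This is one common pattern (write to a temporary file, fsync, rename, fsync the directory), not the only correct one; the function name is hypothetical and a POSIX system is assumed.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path so that the content survives a power failure."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # push the kernel's cache to the disk
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also sync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "state.json")
    durable_write(target, b'{"committed": true}')
    with open(target, "rb") as f:
        print(f.read())
```

Without the fsync() calls, the journal would still guarantee a valid file after a crash, but not that this file already holds the new bytes.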

Tuning for architects

External Log (logdev)

For database servers (PostgreSQL/MySQL), latency when writing to the XFS journal (the file system's own write-ahead log, not the database's WAL) can be crucial.

You can put the XFS journal on a separate device.

  • Setup: A small, extremely fast NVMe device for the journal plus a large HDD RAID for the data.
  • Command: mkfs.xfs -l logdev=/dev/nvme0n1 /dev/sdb1 (the same log device must then also be passed at mount time with -o logdev=/dev/nvme0n1)
  • Effect: Metadata commits are not slowed down by the seek time of the HDDs, because they land in the fast journal.
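A back-of-the-envelope calculation shows why the log device matters. The latencies below are assumed round numbers for illustration, not measurements, and the model is deliberately pessimistic: it treats every commit as waiting for the previous one, whereas XFS batches log writes in practice.

```python
def max_serial_commits_per_second(commit_latency_us):
    """Upper bound on back-to-back commits if each waits for the previous one."""
    return 1_000_000 // commit_latency_us


# Assumed, illustrative latencies (not measurements):
HDD_COMMIT_US = 8_000  # roughly 8 ms seek plus rotation on a spinning disk
NVME_COMMIT_US = 100   # roughly 0.1 ms on a fast NVMe device

print("log on HDD :", max_serial_commits_per_second(HDD_COMMIT_US), "commits/s")
print("log on NVMe:", max_serial_commits_per_second(NVME_COMMIT_US), "commits/s")
```

Under these assumptions the ceiling moves from 125 to 10,000 serialized commits per second, while the bulk data still goes to the cheaper HDD array.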

Inode64

XFS used to place all inodes in the first TB of the disk (so that inode numbers fit into 32 bits, for compatibility with older applications). On very large volumes, this led to inodes being far away from their data (seek time!).

Today, inode64 is the default. Make sure it is active in the mount options so that inodes can be placed close to their data anywhere on the disk.
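Checking the active mount options can be automated. The sketch below parses text in /proc/mounts format and reports the inode mode per XFS mount. The function name and the sample lines (devices, paths, option lists) are hypothetical; on a real Linux host you would pass the contents of /proc/mounts instead.

```python
def xfs_inode_modes(mounts_text):
    """Map each XFS mount point to 'inode64', 'inode32', or 'unknown'."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        if "inode64" in options:
            modes[fields[1]] = "inode64"
        elif "inode32" in options:
            modes[fields[1]] = "inode32"
        else:
            modes[fields[1]] = "unknown"
    return modes


# Illustrative sample in /proc/mounts format (hypothetical devices and paths).
sample = """\
/dev/sda1 /boot ext4 rw,relatime 0 0
/dev/mapper/vg-data /srv/data xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/sdc1 /srv/legacy xfs rw,relatime,attr2,inode32,noquota 0 0
"""
print(xfs_inode_modes(sample))
```

For a live check, something like xfs_inode_modes(open("/proc/mounts").read()) would do; any mount reported as inode32 is a candidate for the seek problem described above.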

Conclusion & Differentiation

| Feature | ext4 | XFS | ZFS |
|---|---|---|---|
| Goal | General purpose | Performance & scale | Integrity & features |
| Max. file size | 16 TB | 8 exabytes | 16 exabytes |
| Shrink | Yes | No | No (pool level) |
| Parallel I/O | Medium (locking) | Excellent (AGs) | Good (transaction groups) |
| Checksums | Journal only | Metadata (CRC) | Everything (data + metadata) |

When to use XFS:

  1. You operate large databases (low fragmentation, direct I/O support).
  2. You are using RHEL/CentOS/AlmaLinux (XFS is the default file system there).
  3. You have huge amounts of data (>50 TB) or huge individual files.
  4. You use Veeam repositories (Reflink Support).

When not:

  1. On small boot partitions or USB sticks (overhead too large).
  2. If you need to be able to shrink storage flexibly (virtualization, dynamically resized LVM volumes).
