ext4 is the evolutionary successor to ext3 and is considered the “standard driver” in the Linux kernel. Unlike ZFS or Btrfs, ext4 is a pure file system, not a volume manager.
It manages files on a given block device (partition, LVM volume, RAID array). It doesn’t care where the data physically resides or if a hard drive fails. It leaves this abstraction to the layers below (LVM/MDRAID).
Design goal: Maximum stability, backward compatibility, and performance for classic workloads, without the overhead of complex copy-on-write (CoW) mechanisms.
Core Mechanics: The “Why” Behind the Technology
Journaling (JBD2)
To prevent data corruption in the event of power outages, ext4 uses a journal (a write-ahead log). Before metadata (and optionally user data) is finally written to disk, it is first recorded in the journal.
After a crash, the system does not have to scan the entire disk (fsck); it simply replays the journal.
Architecture Modes (Mount Options):
- data=ordered (default): Metadata is journaled. The actual data is written to disk before the metadata is marked as “committed” in the journal.
  - Causality: Prevents files that contain garbage from existing. A good compromise between safety and speed.
- data=writeback: Only metadata is journaled. Data may be written after the metadata.
  - Risk: After a crash, you may have a file that has the correct size but contains old data remnants (garbage).
- data=journal: Everything (data + metadata) is written twice (first to the journal, then to disk). Extremely safe, but roughly halves write performance.
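The three modes above are selected per mount. A sketch of what that looks like in /etc/fstab; the device names and mount points here are purely hypothetical:

```shell
# /etc/fstab — illustrative entries; devices and mount points are made up.
/dev/vg0/www      /var/www   ext4  defaults                 0 2   # data=ordered is the default
/dev/vg0/scratch  /scratch   ext4  defaults,data=writeback  0 2   # speed over crash consistency
/dev/vg0/mail     /var/mail  ext4  defaults,data=journal    0 2   # maximum safety, ~half write throughput
```

Note that the journal mode cannot be changed on a live remount of the root filesystem; it is fixed at mount time.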
extents (instead of block mapping)
ext3 used indirect block addressing (lists of individual blocks). This was inefficient for large files, as the metadata overhead became huge and the CPU had to do a lot of computation.
ext4 uses Extents.
Instead of saying: “File A is on block 1, 2, 3, 4, 5…”, ext4 says: “File A starts at block 1 and is 1000 blocks long”.
Implication:
- Large files are managed with almost no metadata overhead.
- Fragmentation is reduced as the system attempts to reserve contiguous blocks.
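You can see the extent tree for yourself without root access by building a small ext4 image and inspecting a file in it with debugfs. The image path and file names below are just for demonstration:

```shell
# Build a small ext4 image and copy a file into it (no mounting, no root).
truncate -s 64M /tmp/extents.img
mkfs.ext4 -q -F /tmp/extents.img
dd if=/dev/zero of=/tmp/bigfile bs=1M count=8 status=none
debugfs -w -R "write /tmp/bigfile bigfile" /tmp/extents.img

# "stat" dumps the inode; the EXTENTS section lists start + length ranges
# instead of a per-block list, which is the whole point of extents.
debugfs -R "stat bigfile" /tmp/extents.img
```

For files on a mounted ext4 filesystem, `filefrag -v <file>` shows the same information.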
Delayed Allocation (Delalloc)
ext4 doesn’t write data to disk the moment a write arrives. It keeps the data in RAM (page cache) and waits as long as possible (up to about 60 seconds, or until the buffer fills) before allocating physical blocks.
The advantage:
Short-lived files (created and deleted immediately) never touch the disk. For long-lived files, the allocator knows the total file size by the time it allocates, so it can calculate a much better placement on the disk and avoid fragmentation.
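You can watch delayed writeback happen via the kernel's dirty-page counter. A rough sketch (the exact numbers depend on your system and on which filesystem backs the target path):

```shell
# Write 50 MB without forcing it to disk; with delayed allocation the data
# sits in the page cache first and blocks are allocated at writeback time.
dd if=/dev/zero of=/tmp/delalloc-demo bs=1M count=50 status=none
grep '^Dirty' /proc/meminfo    # dirty page count typically jumps here

# Force writeback now: blocks get allocated and the data hits the disk.
sync
grep '^Dirty' /proc/meminfo
```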
Limitations & Architecture Disadvantages (vs. ZFS)
This is where the age of the design shows. If you’re using ext4, you need to know what it can’t do:
- No data checksums: ext4 checks metadata (journal checksums), but not the file contents.
- Scenario: If a bit on the HDD flips (bit rot), ext4 delivers the corrupt file to the application without warning. There is no “self-healing”.
- No native snapshots: ext4 doesn’t know snapshots. You’ll need to put LVM (Logical Volume Manager) underneath to take advantage of snapshots. However, these are slower than ZFS snapshots, because LVM has to copy blocks when changes are made (classic copy-on-write at block level), while ZFS merely updates pointers.
- Inode limitation: Formatting (mkfs.ext4) creates a fixed number of inodes (file entries).
  - Pain Point: With millions of tiny files (e.g., mail servers, cache directories), the file system can be “full” (out of inodes) even though terabytes of space remain free. This cannot be changed afterwards (only by reformatting).
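Because the inode budget is fixed at format time, small-file workloads need it set up front. A sketch on a loopback image (paths are illustrative; on a real disk you would format the partition instead):

```shell
# One inode per 4 KiB of space instead of the default heuristic of roughly
# one per 16 KiB; -N would set an absolute inode count instead of a ratio.
truncate -s 64M /tmp/inode-demo.img
mkfs.ext4 -q -F -i 4096 /tmp/inode-demo.img

# Check the resulting inode budget. On a live system, `df -i` shows usage.
tune2fs -l /tmp/inode-demo.img | grep -E 'Inode count|Free inodes'
```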
Unique Perks (When ext4 Wins)
Why do we still use it?
- Shrinking: Unlike XFS (can only grow) and ZFS (VDEVs cannot shrink), an unmounted ext4 filesystem can be shrunk (resize2fs). This is essential for VM templates or repartitioning during operation.
- Robustness: When ZFS fails, the pool is often gone entirely. When ext4 crashes, tools like e2fsck almost always recover at least some of the data into lost+found. It is extremely forgiving of hardware failures.
- Overhead: ext4 requires hardly any RAM and CPU. It runs just as well on a Raspberry Pi as it does on an enterprise server.
Tuning for System Architects
If you deploy ext4 on servers, these are the knobs to turn:
Reserved Blocks (-m)
By default, ext4 reserves 5% of the space for the root user and system services so that the system does not grind to a halt when the disk is full.
- Problem: With a 10 TB disk, that’s 500 GB of wasted space.
- Fix: Set it to 1% or 0% for data-only disks:
tune2fs -m 0 /dev/sdX1
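The effect is easy to verify on a loopback image before touching a real disk (no root needed; the image path is just for demonstration):

```shell
# Format a throwaway image and inspect the reserved-block budget.
truncate -s 64M /tmp/reserved-demo.img
mkfs.ext4 -q -F /tmp/reserved-demo.img
tune2fs -l /tmp/reserved-demo.img | grep 'Reserved block count'   # ~5% of blocks

# Drop the reservation to zero, as you would on a data-only disk.
tune2fs -m 0 /tmp/reserved-demo.img
tune2fs -l /tmp/reserved-demo.img | grep 'Reserved block count'   # now 0
```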
Dir_index (HTree)
Ensures that directories are not stored as a linear list, but as a hash tree (H-tree).
- Causality: Massively accelerates file lookups in directories with thousands of entries. It is usually enabled by default today (check with tune2fs -l /dev/sdX | grep dir_index).
Mount Options
- noatime (or relatime): Disables writing the access timestamp on every read. Saves massive IOPS.
- discard: Enables TRIM on SSDs directly upon deletion.
  - Best Practice: Rather than the mount option, use a systemd timer (fstrim.timer) that trims weekly. Live discard can slow down performance under heavy load.
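Put together, a typical data-disk setup might look like this; the label and mount point are hypothetical:

```shell
# /etc/fstab — noatime saves IOPS; no "discard", TRIM runs via the timer.
LABEL=data  /srv/data  ext4  defaults,noatime  0 2

# Enable the weekly TRIM job once:
#   systemctl enable --now fstrim.timer
```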
Result
ext4 is the Toyota Hilux of file systems. It lacks the modern features (compression, deduplication, bit-rot protection), but it always gets you there.
Recommended use:
- System Partitions (Root / Boot): Always ext4. Simplicity wins here.
- VM images (as guest FS): ext4, because there is little overhead.
- Bulk data storage (NAS): Here, ext4 loses to ZFS because of the lack of data-integrity checks.