M5 Phase 6: Read-Only ext2 Filesystem

Kevlar can now mount and read an ext2 filesystem from a VirtIO block device. Files, directories, and symbolic links all work. All 31 syscall correctness tests pass. Persistent storage is live.

Why ext2?

ext2 is the ideal first real filesystem for a new OS:

  • The on-disk format is completely documented
  • No journaling complexity — ext2 is a simple struct-on-disk design
  • Images are trivial to create and inspect on a host (mkfs.ext2 on Linux; e2fsprogs or fuse-ext2 on macOS)
  • It's the ancestor of ext3/ext4, so understanding it builds toward both

We only need read-only access for now: the goal is to get programs and data from disk into Kevlar, not to write logs. Every write operation returns EROFS.

On-Disk Format

An ext2 volume is divided into fixed-size blocks (1024, 2048, or 4096 bytes). Blocks are grouped into block groups, each described by a group descriptor.

Offset 0        : Boot area (first 1024 bytes, not used by ext2)
Offset 1024     : Superblock (1024 bytes; at this offset for every block size)
Next block      : Block Group Descriptor Table (offset 2048 with 1024-byte blocks)
Each group      : block bitmap, inode bitmap, inode table, data (located via its descriptor)
...

The superblock contains everything we need to bootstrap: total block count, blocks per group, inodes per group, block size (stored as a shift: 1024 << s_log_block_size), inode size, and (at offset 56) the magic number 0xEF53.
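A minimal sketch of decoding those fields, assuming the 1024 bytes at device offset 1024 are already in a buffer. Offsets follow the ext2 on-disk layout (all little-endian); `Superblock`, `parse_superblock`, and the two helper methods are hypothetical names, not Kevlar's actual types. The helpers derive where the descriptor table starts and how many block groups exist.

```rust
const EXT2_MAGIC: u16 = 0xEF53;

/// The superblock fields this post relies on (a hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
pub struct Superblock {
    pub blocks_count: u32,
    pub first_data_block: u32,
    pub block_size: u32,
    pub blocks_per_group: u32,
    pub inodes_per_group: u32,
    pub inode_size: u16,
}

fn le_u16(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn le_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Decodes the 1024-byte superblock that starts at device byte offset 1024.
pub fn parse_superblock(raw: &[u8]) -> Option<Superblock> {
    if raw.len() < 1024 || le_u16(raw, 56) != EXT2_MAGIC {
        return None;
    }
    let log_block_size = le_u32(raw, 24);
    if log_block_size > 6 {
        return None; // this sketch rejects blocks larger than 64 KiB as corrupt
    }
    let sb = Superblock {
        blocks_count: le_u32(raw, 4),
        first_data_block: le_u32(raw, 20),
        block_size: 1024u32 << log_block_size,
        blocks_per_group: le_u32(raw, 32),
        inodes_per_group: le_u32(raw, 40),
        // s_rev_level at 76: revision 0 has no s_inode_size field and uses 128-byte inodes.
        inode_size: if le_u32(raw, 76) == 0 { 128 } else { le_u16(raw, 88) },
    };
    if sb.blocks_per_group == 0 || sb.inodes_per_group == 0 {
        return None;
    }
    Some(sb)
}

impl Superblock {
    /// The descriptor table occupies the block right after the superblock's block.
    pub fn bgdt_offset(&self) -> u64 {
        (self.first_data_block as u64 + 1) * self.block_size as u64
    }

    /// Number of block groups, i.e. entries in the descriptor table (rounded up).
    pub fn group_count(&self) -> u64 {
        let data_blocks = self.blocks_count.saturating_sub(self.first_data_block) as u64;
        let per_group = self.blocks_per_group as u64;
        (data_blocks + per_group - 1) / per_group
    }
}
```

With 1 KiB blocks, s_first_data_block is 1, so the descriptor table lands at offset 2048; with larger blocks it is 0 and the table starts at block 1.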

Every file and directory is represented by an inode. The root directory is always inode 2. Given an inode number, we can find it by:

group        = (ino - 1) / inodes_per_group
index        = (ino - 1) % inodes_per_group
byte_offset  = index * inode_size
block        = group_desc[group].inode_table + byte_offset / block_size
offset       = byte_offset % block_size   (where the inode starts inside that block)
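The same arithmetic as a small Rust function. This is a sketch: `locate_inode` and `InodeLocation` are hypothetical names, and the group descriptor table is reduced to a slice holding each group's inode-table block number.

```rust
/// Block holding an inode, plus the inode's byte offset inside that block.
#[derive(Debug, PartialEq)]
pub struct InodeLocation {
    pub block: u64,
    pub offset_in_block: u64,
}

/// `inode_tables[g]` stands in for group_desc[g].inode_table (hypothetical input shape).
pub fn locate_inode(
    ino: u32,
    inodes_per_group: u32,
    inode_size: u32,
    block_size: u32,
    inode_tables: &[u32],
) -> Option<InodeLocation> {
    // Inode numbers start at 1; zero divisors would mean a corrupt superblock.
    if ino == 0 || inodes_per_group == 0 || block_size == 0 {
        return None;
    }
    let group = ((ino - 1) / inodes_per_group) as usize;
    let index = ((ino - 1) % inodes_per_group) as u64;
    let byte_offset = index * inode_size as u64;
    // A group past the end of the descriptor table yields None.
    let table = u64::from(*inode_tables.get(group)?);
    Some(InodeLocation {
        block: table + byte_offset / block_size as u64,
        offset_in_block: byte_offset % block_size as u64,
    })
}
```

For the root directory (inode 2) with 256-byte inodes and 1 KiB blocks, the inode is the second record in the first block of group 0's inode table, at byte offset 256.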

Each inode holds 15 block pointers:

block[0..11]  : direct block pointers
block[12]     : single-indirect (points to a block of data-block pointers)
block[13]     : double-indirect (points to a block of single-indirect pointers)
block[14]     : triple-indirect (not implemented; only files beyond the double-indirect range need it, about 64 MiB with 1 KiB blocks)
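A sketch of how a logical block index within a file maps onto those pointers. Each pointer block holds block_size / 4 entries because ext2 block numbers are 32-bit. `BlockPath` and `block_path` are hypothetical names; Kevlar's own lookup may be structured differently.

```rust
/// Which i_block slot, and which indices inside pointer blocks, reach logical block n.
#[derive(Debug, PartialEq)]
pub enum BlockPath {
    Direct(usize),                // i_block[n]
    SingleIndirect(usize),        // i_block[12] -> ptrs[i]
    DoubleIndirect(usize, usize), // i_block[13] -> ptrs[i] -> ptrs[j]
    TripleIndirect,               // i_block[14]: not handled in this sketch
}

pub fn block_path(n: u64, block_size: u64) -> BlockPath {
    let p = block_size / 4; // 32-bit block numbers per pointer block
    if n < 12 {
        return BlockPath::Direct(n as usize);
    }
    let n = n - 12;
    if n < p {
        return BlockPath::SingleIndirect(n as usize);
    }
    let n = n - p;
    if n < p * p {
        return BlockPath::DoubleIndirect((n / p) as usize, (n % p) as usize);
    }
    BlockPath::TripleIndirect
}
```

A double-indirect lookup costs two extra block reads before the data block itself, which is why caching the pointer blocks matters.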

Directory entries are stored in the inode's data blocks as a linked list of variable-length records:

struct ext2_dir_entry_2 {
    uint32_t inode;      // inode number (0 = deleted)
    uint16_t rec_len;    // length of this entry (advance by this to get next)
    uint8_t  name_len;
    uint8_t  file_type;  // 1=file, 2=dir, 7=symlink, ...
    char     name[];     // name_len bytes, not NUL-terminated
};
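Walking one directory block means following rec_len from record to record and skipping entries whose inode is 0. A minimal sketch, assuming the whole block is already in memory (ext2 records never cross a block boundary); `DirEntry` and `parse_dir_block` are illustrative names, and the bounds checks simply stop at the first malformed record rather than reporting an error.

```rust
/// One live directory record; deleted slots (inode 0) are not returned.
#[derive(Debug, PartialEq)]
pub struct DirEntry {
    pub inode: u32,
    pub file_type: u8,
    pub name: Vec<u8>,
}

/// Follows rec_len from record to record through one directory data block.
pub fn parse_dir_block(block: &[u8]) -> Vec<DirEntry> {
    let mut entries = Vec::new();
    let mut off = 0usize;
    // Each record starts with an 8-byte header: inode, rec_len, name_len, file_type.
    while off + 8 <= block.len() {
        let inode = u32::from_le_bytes([block[off], block[off + 1], block[off + 2], block[off + 3]]);
        let rec_len = u16::from_le_bytes([block[off + 4], block[off + 5]]) as usize;
        let name_len = block[off + 6] as usize;
        let file_type = block[off + 7];
        // Stop on a malformed record: too short, overrunning the block, or name past rec_len.
        if rec_len < 8 || off + rec_len > block.len() || 8 + name_len > rec_len {
            break;
        }
        if inode != 0 {
            entries.push(DirEntry {
                inode,
                file_type,
                name: block[off + 8..off + 8 + name_len].to_vec(),
            });
        }
        off += rec_len;
    }
    entries
}
```

The last record in a block has a rec_len that stretches to the end of the block, so the loop ends exactly at the block boundary on a well-formed directory.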

Symbolic links whose target is shorter than 60 bytes (the space occupied by block[0..14]) are stored inline, with no data block. Longer targets use the normal block pointer machinery.
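Reading a symlink target therefore starts with deciding which case applies. This sketch assumes, as Linux does when it creates ext2 symlinks, that any target under 60 bytes is stored inline; a stricter check also inspects the inode's allocated-block count. `SymlinkTarget` and `symlink_target` are hypothetical names.

```rust
/// Size of i_block[0..14]: 15 block pointers of 4 bytes each.
const I_BLOCK_BYTES: usize = 60;

#[derive(Debug, PartialEq)]
pub enum SymlinkTarget<'a> {
    /// Fast symlink: the target bytes live in i_block itself.
    Inline(&'a [u8]),
    /// Slow symlink: read `len` bytes starting at this data block.
    InDataBlock { block: u32, len: usize },
}

/// Assumption: a target shorter than 60 bytes is always stored inline.
pub fn symlink_target<'a>(i_size: u64, i_block: &'a [u8; I_BLOCK_BYTES]) -> SymlinkTarget<'a> {
    let len = i_size as usize;
    if len < I_BLOCK_BYTES {
        SymlinkTarget::Inline(&i_block[..len])
    } else {
        // i_block[0] is the first direct pointer, stored little-endian.
        let block = u32::from_le_bytes([i_block[0], i_block[1], i_block[2], i_block[3]]);
        SymlinkTarget::InDataBlock { block, len }
    }
}
```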

Ringkernel Architecture

kevlar_ext2 is a Ring 2 service crate:

# services/kevlar_ext2/Cargo.toml
[dependencies]
kevlar_api  = { path = "../../libs/kevlar_api" }   # BlockDevice trait
kevlar_vfs  = { path = "../../libs/kevlar_vfs" }   # VFS traits
kevlar_utils = { path = "../../libs/kevlar_utils", features = ["no_std"] }

The crate is #![no_std] and #![forbid(unsafe_code)]. It never touches raw pointers or calls into the kernel directly — it only reads from a BlockDevice and implements FileSystem, Directory, FileLike, and Symlink from kevlar_vfs.

The kernel side is three lines in mount.rs:

"ext2" => {
    kevlar_ext2::mount_ext2()?
}

mount_ext2() grabs the global BlockDevice (registered by the VirtIO block driver during PCI probe) and calls Ext2Filesystem::mount().
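The register-once, fetch-at-mount pattern can be sketched on the host with `std::sync::OnceLock`. Everything here is an assumption: the `BlockDevice` trait below is a stand-in whose `read_sectors` signature may not match kevlar_api's, `register_block_device` and `global_block_device` are hypothetical names, and a `no_std` kernel would need its own one-time-initialization primitive instead of `OnceLock`.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for kevlar_api's BlockDevice trait (real signature may differ).
pub trait BlockDevice: Send + Sync {
    fn read_sectors(&self, sector: u64, buf: &mut [u8]) -> Result<(), &'static str>;
}

/// Set once by the block driver during device probe; read by the ext2 mount path.
static BLOCK_DEVICE: OnceLock<Arc<dyn BlockDevice>> = OnceLock::new();

/// Driver side: a second registration is rejected rather than silently replacing the first.
pub fn register_block_device(dev: Arc<dyn BlockDevice>) -> Result<(), &'static str> {
    BLOCK_DEVICE.set(dev).map_err(|_| "block device already registered")
}

/// Mount side: hand out a cloned Arc, or fail if no driver has registered a device.
pub fn global_block_device() -> Result<Arc<dyn BlockDevice>, &'static str> {
    BLOCK_DEVICE.get().cloned().ok_or("no block device registered")
}
```

Failing the mount with an error when no disk was attached keeps a diskless boot working; only the ext2 mount fails.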

Implementation Highlights

Block-Level I/O

All reads go through read_block(block_num), which multiplies block_num by block_size / 512 to get the starting sector and calls device.read_sectors(). The block cache in the VirtIO driver (256 entries, direct-mapped on sector number) absorbs repeated reads of directory and indirect blocks.
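A host-side sketch of that read path: `read_block` converts a block number to a starting sector, and a 256-slot direct-mapped cache keyed by sector number sits underneath. `CachedDisk` is hypothetical (it reads from an in-memory image rather than a VirtIO queue) and counts misses only so the caching behavior is observable.

```rust
const SECTOR_SIZE: usize = 512;
const CACHE_SLOTS: usize = 256;

/// In-memory disk image fronted by a direct-mapped sector cache.
pub struct CachedDisk {
    image: Vec<u8>,
    // Slot i holds (sector, data) for some sector with sector % CACHE_SLOTS == i.
    slots: Vec<Option<(u64, [u8; SECTOR_SIZE])>>,
    pub misses: u64,
}

impl CachedDisk {
    pub fn new(image: Vec<u8>) -> Self {
        CachedDisk { image, slots: vec![None; CACHE_SLOTS], misses: 0 }
    }

    fn read_sector(&mut self, sector: u64, out: &mut [u8]) {
        let slot = (sector % CACHE_SLOTS as u64) as usize;
        if let Some((tag, data)) = &self.slots[slot] {
            if *tag == sector {
                out.copy_from_slice(data); // hit: one tag compare, no device I/O
                return;
            }
        }
        // Miss: fetch from the image and overwrite whatever occupied the slot.
        self.misses += 1;
        let start = sector as usize * SECTOR_SIZE;
        let mut data = [0u8; SECTOR_SIZE];
        data.copy_from_slice(&self.image[start..start + SECTOR_SIZE]);
        out.copy_from_slice(&data);
        self.slots[slot] = Some((sector, data));
    }

    /// Reads consecutive sectors; buf.len() must be a multiple of 512.
    pub fn read_sectors(&mut self, first: u64, buf: &mut [u8]) {
        for (i, chunk) in buf.chunks_mut(SECTOR_SIZE).enumerate() {
            self.read_sector(first + i as u64, chunk);
        }
    }
}

/// Block N starts at sector N * (block_size / 512).
pub fn read_block(disk: &mut CachedDisk, block_num: u64, block_size: usize) -> Vec<u8> {
    let mut buf = vec![0u8; block_size];
    let first_sector = block_num * (block_size / SECTOR_SIZE) as u64;
    disk.read_sectors(first_sector, &mut buf);
    buf
}
```

Direct mapping gives each sector exactly one possible slot, so a lookup is a single compare; the trade-off is that two hot sectors 256 apart keep evicting each other.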

The root_dir() Workaround

The VFS FileSystem trait exposes root_dir(&self), but Ext2Dir needs an Arc<Ext2Filesystem> to call back into the filesystem. With only &self available, we build a fresh Ext2Filesystem from clones of its cheap fields and wrap it in a new Arc:

impl FileSystem for Ext2Filesystem {
    fn root_dir(&self) -> Result<Arc<dyn Directory>> {
        let inode = self.read_inode(EXT2_ROOT_INO)?;
        Ok(Arc::new(Ext2Dir {
            fs: Arc::new(Ext2Filesystem {
                device: self.device.clone(),      // Arc clone — cheap
                superblock: self.superblock.clone(),
                block_size: self.block_size,
                groups: self.groups.clone(),      // small Vec
                inodes_per_group: self.inodes_per_group,
                inode_size: self.inode_size,
            }),
            inode_num: EXT2_ROOT_INO,
            inode,
        }))
    }
}

Cloning the device Arc is only an atomic reference-count increment. The groups Vec is small: one entry per block group, and a 16 MiB image has one group with 4 KiB blocks or two with 1 KiB blocks. This is called once per mount, so the cost is negligible.

A cleaner long-term fix is for Ext2Filesystem to hold a Weak reference to itself, created with Arc::new_cyclic, and upgrade it inside root_dir(). (Holding a strong Arc<Self> would form a reference cycle and leak the filesystem.) That is not worth the complexity right now.
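A minimal sketch of the Arc::new_cyclic approach, using trimmed-down stand-ins for Ext2Filesystem and Ext2Dir (their fields here are hypothetical, and the Result-returning VFS signature is simplified to a plain return). The filesystem keeps a Weak pointer to its own allocation, so root_dir(&self) can recover the shared Arc instead of copying fields.

```rust
use std::sync::{Arc, Weak};

const EXT2_ROOT_INO: u32 = 2;

/// Hypothetical, trimmed-down filesystem: only what the pattern needs.
pub struct Ext2Filesystem {
    self_ref: Weak<Ext2Filesystem>,
    pub block_size: u32,
}

pub struct Ext2Dir {
    pub fs: Arc<Ext2Filesystem>,
    pub inode_num: u32,
}

impl Ext2Filesystem {
    pub fn mount(block_size: u32) -> Arc<Self> {
        // new_cyclic passes a Weak to the allocation before the value is constructed.
        Arc::new_cyclic(|weak| Ext2Filesystem { self_ref: weak.clone(), block_size })
    }

    pub fn root_dir(&self) -> Arc<Ext2Dir> {
        // Succeeds whenever self is owned by an Arc, which mount() guarantees.
        let fs = self.self_ref.upgrade().expect("filesystem already dropped");
        Arc::new(Ext2Dir { fs, inode_num: EXT2_ROOT_INO })
    }
}
```

Every Ext2Dir produced this way shares one filesystem instance, so any per-mount state added later (caches, locks) is not duplicated.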

Tests

Seven new tests exercise every layer of the filesystem:

Test              What it checks
ext2_mount        mount("none", "/tmp/mnt", "ext2", ...) returns 0
ext2_read_file    Read /tmp/mnt/greeting.txt, verify content
ext2_listdir      getdents on mount root, find expected filenames
ext2_subdir       Read /tmp/mnt/subdir/nested.txt
ext2_symlink      Open /tmp/mnt/link.txt (symlink → greeting.txt), read content
ext2_stat         stat on a file, verify size and mode bits
ext2_readonly     open(..., O_WRONLY) returns EROFS

Run them with:

make test-ext2

This creates build/disk.img if it doesn't exist, boots Kevlar with --disk build/disk.img, and checks all 31 tests pass.

The disk image is pre-populated by:

sudo mount -o loop build/disk.img /mnt
sudo sh -c 'echo "hello from ext2" > /mnt/greeting.txt'
sudo mkdir /mnt/subdir
sudo sh -c 'echo "nested file" > /mnt/subdir/nested.txt'
sudo ln -s greeting.txt /mnt/link.txt
sudo umount /mnt

Results

PASS ext2_mount
PASS ext2_read_file
PASS ext2_listdir
PASS ext2_subdir
PASS ext2_symlink
PASS ext2_stat
PASS ext2_readonly
TEST_END 31/31
ALL TESTS PASSED

What's Next

With a working read-only ext2, we can:

  • Load userspace programs from a persistent disk at boot (replacing initramfs for larger binaries like Wine)
  • Add write support (ext2 write is straightforward — no journaling)
  • Mount multiple filesystems at different mount points

The immediate next milestone is write support and a writable root filesystem.