081: Contract Divergence Resolution, SIGSEGV Delivery, and mremap

Context

After M9.8, the contract test suite reported: 100 PASS | 10 XFAIL | 10 DIVERGE | 1 FAIL

The 10 DIVERGEs and 1 FAIL broke the green suite. Investigation revealed three classes of issues: a real bug in fd-passing, two signal delivery bugs that prevented POSIX-compliant SIGSEGV handling, and a missing syscall (mremap) needed for musl's realloc. All four were fixed this session.

Final state: 104 PASS | 8 XFAIL | 6 DIVERGE | 0 FAIL

Fix 1: SCM_RIGHTS fd-passing (sockets.scm_rights_fdpass)

Root cause

recvmsg.rs only tried downcast_ref::<UnixSocket>() to find the inner UnixStream for ancillary data. But socketpair() stores bare Arc<UnixStream> objects in the fd table (not UnixSocket wrappers), so the downcast always failed, inner_stream was None, and the kernel silently dropped the SCM_RIGHTS cmsg — writing msg_controllen=0 back to userspace.

sendmsg.rs already did it correctly: try UnixStream first, then UnixSocket. The fix was to mirror that pattern in recvmsg.rs.

Fix

#![allow(unused)]
fn main() {
// Before: only tried UnixSocket
let inner_stream: Option<Arc<UnixStream>> =
    if let Some(sock) = (**file).as_any().downcast_ref::<UnixSocket>() {
        sock.connected_stream()
    } else {
        None
    };

// After: try UnixStream first (socketpair), then UnixSocket (socket+connect)
let owned_stream: Option<Arc<UnixStream>> =
    if let Some(sock) = (**file).as_any().downcast_ref::<UnixSocket>() {
        sock.connected_stream()
    } else {
        None
    };
let stream: &UnixStream =
    if let Some(s) = (**file).as_any().downcast_ref::<UnixStream>() {
        s
    } else if let Some(ref s) = owned_stream {
        s
    } else {
        return Ok(0);
    };
}

This is the same Arc<dyn FileLike> downcast pattern documented in the M4 critical bugs section — (**file).as_any() dispatches through the vtable to get the concrete type.

Fix 2: SIGSEGV delivery for page faults

Two bugs prevented POSIX-compliant SIGSEGV delivery. Both had the same symptom: processes that installed a SIGSEGV handler never had it called.

Bug A: Write fault on read-only page (vm.mprotect_roundtrip)

After mprotect(addr, len, PROT_READ) removes write permission, writing to the page triggers a page fault. The handler checked for Copy-on-Write:

#![allow(unused)]
fn main() {
let is_cow_write = reason.contains(PRESENT)
    && reason.contains(CAUSED_BY_WRITE)
    && (prot_flags & 2 != 0); // VMA has PROT_WRITE
}

Since the VMA no longer has PROT_WRITE, is_cow_write was false. The code fell through to update_page_flags(aligned_vaddr, prot_flags) — which re-applied the same PROT_READ flags. The CPU re-tried the write, faulted again, and looped forever. The test timed out at 30 seconds.

Fix: Before the fallthrough, detect permission violations and deliver SIGSEGV:

#![allow(unused)]
fn main() {
if reason.contains(CAUSED_BY_WRITE) && (prot_flags & 2 == 0) {
    drop(vm);
    drop(vm_ref);
    current.send_signal(SIGSEGV);
    return;
}
}

Bug B: Access to unmapped page (vm.munmap_partial)

After munmap() removes a page, accessing it triggers a page fault with no VMA. The handler called emit_crash_and_exit(SIGSEGV, ...) which unconditionally killed the process via Process::exit_by_signal() — bypassing any installed SIGSEGV handler.

Fix: Replace emit_crash_and_exit with send_signal(SIGSEGV) + return. The interrupt return path (x64_check_signal_on_irq_return) delivers the signal to the handler if one is installed. If no handler exists, the default SIGSEGV action terminates the process.

The same fix was applied to null-pointer faults and invalid-address faults.

Why this matters for apk

These two fixes are the only XFAIL items that were assessed as blockers for Alpine's apk. Without SIGSEGV delivery, any page fault in apk's code path (guard pages, mprotect'd regions, use-after-unmap) would either hang the process or kill it silently instead of allowing crash recovery.

Fix 3: mremap(2) implementation

Motivation

musl's realloc() calls mremap(MREMAP_MAYMOVE) to grow large allocations in-place (avoiding a malloc + memcpy + free round-trip). Without mremap, musl falls back to the slow path. For apk processing multi-megabyte APKINDEX files, this matters.

Implementation

New file: kernel/syscalls/mremap.rs (~180 lines). Supports:

  • Shrink: remove_vma_range() + unmap excess pages + TLB flush
  • Same size: no-op, return old address
  • Grow in-place: check if virtual space after VMA is free → extend_vma()
  • Grow with move (MREMAP_MAYMOVE): allocate new VA range, move page mappings from old to new, remove old VMA, single remote TLB flush

Key design decisions:

  • Only anonymous mappings for now (file-backed mremap deferred)
  • MREMAP_FIXED and MREMAP_DONTUNMAP return EINVAL (not needed for musl)
  • In-place grow extends the existing VMA (extend_vma()) rather than adding a new adjacent VMA — this is critical so that a subsequent shrink can find the single VMA covering the full range
  • Huge page handling: split 2MB pages before moving individual 4KB PTEs
  • Page refcounts are untouched during move (same physical page, new virtual address)

The contract test vm.mremap_grow validates: mmap 1 page → write sentinel → mremap grow to 2 pages → verify sentinel survived → verify new page is zero-filled → mremap shrink → verify sentinel again.

Wiring

  • x86_64: syscall 25, arm64: syscall 216
  • Vm::extend_vma(start, additional) added to kernel/mm/vm.rs

XFAIL audit for Alpine apk

Not everything was fixed — the remaining 6 DIVERGEs and 8 XFAILs were audited for whether they'd block Alpine's apk package manager:

IssueBlocks apk?Why
ASLR (2 tests)NoSecurity, not correctness
getrusage zerosNoapk doesn't check CPU time
uid=0 alwaysNoapk runs as root
SO_RCVBUF sizeNoPerformance only
setitimer precisionNoapk doesn't use timers
epoll oneshotNoapk is synchronous
sigaltstack stubNoSafety net only
mremap ENOSYSFixedNow implemented
SIGSEGV deliveryFixedNow implemented

apk.static runs on Kevlar

With the fixes in place, Alpine's apk.static (statically linked, musl) runs correctly:

$ apk.static --version
apk-tools 2.14.6, compiled for x86_64.

$ apk.static --help
usage: apk [<OPTIONS>...] COMMAND [<ARGUMENTS>...]
...
This apk has coffee making abilities.

Remaining blocker: ext2 + statx path resolution

The next blocker for apk --root /mnt is a VFS path resolution bug. When ext2 is mounted at /mnt/, C test binaries (compiled with older musl, using stat/fstatat) can access files: stat("/mnt/bin/busybox") succeeds. But BusyBox and apk.static (Alpine musl, likely using statx) cannot: test -f /mnt/bin/busybox returns "No such file or directory."

The ext2 mount itself works — the superblock is read, blocks and inodes are enumerated. The bug is specifically in cross-filesystem path traversal from initramfs (tmpfs) into ext2 when using the statx syscall path. This is the next debugging target.

Test results

SuiteBeforeAfter
Contracts100 PASS / 10 XFAIL / 10 DIVERGE / 1 FAIL104 PASS / 8 XFAIL / 6 DIVERGE / 0 FAIL
Busybox101/101101/101
systemd-v325/2525/25

Files changed

FileChange
kernel/syscalls/recvmsg.rsUnixStream downcast before UnixSocket
kernel/mm/page_fault.rsSIGSEGV delivery via send_signal (3 sites)
kernel/syscalls/mremap.rsNew: mremap(2) implementation
kernel/mm/vm.rsNew: extend_vma() method
kernel/syscalls/mod.rsDispatch + constants for SYS_MREMAP
testing/contracts/vm/mremap_grow.cNew contract test
testing/contracts/known-divergences.json+5 XFAIL, -4 stale entries
testing/test_apk_update.shRewritten for apk.static --root (no chroot)
tools/build-initramfs.pyFix resolv.conf to use QEMU DNS (10.0.2.3)
MakefileUpdated run-alpine, test-alpine targets