M10 Phase 7b: Crash Diagnostics + sync Stub

Debugging the apk.static SIGSEGV took hours of manual grep, objdump, and re-running QEMU with different debug= flags. The kernel had the data — fault address, instruction pointer, memory map — but only printed a one-line warning. All the rich context was lost by the time the process exited.

Per-process syscall trace

Every process now records its last 32 syscalls in a lock-free ring buffer. The buffer uses AtomicCell entries and an AtomicU32 write index — one relaxed fetch_add plus one atomic store per syscall, ~5ns overhead. Recording is unconditional for all processes, not just PID 1.

#![allow(unused)]
fn main() {
pub struct SyscallTrace {
    entries: [AtomicCell<SyscallTraceEntry>; PROC_TRACE_LEN],
    write_idx: AtomicU32,
}
}

On crash, dump_trace() returns the entries in chronological order. This replaced the global PID-1-only trace buffer for crash diagnostics.

CrashReport debug event

When a process dies by fatal signal, the kernel now emits a structured CrashReport JSONL event containing:

  • PID, signal name, command line
  • Fault address and instruction pointer
  • FS base (TLS pointer)
  • Last 32 syscalls with resolved names
  • Up to 64 VMAs from the process memory map

The event is emitted from three places: the null-pointer, invalid-address, and no-VMA paths in the page fault handler, plus the general exit_by_signal catch-all. The VMA collection uses is_locked() to avoid deadlock if the crash was caused by a VM lock issue.

DBG {"type":"crash_report","pid":22,"signal":11,"signal_name":"SIGSEGV",
     "cmdline":"apk.static --root /mnt info","fault_addr":0x0,"ip":0x420000,
     "fsbase":0x88f5f8,"regs":{...},
     "syscalls":[{"nr":257,"name":"openat","result":6,"a0":0x3,"a1":0x742266},
                 {"nr":9,"name":"mmap","result":2465792,"a0":0x0,"a1":0x2004c}],
     "vmas":[{"start":0x400000,"end":0x89328c,"type":"file"},
             {"start":0x9fffdf000,"end":0xa00000000,"type":"anon"},...]}

crash-report.py

A Python tool that parses QEMU serial output and generates human-readable crash reports:

========================================================================
  CRASH REPORT: PID 22 (apk.static --root /mnt info) killed by SIGSEGV
========================================================================

  Fault address: 0x0
  Instruction:   0x420000
  FS base:       0x88f5f8

  Disassembly around 0x420000:
      41ffe4:  64 48 8b 04 25 28 00  mov    %fs:0x28,%rax
  >>> 420000:  48 85 ff              test   %rdi,%rdi

  Last 32 syscalls (oldest first):
    [30] openat(0x3, 0x742266) -> 6
    [31] mmap(0x0, 0x2004c) -> 0x25a000

  Memory map (39 VMAs):
    0x000000400000-0x00000089328c (   4M) file
    0x0009fffdf000-0x000a00000000 ( 132K) anon
    ...

Auto-disassembly via objdump, symbol resolution via addr2line/nm, and --json mode for automation.

Usage:

python3 tools/run-qemu.py --disk build/alpine-disk.img \
  --append-cmdline "debug=fault,process" kevlar.x64.elf 2>&1 \
  | python3 tools/crash-report.py --binary /tmp/apk.static

SIGSEGV always-on logging

All four SIGSEGV paths in the page fault handler now use warn! instead of debug_warn!. Fatal signal delivery is rare — always worth logging. Each path prints the fault address, PID, instruction pointer, and reason:

SIGSEGV: null pointer access (pid=22, ip=0x420000, fsbase=0x88f5f8)
SIGSEGV: no VMA for address 0xdeadbeef (pid=5, ip=0x401234, reason=CAUSED_BY_WRITE)

sync(2) stub

poweroff -f calls sync() before issuing reboot(2). Syscall 162 on x86_64 (81 on arm64) was unimplemented, producing a harmless but confusing warning on every shutdown. Now returns 0 — correct since ext2 writes are synchronous (no write-back cache).

QEMU exit hint

run-qemu.py now prints Press Ctrl-A X to exit QEMU on interactive sessions. With -serial mon:stdio, Ctrl-C is captured as serial input to the guest. The QEMU escape sequence is Ctrl-A then X.

Per-CPU register stash

The interrupt handler now stashes all GP registers + RIP + RSP + RFLAGS to a per-CPU static array before dispatching the page fault handler. This costs ~10ns per page fault (19 relaxed atomic stores) — negligible on 2900ns demand-page faults. The crash report reads the stash and includes real register values.

chroot(2) + sync(2)

chroot(2) implemented: changes the process's root directory via RootFs::chroot(). This enables chroot /mnt /sbin/apk info which successfully lists Alpine packages from the ext2 rootfs.

sync(2) stubbed (returns 0) — our ext2 writes are synchronous, so sync is a no-op. Eliminates the "unimplemented syscall 162" warning on poweroff -f.

QEMU exit hint

run-qemu.py prints "Press Ctrl-A X to exit QEMU" on interactive sessions (TTY-only). With -serial mon:stdio, Ctrl-C becomes serial input — the QEMU escape is Ctrl-A then X.

Files changed

FileChange
kernel/process/process.rsPer-process SyscallTrace ring buffer, CrashReport emission in exit_by_signal
kernel/debug/event.rsCrashReport variant + JSONL serialization
kernel/mm/page_fault.rsemit_crash_and_exit helper, always-on SIGSEGV logging
kernel/syscalls/mod.rsUnconditional per-process trace recording, sync(2) stub
tools/crash-report.pyNew: crash report parser with auto-disassembly
kernel/syscalls/chroot.rsNew: chroot(2) syscall
kernel/fs/mount.rsRootFs::chroot() method
kernel/syscalls/mod.rssync(2) stub, chroot dispatch
platform/crash_regs.rsNew: per-CPU register stash
platform/x64/interrupt.rsStash registers before page fault dispatch
tools/run-qemu.pyCtrl-A X exit hint for interactive sessions