Blog 117: OpenRC INVALID_OPCODE — signal delivery fix and crash investigation
Date: 2026-03-24 Milestone: M10 Alpine Linux
Summary
Fixed the kernel's user fault signal delivery (all exceptions sent SIGSEGV;
now correctly sends SIGILL, SIGFPE, etc.) and investigated a deterministic
INVALID_OPCODE crash in OpenRC's "Caching service dependencies" phase.
The crash is caused by the CPU executing from the middle of a valid mov
instruction in musl's timezone code — a 2-byte RIP misalignment that points
to a signal return or page fault return bug.
Also fixed: UDP getsockname (c-ares DNS), certificate verification tests
targeting google.com (Alpine CA bundle coverage), and the test-openssl
Makefile target timeout.
Bug Fix: User fault signal types
All user-mode CPU exceptions were unconditionally mapped to SIGSEGV and
killed the process immediately via exit_by_signal(). This meant:
- Programs couldn't install SIGILL handlers (e.g., for CPU feature probing)
- SIGFPE handlers for divide-by-zero never fired
- The signal number in
waitpidstatus was wrong (11 instead of 4/8)
Fix (kernel/main.rs): Map exception vectors to POSIX signals:
| Exception | Signal |
|---|---|
| INVALID_OPCODE | SIGILL (4) |
| DIVIDE_ERROR | SIGFPE (8) |
| X87_FPU, SIMD_FLOATING_POINT | SIGFPE (8) |
| GPF, stack/segment faults | SIGSEGV (11) |
Changed exit_by_signal(SIGSEGV) to send_signal(correct_signal) — the
signal is now delivered through the normal path, allowing user handlers to
catch faults. If no handler is installed (SIG_DFL = terminate), the process
dies on interrupt return via x64_check_signal_on_irq_return.
OpenRC Crash Investigation
The symptom
OpenRC boots, creates /run/openrc and /run/lock, starts "Caching service
dependencies", then crashes:
USER FAULT: INVALID_OPCODE pid=7 ip=0xa000411f1 signal=4 cmd=/sbin/openrc sysinit
Identifying the crash location
-
Interpreter base: Added PID-tagged logging to
execve()→ OpenRC's ld-musl loads at0xa0000b000 -
Offset:
0xa000411f1 - 0xa0000b000 = 0x361f1in ld-musl -
Function:
sem_close+0xf71— actually musl's timezone/localtime implementation (objdump mis-labels due to stripped symbols) -
Instruction: The crash is 2 bytes INTO a valid 6-byte instruction:
361ef: 8b 05 37 ee 06 00 mov 0x6ee37(%rip),%eax 361f5: f7 d8 neg %eaxAt IP
0x361f1, the CPU sees byte0x37— the removed AAA instruction, invalid in 64-bit mode → #UD (invalid opcode)
Verifying memory content
Read the actual bytes from process memory via the kernel fault handler:
code at ip: 37 ee 06 00 f7 d8 48 98 49 89 04 24 48 8b 05 8c
Matches the file exactly. Demand paging loaded the correct bytes. The CPU really IS executing from the middle of a valid instruction.
Register state at crash
RIP=0x0000000a000411f1 RSP=0x00000009ffffd3f8 RBP=0x0000000000000001
RAX=0x0000000000000000 RBX=0x0000000a001a9030 RCX=0x0000000a000411f1
RDX=0x0000000000000000 RSI=0x0000000000000000 RDI=0x0000000000000011
R12=0x00000009ffffd80f R13=0x0000000a000cd0b0 R14=0x0000000000000000
Key observation: RCX == RIP. On x86-64, syscall sets RCX = return
address. This suggests the crash address was the return point from a prior
syscall, and the register was never overwritten.
Stack analysis
[+0] = 0x0000000a001255a4 (data — not a return address)
[+8] = 0x0000000000000000
[+16] = 0x0000000a0006a3be (return from __overflow → after syscall at 0x5f3bc)
[+24] = 0x00000009ffffd7c0 (saved RBP)
The __overflow function at 0x5f3bc has a syscall instruction —
this is musl's write() syscall wrapper called during stdio flushing.
What the mov instruction accesses
The faulting mov 0x6ee37(%rip),%eax reads from virtual address 0xa502c
(RIP-relative), which is in musl's BSS (zero-initialized data, not in
the file). If this page isn't mapped yet, a demand page fault occurs.
Leading hypothesis: signal return corrupts RIP
The crash site is in timezone code called during localtime(). OpenRC
forks child processes to scan /etc/init.d/, and these children exit,
generating SIGCHLD signals. If SIGCHLD arrives while the parent is
executing the mov instruction at 0x361ef:
- CPU is at RIP=
0x361ef, executingmov 0x6ee37(%rip),%eax - SIGCHLD is pending — signal delivery saves RIP to the signal frame
- Signal handler runs, calls
rt_sigreturn - Bug:
sigreturnrestores RIP as0x361f1instead of0x361ef(2-byte offset error) - CPU resumes at
0x361f1→ byte0x37→ INVALID_OPCODE
The 2-byte offset matches the size of syscall (0f 05) — the signal
delivery code might be confusing the faulting instruction address with a
post-syscall return address.
Diagnostic tooling built
crash_handler.c: LD_PRELOAD library with__attribute__((constructor))that installs SIGILL/SIGSEGV/SIGBUS handlers printing registers and code bytes. Didn't fire because OpenRC forks+exec's helpers which reset handlers.- Kernel register dump: Added register and code-byte dump to the
handle_user_faultpath. - PID-tagged interpreter logging:
interp: pid=7 base=0xa0000b000
Status
| Suite | Result |
|---|---|
| Contract tests | 159/159 PASS |
| M10 APK (ext2) | 7/7 PASS |
| ext4 comprehensive | 29/29 PASS |
| OpenSSL/TLS | 18/18 PASS |
Root Cause Found via GDB (update)
Autonomous GDB tooling
Built tools/gdb-investigate.py — a general-purpose autonomous GDB crash
debugger for Kevlar:
- Patches kernel ELF with init path
- Starts QEMU with KVM + GDB stub
- Connects GDB, sets hardware breakpoints, runs Python scripts
- Outputs structured JSON for analysis
make gdb-investigate BREAK=0x... STEP=20Makefile target
GDB trace sequence
-
Break at
sysretq: Found that RCX =0xa000411f1(the crash address) right beforesysretqexecutes — confirming the kernel returns to the wrong user-mode address. -
Break at
syscall_entry: The SAME process enteredwait4with RCX =0xa0012f347(the CORRECT return address). Soframe.ripchanged DURING thewait4sleep. -
PtRegs dump at crash:
frame.rcx = 0xa0012f347(correct, set by hardwaresyscall) butframe.rip = 0xa000411f1(corrupted). These are pushed from the SAME register at syscall entry — they should be identical. -
Stack search:
0xa000411f1appears 3 more times in the kernel stack below the PtRegs frame. This value is a legitimate ld-musl timezone code address that gets written as a local variable during the wait4/scheduler code path, and accidentally overwritesframe.rip.
Definitive root cause
Kernel stack corruption during wait4 sleep. The syscall frame's
rip field (offset +128 in PtRegs) is overwritten by a legitimate code
address (0xa000411f1 = musl timezone code) that lives on the same kernel
stack as a local variable. The scheduler or wait queue code's deep call
chain + timer interrupt frames overlap with the PtRegs area.
Next step
Find the exact write that corrupts frame.rip — either:
- Set a hardware write watchpoint on the
frame.ripstack address - Increase kernel stack size from 2-page to 4-page usable region
- Audit the
sleep_signalable_until→ scheduler → context switch call depth for stack overflow potential