Blog 092: ktrace goes multi-arch — ARM64 semihosting transport and standalone repo
Date: 2026-03-19 Milestone: M10 Alpine Linux
Context
ktrace is Kevlar's high-bandwidth binary kernel tracer. Until today it was
x86_64-only: each trace event calls outb(0xe9, byte) to QEMU's ISA debugcon
device, which writes to a host chardev file at ~5 MB/s on KVM.
ARM64 just got real BusyBox support (Blog 091). The first debugging question we'll hit when ARM64 tests fail is "what was the kernel doing at the time?". ISA debugcon is a PC/AT bus device — it doesn't exist on ARM's virt machine.
We needed an ARM64 equivalent. We also noticed that the ktrace protocol (wire format + QEMU integration) is useful to any bare-metal kernel, not just Kevlar. Both observations pushed in the same direction: design a proper multi-arch transport, then extract ktrace into a standalone repo.
The ARM64 transport: ARM semihosting
ARM semihosting is the ARM-defined mechanism for a guest to communicate with its debug host. QEMU has supported it for years. The protocol is elegant:
x0 = operation number
x1 = parameter block address
HLT #0xF000 ← debug exception; QEMU intercepts and handles it
The operation that matters for tracing is SYS_WRITE (0x05): write a buffer
to an open file handle. Combined with QEMU's -semihosting-config chardev=ID
option, the output goes directly to a host file — exactly what ISA debugcon
does on x86_64.
QEMU x86_64: outb(0xe9, byte) → isa-debugcon → chardev → ktrace.bin
QEMU ARM64: HLT #0xF000 + SYS_WRITE → semihosting → chardev → ktrace.bin
Same chardev, same ktrace.bin, same decoder.
The write_bytes design
For single bytes, SYS_WRITEC (op 3) is the fastest path — one trap,
one byte, x1 points to the byte on the stack:
pub fn write_byte(byte: u8) {
    unsafe {
        core::arch::asm!(
            "hlt #0xf000",               // semihosting trap, handled by QEMU
            in("x0") SYS_WRITEC,         // operation number
            in("x1") &byte as *const u8, // address of the byte to write
            lateout("x0") _,             // x0 is clobbered on return
            options(nostack),
        );
    }
}
For bulk dumps (ring buffer flush), SYS_WRITE (op 5) is critical: a
single trap writes the entire buffer regardless of size. The parameter
block is a three-word struct on the stack:
pub fn write_bytes(data: &[u8]) {
    // Parameter block for SYS_WRITE: file handle, buffer address, byte count.
    let params: [usize; 3] = [STDERR_HANDLE, data.as_ptr() as usize, data.len()];
    unsafe {
        core::arch::asm!(
            "hlt #0xf000",
            in("x0") SYS_WRITE,
            in("x1") params.as_ptr(),
            lateout("x0") _, // return value is discarded
            options(nostack, readonly),
        );
    }
}
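The write_bytes above discards the value returned in x0. Per the ARM semihosting specification, SYS_WRITE returns 0 on success and otherwise the number of bytes that were not written. The following is a minimal sketch of how a caller could act on that value. The SemihostWrite trait, the write_all helper, and the ShortWriter mock are hypothetical names introduced here so the logic runs on any host; none of them are part of ktrace.

```rust
/// Stand-in for the semihosting trap so the retry logic can run on any host.
/// `sys_write` mirrors SYS_WRITE's return convention: 0 on success,
/// otherwise the number of bytes that were NOT written.
trait SemihostWrite {
    fn sys_write(&mut self, data: &[u8]) -> usize;
}

/// Hypothetical helper: keep issuing writes until the buffer is drained
/// or the host stops making progress. Returns the number of bytes written.
fn write_all<T: SemihostWrite>(host: &mut T, mut data: &[u8]) -> usize {
    let total = data.len();
    while !data.is_empty() {
        // Clamp defensively in case the host reports more than we sent.
        let not_written = host.sys_write(data).min(data.len());
        let written = data.len() - not_written;
        if written == 0 {
            break; // no progress; give up rather than spin forever
        }
        data = &data[written..];
    }
    total - data.len()
}

/// Mock host that accepts at most `limit` bytes per trap.
struct ShortWriter {
    limit: usize,
    sink: Vec<u8>,
    traps: usize,
}

impl SemihostWrite for ShortWriter {
    fn sys_write(&mut self, data: &[u8]) -> usize {
        self.traps += 1;
        let n = data.len().min(self.limit);
        self.sink.extend_from_slice(&data[..n]);
        data.len() - n
    }
}

fn main() {
    let mut host = ShortWriter { limit: 4, sink: Vec::new(), traps: 0 };
    let written = write_all(&mut host, b"ktrace-dump");
    println!("wrote {} of {} bytes in {} traps", written, host.sink.len(), host.traps);
}
```

In a real kernel the trait implementation would wrap the hlt #0xf000 sequence and return x0 instead of discarding it. Whether a retry loop is worth the extra code depends on how the host chardev behaves, which this sketch does not model.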
A typical ktrace dump is one CPU × 8192 entries × 32 bytes = 256 KB. On
TCG (no KVM), one semihosting trap is ~500 ns. With SYS_WRITE, the entire
dump completes in a single trap — the same asymptotic cost as ISA
debugcon's single chardev flush.
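To make the difference concrete, the arithmetic can be worked through with the figures quoted above (8192 entries, 32 bytes each, roughly 500 ns per trap on TCG). This is a back-of-the-envelope sketch: it counts trap overhead only and ignores the host-side cost of copying the buffer, and the function names are illustrative.

```rust
// Figures quoted in the post; the per-trap cost is an approximate TCG number.
const ENTRIES_PER_CPU: u64 = 8192;
const BYTES_PER_ENTRY: u64 = 32;
const TRAP_COST_NS: u64 = 500;

/// Size in bytes of the ring-buffer dump for `cpus` CPUs.
const fn dump_bytes(cpus: u64) -> u64 {
    cpus * ENTRIES_PER_CPU * BYTES_PER_ENTRY
}

/// Trap overhead in nanoseconds for a given number of traps.
const fn trap_overhead_ns(traps: u64) -> u64 {
    traps * TRAP_COST_NS
}

fn main() {
    let bytes = dump_bytes(1);
    // SYS_WRITEC needs one trap per byte; SYS_WRITE needs one trap per buffer.
    let per_byte_ns = trap_overhead_ns(bytes);
    let bulk_ns = trap_overhead_ns(1);
    println!("dump size:  {} KiB", bytes / 1024);
    println!("SYS_WRITEC: {} traps, ~{} ms of trap overhead", bytes, per_byte_ns / 1_000_000);
    println!("SYS_WRITE:  1 trap, ~{} ns of trap overhead", bulk_ns);
}
```

Under these assumptions a byte-at-a-time dump would spend about 131 ms in traps alone, while the bulk path pays the trap cost once.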
QEMU flags
# ARM64
-chardev file,id=ktrace,path=ktrace.bin \
-semihosting-config enable=on,target=native,chardev=ktrace
# x86_64 (unchanged)
-chardev file,id=ktrace,path=ktrace.bin \
-device isa-debugcon,chardev=ktrace,iobase=0xe9
Why semihosting is the right answer
The alternative would be to write a custom QEMU MMIO device (a "KTD — Kevlar Trace Device") at a fixed ARM64 virt machine address, similar to how the ISA debugcon device works on x86. That approach would require patching QEMU.
Semihosting gives us 95% of the same design — a QEMU-native mechanism that routes trace output to a chardev — without any QEMU patches. It already exists for exactly this purpose: low-level debug output from a bare-metal guest to the host.
The one remaining limitation is that semihosting output goes to stderr when
no chardev= is configured, which means it mixes with QEMU's own output.
The chardev=ktrace flag cleanly separates trace output into ktrace.bin.
tools/ktrace/ — standalone repo skeleton
ktrace now lives at tools/ktrace/ with its own git init. The intent is
to push it to a public GitHub repo and add it as a submodule. The repo
contains everything a non-Kevlar kernel needs to use the protocol:
tools/ktrace/
├── README.md
├── Cargo.toml              (workspace)
├── spec/
│   └── wire-format.md      (KTRX v1 binary protocol specification)
├── ktrace-core/            (no_std Rust crate)
│   └── src/
│       ├── lib.rs          (DumpHeader, TraceRecord, EventType)
│       ├── format.rs       (wire format types with size assertions)
│       └── transport/
│           ├── mod.rs      (write_byte / write_bytes dispatch)
│           ├── x86_64.rs   (ISA debugcon, outb 0xe9)
│           └── arm64.rs    (ARM semihosting, HLT #0xF000)
└── decode/
    └── ktrace-decode.py → ../../ktrace-decode.py (symlink)
The ktrace-core crate
ktrace-core is #![no_std] with zero dependencies. A kernel adds it as
a path dependency and enables the appropriate transport feature:
[dependencies]
ktrace-core = { path = "tools/ktrace/ktrace-core", features = ["transport-arm64"] }
Then emits trace data with:
use ktrace_core::transport::write_bytes;

// dump the ring buffer
write_bytes(ring_buffer_slice);
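The wire format types (DumpHeader, TraceRecord, EventType) live in
ktrace-core so that the kernel and the host decoder share one definition.
Once Kevlar's own format types migrate to the crate (see What's next), that
removes the risk of format drift.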
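A size assertion of the kind format.rs is described as carrying could look roughly like the sketch below. The field names and layout are assumptions made for illustration; the only figure taken from this post is the 32-byte record size, and the authoritative layout is whatever spec/wire-format.md defines. The little-endian serialization is likewise an assumption.

```rust
use core::mem::size_of;

/// Hypothetical 32-byte record layout. The real fields are defined by the
/// KTRX v1 spec, not by this sketch.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct TraceRecord {
    pub timestamp: u64, // assumed: cycle or nanosecond counter
    pub event: u16,     // assumed: numeric EventType tag
    pub cpu: u16,       // assumed: originating CPU index
    pub flags: u32,     // assumed: per-event flags
    pub args: [u64; 2], // assumed: two event-specific arguments
}

// Compile-time check: the build fails if the struct stops being 32 bytes.
const _: () = assert!(size_of::<TraceRecord>() == 32);

impl TraceRecord {
    /// Serialize field by field in little-endian order (assumed byte order).
    pub fn to_bytes(&self) -> [u8; 32] {
        let mut out = [0u8; 32];
        out[0..8].copy_from_slice(&self.timestamp.to_le_bytes());
        out[8..10].copy_from_slice(&self.event.to_le_bytes());
        out[10..12].copy_from_slice(&self.cpu.to_le_bytes());
        out[12..16].copy_from_slice(&self.flags.to_le_bytes());
        out[16..24].copy_from_slice(&self.args[0].to_le_bytes());
        out[24..32].copy_from_slice(&self.args[1].to_le_bytes());
        out
    }
}

fn main() {
    let rec = TraceRecord { timestamp: 1, event: 2, cpu: 0, flags: 0, args: [3, 4] };
    let bytes = rec.to_bytes();
    println!("record is {} bytes on the wire", bytes.len());
}
```

The const assertion costs nothing at runtime and turns an accidental field change into a compile error on the kernel side, which is the point of keeping the types in one crate.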
Integration changes in Kevlar
platform/arm64/debugcon.rs (new)
Architecture-specific semihosting transport, parallel to platform/x64/debugcon.rs.
platform/lib.rs
The pub mod debugcon block was x86_64-only. It now dispatches to the
right transport based on target_arch, and the feature gate is simply
cfg(feature = "ktrace") (not cfg(all(feature = "ktrace", target_arch = "x86_64"))):
#[cfg(feature = "ktrace")]
pub mod debugcon {
    pub fn write_bytes(data: &[u8]) {
        #[cfg(target_arch = "x86_64")]
        crate::x64::debugcon::write_bytes(data);
        #[cfg(target_arch = "aarch64")]
        crate::arm64::debugcon::write_bytes(data);
    }
}
tools/run-qemu.py
--ktrace now branches on args.arch:
- x64: original ISA debugcon flags
- arm64: -semihosting-config enable=on,target=native,chardev=ktrace
Makefile
Added ACCEL variable: --kvm on x64, empty on arm64 (TCG-only on x86
hosts). run-ktrace uses $(ACCEL) so make ARCH=arm64 run-ktrace works
without manually stripping --kvm.
Verification
make ARCH=arm64 check FEATURES=ktrace-all # 0 errors
make check FEATURES=ktrace-all # 0 errors (x86_64 regression check)
ARM64 ktrace end-to-end:
make ARCH=arm64 RELEASE=1 run-ktrace
python3 tools/ktrace-decode.py ktrace.bin --summary
What's next
- Push tools/ktrace to GitHub and add it as a git submodule
- Migrate Kevlar's format types to ktrace-core so TraceRecord is defined once and shared between kernel and decoder
- Verify ARM64 ktrace end-to-end: boot with FEATURES=ktrace-all, run a workload, decode the dump
- RISC-V transport: a future architecture; the repo structure already accommodates it
Files changed
- platform/arm64/debugcon.rs: new ARM64 semihosting transport
- platform/arm64/mod.rs: add pub mod debugcon (cfg-gated on ktrace)
- platform/lib.rs: extend pub mod debugcon to dispatch ARM64
- tools/run-qemu.py: --ktrace branch for ARM64 semihosting
- Makefile: ACCEL variable; run-ktrace uses $(ACCEL)
- tools/ktrace/: standalone repo skeleton (new)