Phase 7: Benchmarks, CI Matrix, and Smarter Tooling
With the safety profile infrastructure in place (Phases 0-6), we need to actually measure their impact. This post covers the benchmark suite, cross-profile CI, and some quality-of-life tooling improvements.
Micro-benchmark suite
benchmarks/bench.c is a static musl binary included in the initramfs.
It measures eight fundamental kernel operations:
| Benchmark | What it measures |
|---|---|
getpid | Bare syscall round-trip |
read_null | read(/dev/null, 1) latency |
write_null | write(/dev/null, 1) latency |
pipe | pipe read/write throughput (4 KB chunks) |
fork_exit | fork() + waitpid() latency |
open_close | open() + close() a tmpfs file |
mmap_fault | Anonymous mmap + page fault throughput |
stat | stat() latency |
Output is machine-parseable: BENCH <name> <iters> <total_ns> <per_iter_ns>.
A --quick flag reduces iteration counts for QEMU TCG, where emulation
adds ~10,000x overhead.
Python runner and comparison
benchmarks/run-benchmarks.py wraps the whole flow:
# Run on Kevlar (builds, boots QEMU, parses output)
python3 benchmarks/run-benchmarks.py run --profile balanced
# Run on native Linux for baseline
python3 benchmarks/run-benchmarks.py linux --binary ./bench
# Compare JSON result files side-by-side
python3 benchmarks/run-benchmarks.py compare kevlar.json linux.json
# Run all four safety profiles
python3 benchmarks/run-benchmarks.py all-profiles
Or via Make:
make bench PROFILE=balanced
make bench-all
make bench-compare BENCH_FILES="a.json b.json"
CI matrix: all four profiles
The CI workflow now tests all four safety profiles in parallel:
strategy:
fail-fast: false
matrix:
profile: [fortress, balanced, performance, ludicrous]
Each profile gets its own cargo check step using the correct target spec
(x64-unwind.json for fortress/balanced, x64.json for performance/ludicrous).
A separate clippy job runs on the balanced profile, and rustfmt runs
independently.
QEMU port conflict handling
Previous QEMU sessions sometimes lingered, holding ports 20022 and 20080.
run-qemu.py now detects port conflicts at startup using socket.bind(),
identifies the holder via ss -tlnp, and kills stale QEMU processes
automatically. This eliminates the "address already in use" failures that
plagued iterative development.
Build system fixes
INIT_SCRIPToverride: The Makefile now conditionally setsINIT_SCRIPT=/bin/shonly when not already set, somake benchcan override it to/bin/bench.build.rsenv tracking:kernel/build.rsdeclarescargo::rerun-if-env-changed=INIT_SCRIPTso Cargo recompiles when the init script changes — no more stale binaries after switching between shell and bench modes.- Docker context: The build context is now the repo root (not
testing/), allowing the Dockerfile toCOPY benchmarks/bench.cdirectly.
Early results (QEMU TCG, quick mode)
These numbers are from software emulation and only useful for relative comparison between profiles, not absolute performance:
| Benchmark | Kevlar (ns/op) | Linux (ns/op) |
|---|---|---|
| getpid | 2,233,600 | 264 |
| read_null | 4,289,000 | 306 |
| write_null | 4,164,600 | 288 |
| pipe | 36,718,750 | 1,342 |
The ~10,000x factor is pure TCG overhead. Real performance comparison
requires KVM (make run KVM=1) or native boot, which is where this
infrastructure will shine as Kevlar matures.
What's next
- Fix the GPF-in-userspace bug that crashes fork and later benchmarks
- KVM-accelerated benchmark runs for meaningful Kevlar vs Linux numbers
- Profile-to-profile comparison to quantify the cost of safety features