Unix Domain Sockets: D-Bus Transport Layer
Blog 019 gave Kevlar the event source fds that systemd plugs into epoll. But systemd's main business — managing services — happens over D-Bus, and D-Bus runs on Unix domain sockets. This post covers the AF_UNIX socket implementation that completes the systemd I/O foundation.
The state machine
A Unix socket transitions through states depending on which syscalls are called on it:
socket() → Created
↓ bind() ↓ connect()
Bound Connected (bidirectional stream)
↓ listen()
Listening (accept incoming connections)
The kernel represents this as an enum inside a SpinLock, which means one
Arc<UnixSocket> can transition from Created → Bound → Listening without
changing identity in the fd table:
#![allow(unused)] fn main() { enum SocketState { Created, Bound(String), Listening(Arc<UnixListener>), Connected(Arc<UnixStream>), } }
Each FileLike method checks the current state and delegates to the
appropriate inner type. Read/write on a Listening socket returns EINVAL.
Connect on an already-Connected socket replaces the stream.
Named sockets and the listener registry
When a process calls bind("/run/dbus/system_bus_socket") followed by
listen(), the kernel needs a way for a different process's connect()
to find that listener. We use a simple global registry:
#![allow(unused)] fn main() { static UNIX_LISTENERS: SpinLock<VecDeque<(String, Arc<UnixListener>)>> = SpinLock::new(VecDeque::new()); }
connect() looks up the path, calls enqueue_connection() on the
listener, and gets back the client end of a new stream pair. The listener
pushes the server end into its backlog. accept() pops from the backlog.
This is simpler than creating actual socket inodes in the VFS — we skip
filesystem integration entirely. The path is just a lookup key. For
systemd's use case (well-known paths like /run/dbus/system_bus_socket),
this is sufficient.
Connected streams: shared ring buffers
A connected Unix stream pair is two RingBuffer<u8, 65536> with
crossed references — each end's tx is the other end's rx:
#![allow(unused)] fn main() { pub struct UnixStream { tx: Arc<SpinLock<StreamInner>>, // our write buffer rx: Arc<SpinLock<StreamInner>>, // peer's write buffer peer_closed: Arc<AtomicBool>, } fn new_pair() -> (Arc<UnixStream>, Arc<UnixStream>) { let buf_a = Arc::new(SpinLock::new(StreamInner { ... })); let buf_b = Arc::new(SpinLock::new(StreamInner { ... })); // a.tx = buf_a, a.rx = buf_b // b.tx = buf_b, b.rx = buf_a } }
The read/write implementation follows the same pattern as pipes: fast
path under lock, slow path via POLL_WAIT_QUEUE.sleep_signalable_until.
EOF detection uses both shut_wr (explicit shutdown) and peer_closed
(the peer's Arc was dropped).
SCM_RIGHTS: passing file descriptors between processes
D-Bus uses sendmsg/recvmsg with SCM_RIGHTS ancillary data to pass
file descriptors between processes. The mechanism:
-
sendmsg: parse
struct msghdrand itscmsghdrchain from userspace. For eachSCM_RIGHTScmsg, look up the sender's fds, clone theirArc<OpenedFile>, and attach them to the stream's ancillary queue. -
recvmsg: after reading data, check for pending ancillary data. For each
SCM_RIGHTScmsg, install theArc<OpenedFile>into the receiver's fd table and write the new fd numbers back to userspace.
The ancillary data queue is a VecDeque<AncillaryData> inside each
stream direction's StreamInner. This decouples the ancillary data
from the byte stream — a received cmsg is associated with the next
recvmsg call, not with a specific byte offset.
#![allow(unused)] fn main() { pub enum AncillaryData { Rights(Vec<Arc<OpenedFile>>), } }
accept4 and setsockopt
accept4 extends accept with SOCK_CLOEXEC and SOCK_NONBLOCK
flags applied to the new fd. We refactored sys_accept to delegate to
sys_accept4 with flags=0.
setsockopt is a stub that silently accepts the options systemd and
D-Bus set: SO_REUSEADDR, SO_PASSCRED, SO_KEEPALIVE, TCP_NODELAY,
and buffer size options. None of these affect behavior yet.
What's next
With Unix domain sockets, Kevlar has the complete transport layer for
D-Bus. Phase 4 adds the remaining syscall stubs that systemd needs
before its main loop — socketpair, inotify, and the various prctl
and fcntl options that systemd probes on startup.