Networking

TCP/IP: smoltcp

Kevlar uses smoltcp 0.12 for the TCP/IP stack. smoltcp is a no_std, event-driven network stack that runs entirely inside the kernel without its own thread.

The network stack is accessed through the NetworkStackService trait (Ring 2 boundary):

pub trait NetworkStackService: Send + Sync {
    fn create_tcp_socket(&self) -> Result<Arc<dyn FileLike>>;
    fn create_udp_socket(&self) -> Result<Arc<dyn FileLike>>;
    fn create_unix_socket(&self) -> Result<Arc<dyn FileLike>>;
    fn create_icmp_socket(&self) -> Result<Arc<dyn FileLike>>;
    fn process_packets(&self);
}

Under the Fortress and Balanced profiles, calls go through call_service, which wraps each call in catch_unwind. Under Performance and Ludicrous, SmoltcpNetworkStack is called directly as a concrete type, so calls can be inlined and involve no vtable dispatch.
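The difference between the two dispatch paths can be illustrated with a small sketch. It uses std::panic::catch_unwind as a stand-in for the kernel's own unwinding support, and the signature of call_service, the ServiceFault type, and the two example stacks are all assumptions made for illustration; none of them are taken from the Kevlar source.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical error reported when a service panics behind the boundary.
#[derive(Debug, PartialEq)]
pub struct ServiceFault;

/// Assumed shape of `call_service`: run the closure and turn a panic
/// into an error instead of letting it unwind into the caller.
pub fn call_service<T>(f: impl FnOnce() -> T) -> Result<T, ServiceFault> {
    catch_unwind(AssertUnwindSafe(f)).map_err(|_| ServiceFault)
}

/// Reduced stand-in for NetworkStackService with a single method.
pub trait PacketService {
    fn process_packets(&self) -> usize;
}

pub struct HealthyStack;
impl PacketService for HealthyStack {
    fn process_packets(&self) -> usize {
        3
    }
}

pub struct FaultyStack;
impl PacketService for FaultyStack {
    fn process_packets(&self) -> usize {
        panic!("simulated bug in the network stack")
    }
}

/// Isolated path: dynamic dispatch through a trait object, panic contained.
pub fn process_isolated(svc: &dyn PacketService) -> Result<usize, ServiceFault> {
    call_service(|| svc.process_packets())
}

/// Direct path: concrete type, statically dispatched, eligible for inlining.
pub fn process_direct(svc: &HealthyStack) -> usize {
    svc.process_packets()
}
```

In this sketch a panic inside the faulty stack surfaces as Err(ServiceFault) on the isolated path, while the direct path trades that containment for a plain function call.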

Packet Processing

Incoming packets from the VirtIO driver are queued in a lock-free ArrayQueue<Vec<u8>> (128 packets max). The processing loop runs from timer interrupt context:

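// Keep polling until smoltcp reports that nothing further changed.
loop {
    match iface.poll(timestamp, &mut device, &mut sockets) {
        PollResult::None => break,
        PollResult::SocketStateChanged => {}
    }
}
// Wake tasks blocked in socket calls and in poll/epoll so they re-check state.
SOCKET_WAIT_QUEUE.wake_all();
POLL_WAIT_QUEUE.wake_all();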
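The bounded receive queue that feeds this loop can be sketched as follows. The kernel uses a lock-free ArrayQueue; the version below substitutes a mutex-protected VecDeque so that it needs no external crate. The RxQueue and drain names are hypothetical, and what the driver does with a packet rejected by a full queue is not stated here, so the sketch simply hands it back to the caller.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Capacity stated in the text above.
pub const RX_QUEUE_CAPACITY: usize = 128;

/// Stand-in for the lock-free ArrayQueue: a mutex-protected bounded FIFO.
pub struct RxQueue {
    packets: Mutex<VecDeque<Vec<u8>>>,
}

impl RxQueue {
    pub fn new() -> Self {
        RxQueue {
            packets: Mutex::new(VecDeque::with_capacity(RX_QUEUE_CAPACITY)),
        }
    }

    /// Driver side: enqueue one packet, or hand it back if the queue is full.
    pub fn push(&self, packet: Vec<u8>) -> Result<(), Vec<u8>> {
        let mut queue = self.packets.lock().unwrap();
        if queue.len() >= RX_QUEUE_CAPACITY {
            return Err(packet);
        }
        queue.push_back(packet);
        Ok(())
    }

    /// Stack side: dequeue the oldest packet, if any.
    pub fn pop(&self) -> Option<Vec<u8>> {
        self.packets.lock().unwrap().pop_front()
    }
}

/// Drains the queue until it is empty and returns the number of bytes
/// passed to `handle` (a hypothetical per-packet callback).
pub fn drain(queue: &RxQueue, mut handle: impl FnMut(&[u8])) -> usize {
    let mut bytes = 0;
    while let Some(packet) = queue.pop() {
        bytes += packet.len();
        handle(&packet);
    }
    bytes
}
```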

Network Configuration

  • DHCP: smoltcp's built-in DHCP client acquires an IP address and gateway at boot.
  • Static: Fixed IP/mask/gateway from kernel parameters.

Socket Types

Domain    Type          Protocol   Implementation
AF_INET   SOCK_STREAM   TCP        TcpSocket via smoltcp
AF_INET   SOCK_DGRAM    UDP        UdpSocket via smoltcp
AF_INET   SOCK_DGRAM    ICMP       IcmpSocket via smoltcp
AF_UNIX   SOCK_STREAM   -          UnixSocket (in-kernel)
AF_UNIX   SOCK_DGRAM    -          UnixSocket (in-kernel)

Not supported: AF_INET6 (IPv6), AF_NETLINK (returns EAFNOSUPPORT so tools fall back to ioctl-based configuration), AF_PACKET, SOCK_RAW, SOCK_SEQPACKET.

TCP

pub struct TcpSocket {
    handle: SocketHandle,
    local_endpoint: AtomicCell<Option<IpEndpoint>>,
    backlogs: SpinLock<Vec<Arc<TcpSocket>>>,
    num_backlogs: AtomicUsize,
}

  • Listen backlog: up to 8 pre-allocated sockets per listener.
  • Auto port assignment: starting at port 50000.
  • accept() blocks on SOCKET_WAIT_QUEUE until a backlog socket completes the three-way handshake.
  • Buffer sizes: 4 KB RX + 4 KB TX per socket.
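The port and backlog limits listed above can be sketched with a small allocator. Only the starting port (50000) and the backlog cap (8) come from the text; the PortAllocator type, the wrap-around at the top of the port range, the absence of an in-use check, and the clamping of the requested backlog are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// First automatically assigned port, as stated above.
pub const FIRST_AUTO_PORT: u16 = 50000;
/// Maximum number of pre-allocated backlog sockets per listener.
pub const MAX_TCP_BACKLOG: usize = 8;

/// Hypothetical allocator that hands out ports sequentially.
pub struct PortAllocator {
    next: AtomicU16,
}

impl PortAllocator {
    pub const fn new() -> Self {
        PortAllocator {
            next: AtomicU16::new(FIRST_AUTO_PORT),
        }
    }

    /// Returns the next port. Wrapping back to FIRST_AUTO_PORT after 65535
    /// is an assumption; this sketch does not check whether a port is in use.
    pub fn allocate(&self) -> u16 {
        self.next
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |port| {
                Some(if port == u16::MAX { FIRST_AUTO_PORT } else { port + 1 })
            })
            .expect("closure always returns Some")
    }
}

/// Assumed handling of the listen() argument: negative values count as
/// zero and anything above the cap is reduced to it.
pub fn effective_backlog(requested: i32) -> usize {
    (requested.max(0) as usize).min(MAX_TCP_BACKLOG)
}
```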

UDP

pub struct UdpSocket {
    handle: SocketHandle,
    peer: SpinLock<Option<IpEndpoint>>,  // Set by connect()
}

  • sendto uses the destination from the sockaddr argument or the connected peer.
  • recvfrom returns the source endpoint in metadata.
  • Auto-bind on first send if not explicitly bound.
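The send-side rules above can be sketched as one function. std::net::SocketAddr stands in for smoltcp's IpEndpoint, and the UdpState type, the prepare_send name, and the pick_port callback are hypothetical. Giving the explicit sockaddr precedence over the connected peer, and failing with a "destination required" error when neither is present, are assumptions modeled on common Linux behavior.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
pub enum SendError {
    /// Neither a sockaddr argument nor a connected peer is available.
    DestinationRequired,
}

/// Reduced, hypothetical view of a UDP socket's addressing state.
#[derive(Default)]
pub struct UdpState {
    /// None until bind() or the first send.
    pub local_port: Option<u16>,
    /// Set by connect().
    pub peer: Option<SocketAddr>,
}

impl UdpState {
    /// Returns (source port, destination) for one outgoing datagram.
    /// `pick_port` is only invoked when the socket is not yet bound.
    pub fn prepare_send(
        &mut self,
        explicit: Option<SocketAddr>,
        pick_port: impl FnOnce() -> u16,
    ) -> Result<(u16, SocketAddr), SendError> {
        // Prefer the sockaddr argument, fall back to the connected peer.
        let dst = explicit
            .or(self.peer)
            .ok_or(SendError::DestinationRequired)?;
        // Auto-bind on first send if no local port has been chosen yet.
        let src = *self.local_port.get_or_insert_with(pick_port);
        Ok((src, dst))
    }
}
```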

ICMP

pub struct IcmpSocket {
    handle: SocketHandle,
    ident: SpinLock<u16>,
}

Used by BusyBox ping. Auto-binds with a pseudo-random identifier on first send. Sends and receives raw ICMP echo request/reply packets.

Unix Domain Sockets

Unix domain sockets (AF_UNIX) use a state machine pattern:

UnixSocket (Created)
  ├── bind() → Bound
  │     └── listen() → Listening (UnixListener)
  └── connect() → Connected (UnixStream)
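The transitions in the diagram can be written as a consuming state machine. This is only a sketch: the enum below is a simplified stand-in for the kernel's UnixSocket, the InvalidState error is hypothetical, and it permits exactly the transitions drawn above (for example, it rejects connect() on a bound socket, which the real implementation may well allow).

```rust
/// Simplified stand-in for the socket states shown in the diagram.
#[derive(Debug, Clone, PartialEq)]
pub enum UnixSocket {
    Created,
    Bound { path: String },
    Listening { path: String },
    Connected { peer_path: String },
}

/// Hypothetical error for a call made in the wrong state.
#[derive(Debug, PartialEq)]
pub struct InvalidState;

impl UnixSocket {
    /// Created -> Bound
    pub fn bind(self, path: &str) -> Result<Self, InvalidState> {
        match self {
            UnixSocket::Created => Ok(UnixSocket::Bound { path: path.to_string() }),
            _ => Err(InvalidState),
        }
    }

    /// Bound -> Listening
    pub fn listen(self) -> Result<Self, InvalidState> {
        match self {
            UnixSocket::Bound { path } => Ok(UnixSocket::Listening { path }),
            _ => Err(InvalidState),
        }
    }

    /// Created -> Connected
    pub fn connect(self, peer_path: &str) -> Result<Self, InvalidState> {
        match self {
            UnixSocket::Created => Ok(UnixSocket::Connected {
                peer_path: peer_path.to_string(),
            }),
            _ => Err(InvalidState),
        }
    }
}
```

Taking self by value means a socket that has moved to a new state cannot be used in its old one, which mirrors how the diagram replaces UnixSocket with UnixListener or UnixStream.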

UnixStream

A bidirectional pipe pair. Each direction has a 16 KB ring buffer:

// Each end owns a tx buffer; peer reads from it
pub struct UnixStream {
    tx: SpinLock<RingBuffer<u8, 16384>>,
    rx: Arc<SpinLock<RingBuffer<u8, 16384>>>,  // = peer's tx
    ancillary: SpinLock<VecDeque<AncillaryData>>,
    // ...
}
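The cross-wiring of the two buffers can be sketched with standard library types. Here Arc<Mutex<VecDeque<u8>>> replaces the kernel's SpinLock and RingBuffer, and StreamEnd and stream_pair are hypothetical names. How the real stream behaves when a buffer is full (block or short write) is not described above; this sketch returns a short count.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Per-direction capacity stated above (16 KB).
pub const STREAM_BUF_SIZE: usize = 16384;

type SharedBuf = Arc<Mutex<VecDeque<u8>>>;

/// One end of the pair: writes go to `tx`, reads come from the peer's `tx`.
pub struct StreamEnd {
    tx: SharedBuf,
    rx: SharedBuf,
}

/// Creates two ends whose buffers are cross-wired.
pub fn stream_pair() -> (StreamEnd, StreamEnd) {
    let a_to_b = SharedBuf::default();
    let b_to_a = SharedBuf::default();
    let a = StreamEnd { tx: Arc::clone(&a_to_b), rx: Arc::clone(&b_to_a) };
    let b = StreamEnd { tx: b_to_a, rx: a_to_b };
    (a, b)
}

impl StreamEnd {
    /// Copies as much of `data` as fits and returns the number of bytes taken.
    pub fn write(&self, data: &[u8]) -> usize {
        let mut tx = self.tx.lock().unwrap();
        let n = data.len().min(STREAM_BUF_SIZE - tx.len());
        tx.extend(data[..n].iter().copied());
        n
    }

    /// Moves up to `buf.len()` buffered bytes into `buf`.
    pub fn read(&self, buf: &mut [u8]) -> usize {
        let mut rx = self.rx.lock().unwrap();
        let n = buf.len().min(rx.len());
        for (slot, byte) in buf.iter_mut().zip(rx.drain(..n)) {
            *slot = byte;
        }
        n
    }
}
```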

UnixListener

Accepts incoming connections from a backlog queue (max 128):

pub struct UnixListener {
    backlog: SpinLock<VecDeque<Arc<UnixStream>>>,
    wait_queue: WaitQueue,
}

A global listener registry maps filesystem paths to UnixListener instances. connect() searches this registry to find the listener.
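A path-keyed registry of this kind might look like the sketch below. The Registry and Listener types are simplified stand-ins, a u32 connection id replaces Arc<UnixStream>, and the error variants (address in use, connection refused, backlog full) are assumptions about how the lookup and the 128-entry limit could be reported rather than documented behavior.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Backlog limit stated above.
pub const MAX_BACKLOG: usize = 128;

/// Hypothetical error set for this sketch.
#[derive(Debug, PartialEq)]
pub enum SocketError {
    AddressInUse,
    ConnectionRefused,
    BacklogFull,
}

/// Simplified listener: queues connection ids instead of stream objects.
#[derive(Default)]
pub struct Listener {
    backlog: Mutex<VecDeque<u32>>,
}

impl Listener {
    /// Non-blocking accept: returns the oldest pending connection, if any.
    pub fn accept(&self) -> Option<u32> {
        self.backlog.lock().unwrap().pop_front()
    }
}

/// Maps filesystem paths to listeners.
#[derive(Default)]
pub struct Registry {
    by_path: Mutex<BTreeMap<String, Arc<Listener>>>,
}

impl Registry {
    /// Registers a new listener under `path`.
    pub fn register(&self, path: &str) -> Result<Arc<Listener>, SocketError> {
        let mut map = self.by_path.lock().unwrap();
        if map.contains_key(path) {
            return Err(SocketError::AddressInUse);
        }
        let listener = Arc::new(Listener::default());
        map.insert(path.to_string(), Arc::clone(&listener));
        Ok(listener)
    }

    /// Looks up the listener for `path` and queues a pending connection.
    pub fn connect(&self, path: &str, conn_id: u32) -> Result<(), SocketError> {
        let listener = self
            .by_path
            .lock()
            .unwrap()
            .get(path)
            .cloned()
            .ok_or(SocketError::ConnectionRefused)?;
        let mut backlog = listener.backlog.lock().unwrap();
        if backlog.len() >= MAX_BACKLOG {
            return Err(SocketError::BacklogFull);
        }
        backlog.push_back(conn_id);
        Ok(())
    }
}
```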

SCM_RIGHTS (File Descriptor Passing)

sendmsg with SCM_RIGHTS ancillary data sends file descriptors across a Unix socket. The sender's Arc<OpenedFile> references are queued on the stream:

pub enum AncillaryData {
    Rights(Vec<Arc<OpenedFile>>),
}

recvmsg installs the received file references into the receiver's file descriptor table and returns the new fd numbers in the control message.
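The round trip can be sketched end to end with a toy descriptor table. FdTable, send_rights, and recv_rights are hypothetical names, OpenedFile is reduced to a label, and installing each received file at the lowest free descriptor follows the usual POSIX convention, which is assumed here rather than confirmed for Kevlar.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Reduced stand-in for the kernel's OpenedFile.
pub struct OpenedFile {
    pub name: &'static str,
}

pub enum AncillaryData {
    Rights(Vec<Arc<OpenedFile>>),
}

/// Toy per-process descriptor table: index = fd number.
#[derive(Default)]
pub struct FdTable {
    slots: Vec<Option<Arc<OpenedFile>>>,
}

impl FdTable {
    /// Installs `file` at the lowest free descriptor and returns its number.
    pub fn install(&mut self, file: Arc<OpenedFile>) -> i32 {
        let idx = match self.slots.iter().position(|slot| slot.is_none()) {
            Some(free) => free,
            None => {
                self.slots.push(None);
                self.slots.len() - 1
            }
        };
        self.slots[idx] = Some(file);
        idx as i32
    }

    pub fn get(&self, fd: i32) -> Option<&Arc<OpenedFile>> {
        self.slots.get(fd as usize)?.as_ref()
    }
}

/// sendmsg side: resolve the fds to file references and queue them.
/// Returns None (and queues nothing) if any fd is not open.
pub fn send_rights(
    queue: &mut VecDeque<AncillaryData>,
    sender: &FdTable,
    fds: &[i32],
) -> Option<()> {
    let files = fds
        .iter()
        .map(|&fd| sender.get(fd).cloned())
        .collect::<Option<Vec<_>>>()?;
    queue.push_back(AncillaryData::Rights(files));
    Some(())
}

/// recvmsg side: install the queued references and report the new fds.
pub fn recv_rights(queue: &mut VecDeque<AncillaryData>, receiver: &mut FdTable) -> Vec<i32> {
    match queue.pop_front() {
        Some(AncillaryData::Rights(files)) => {
            files.into_iter().map(|file| receiver.install(file)).collect()
        }
        None => Vec::new(),
    }
}
```

Because only Arc references are queued, the sender and receiver end up sharing the same underlying open file, so the descriptor numbers differ but the object does not.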

epoll

epoll_create1, epoll_ctl, and epoll_wait are implemented (level-triggered only):

pub struct EpollInstance {
    interests: SpinLock<BTreeMap<i32, Interest>>,
}

struct Interest {
    file: Arc<dyn FileLike>,
    events: u32,  // EPOLLIN, EPOLLOUT, EPOLLERR, EPOLLHUP
    data: u64,
}

epoll_wait polls all registered interests and returns ready ones. For timeout > 0, it sleeps on POLL_WAIT_QUEUE and re-polls on wakeup. Level-triggered mode only.

The O(n) poll approach is acceptable for typical use (systemd/OpenRC watch ~10 fds).
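The O(n) scan can be sketched as a single pass over the interest map. The Pollable trait, the FixedReadiness helper, and collect_ready are hypothetical names; the event constants use the Linux values. Reporting EPOLLERR and EPOLLHUP even when they were not requested follows Linux epoll semantics and is an assumption about Kevlar.

```rust
use std::collections::BTreeMap;

pub const EPOLLIN: u32 = 0x001;
pub const EPOLLOUT: u32 = 0x004;
pub const EPOLLERR: u32 = 0x008;
pub const EPOLLHUP: u32 = 0x010;

/// Hypothetical stand-in for the readiness query on FileLike.
pub trait Pollable {
    fn poll(&self) -> u32;
}

/// Helper whose readiness never changes, useful for demonstrations.
pub struct FixedReadiness(pub u32);

impl Pollable for FixedReadiness {
    fn poll(&self) -> u32 {
        self.0
    }
}

pub struct Interest<'a> {
    pub file: &'a dyn Pollable,
    pub events: u32,
    pub data: u64,
}

/// One level-triggered pass: every registered file is polled, and each
/// one that is currently ready yields (ready events, user data).
pub fn collect_ready(
    interests: &BTreeMap<i32, Interest<'_>>,
    max_events: usize,
) -> Vec<(u32, u64)> {
    interests
        .values()
        .filter_map(|interest| {
            // Errors and hang-ups are reported regardless of the mask.
            let mask = interest.events | EPOLLERR | EPOLLHUP;
            let ready = interest.file.poll() & mask;
            (ready != 0).then_some((ready, interest.data))
        })
        .take(max_events)
        .collect()
}
```

Since nothing is remembered between passes, a file that stays readable is reported on every call, which is what level-triggered means in practice.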

sendfile

sendfile(out_fd, in_fd, offset, count) reads the input file in 4 KB chunks and writes each chunk to the output socket or file. The data is copied through an intermediate kernel buffer, so the transfer is not zero-copy.
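A chunked copy of this kind can be sketched over the standard Read and Write traits. The function name copy_chunked is hypothetical, and the sketch leaves out offset handling and file descriptors, which are specific to the kernel.

```rust
use std::io::{self, Read, Write};

/// Chunk size stated above (4 KB).
pub const CHUNK_SIZE: usize = 4096;

/// Copies up to `count` bytes from `input` to `output` through a fixed
/// intermediate buffer. Returns the number of bytes actually transferred,
/// which is smaller than `count` if the input ends early.
pub fn copy_chunked<R: Read, W: Write>(
    input: &mut R,
    output: &mut W,
    count: usize,
) -> io::Result<usize> {
    let mut buf = [0u8; CHUNK_SIZE];
    let mut sent = 0;
    while sent < count {
        // Never read past the requested byte count.
        let want = CHUNK_SIZE.min(count - sent);
        let n = input.read(&mut buf[..want])?;
        if n == 0 {
            break; // end of input
        }
        output.write_all(&buf[..n])?;
        sent += n;
    }
    Ok(sent)
}
```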

Socket Options

A few socket options report real values when read. Most writable options are accepted silently for compatibility but have no effect:

Level         Options                                                  Status
SOL_SOCKET    SO_ERROR, SO_TYPE, SO_RCVBUF, SO_SNDBUF                  Read (real values)
SOL_SOCKET    SO_REUSEADDR, SO_KEEPALIVE, SO_PASSCRED, SO_REUSEPORT    Write (stub)
IPPROTO_TCP   TCP_NODELAY                                              Write (stub)

VirtIO-Net Driver

The VirtioNet driver (exts/virtio_net/) communicates with QEMU's virtio-net device:

  • Supports both modern (12-byte header) and legacy (10-byte header) VirtIO modes.
  • RX queue: pre-allocated 2048-byte descriptors, replenished on IRQ.
  • TX queue: on-demand transmission with dual descriptors (header + payload).
  • Implements EthernetDriver trait consumed by the smoltcp integration layer.
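The two header sizes matter mainly when separating the VirtIO header from the Ethernet frame in a receive buffer. The sketch below shows one way to do that; VirtioMode and ethernet_frame are hypothetical names, and `used_len` is assumed to be the byte count the device reports for the completed descriptor.

```rust
/// RX descriptor size stated above.
pub const RX_BUFFER_SIZE: usize = 2048;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VirtioMode {
    Modern,
    Legacy,
}

impl VirtioMode {
    /// Length of the virtio-net header that precedes every frame.
    pub const fn header_len(self) -> usize {
        match self {
            VirtioMode::Modern => 12,
            VirtioMode::Legacy => 10,
        }
    }
}

/// Returns the Ethernet frame inside a completed RX buffer, or None if the
/// reported length is shorter than the header or longer than the buffer.
pub fn ethernet_frame(mode: VirtioMode, buf: &[u8], used_len: usize) -> Option<&[u8]> {
    let header = mode.header_len();
    if used_len < header || used_len > buf.len() {
        return None;
    }
    Some(&buf[header..used_len])
}
```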

Socket API Summary

Syscall                     Support
socket                      AF_INET (TCP/UDP/ICMP), AF_UNIX
bind                        IP address + port, Unix path
connect                     TCP three-way handshake, Unix stream
listen / accept             TCP and Unix listeners
send / recv                 Basic send/receive
sendto / recvfrom           UDP datagrams, ICMP
sendmsg / recvmsg           SCM_RIGHTS fd passing
setsockopt / getsockopt     See table above
shutdown                    TCP half-close, Unix stream
getsockname / getpeername   Local and remote address
socketpair                  AF_UNIX pairs
poll / epoll                Readiness monitoring
sendfile                    File-to-socket transfer