Kernel Bypass Networking with AF_XDP
Bypassing the Linux networking stack to eliminate the 10-50µs overhead of packet processing.
If you write a standard socket server in C or Rust (std::net::UdpSocket), you are standing on the shoulders of giants. But those giants are slow.
The Linux Networking Stack is a marvel of features: TCP congestion control, IP routing tables, netfilter firewalls (iptables), NAT, etc. But for HFT market data, we don’t need any of that. We just want the raw ethernet frame.
The Cost of Abstractions
Path of a packet in Linux:
- NIC: DMA packet to Ring Buffer.
- IRQ: Interrupt CPU to say “Data is here”.
- NAPI: Softirq polling begins.
- sk_buff: Kernel allocates a complex sk_buff struct (200+ bytes of metadata).
- Protocol Layers: Ethernet -> IP -> UDP processing.
- Context Switch: Scheduler wakes your app.
- recvfrom(): Copy payload from Kernel space to User space.
Total Latency: ~15-25µs. Jitter: High (System calls, interrupts).
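You can feel this overhead directly by timing the ordinary socket path over loopback with nothing but std::net. A minimal sketch (kernel_path_cost is an illustrative helper; absolute numbers vary wildly by machine, but every iteration pays two syscalls plus a kernel-to-user copy):

```rust
use std::net::UdpSocket;
use std::time::{Duration, Instant};

/// Ping-pong small datagrams over loopback through the full kernel
/// stack and return the average per-packet cost.
fn kernel_path_cost(iters: u32) -> std::io::Result<Duration> {
    let rx = UdpSocket::bind("127.0.0.1:0")?; // OS picks free ports
    let tx = UdpSocket::bind("127.0.0.1:0")?;
    let dst = rx.local_addr()?;

    let payload = [0u8; 64]; // one small "market data" frame
    let mut buf = [0u8; 2048];

    // Warm-up so allocator and route caches are hot.
    tx.send_to(&payload, dst)?;
    rx.recv_from(&mut buf)?;

    let start = Instant::now();
    for _ in 0..iters {
        tx.send_to(&payload, dst)?;  // syscall #1
        rx.recv_from(&mut buf)?;     // syscall #2 + kernel->user copy
    }
    Ok(start.elapsed() / iters)
}
```

Even on loopback, with no NIC or interrupt in the path, each round trip costs microseconds. That is the baseline we are trying to beat.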
Kernel Bypass: Options
1. DPDK (Data Plane Development Kit)
The “Nuclear Option.”
- How: Unbinds the NIC driver from Linux. Loads a custom user-space driver.
- Pros: Fastest possible (sub-microsecond).
- Cons: Hijacks the NIC. The OS loses the interface entirely: no SSH, no metrics, nothing on that port. Hard to debug.
2. AF_XDP (eXpress Data Path)
The “Modern Option” (Linux 4.18+).
- How: Creates a special AF_XDP socket that redirects packets into user-space memory (UMEM) before the sk_buff allocation.
- Pros: Very fast (~2-3µs). Co-exists with the Linux stack (selected packets can still be passed up to the kernel). Uses standard drivers (on newer builds).
- Cons: Requires newer kernels (5.10+ recommended).
AF_XDP Architecture
AF_XDP is powered by eBPF (extended Berkeley Packet Filter). A tiny program runs directly on the NIC driver hook.
// BPF Program (XDP Hook)
SEC("xdp")
int xdp_prog(struct xdp_md *ctx) {
    // Look at packet header
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    // The verifier rejects the program unless we bounds-check
    // before touching the header
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    // Is it UDP port 9000? Redirect to our AF_XDP socket.
    // (is_port_9000: helper, not shown, that parses the IP/UDP headers)
    if (eth->h_proto == bpf_htons(ETH_P_IP) && is_port_9000(data, data_end)) {
        return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, 0);
    }

    // Otherwise, let the Kernel handle it (SSH, Ping, etc.)
    return XDP_PASS;
}
This flexibility is game-changing. We can dedicate one UDP port for “Fast Path” trading, while keeping SSH working on the same cable.
The 4 Rings
AF_XDP uses four ring buffers for Zero-Copy communication.
- FILL Ring (User -> Kernel): “Here are some empty blocks of memory (addresses). Use them to store incoming packets.”
- RX Ring (Kernel -> User): “I received a packet! It is stored at address X in the UMEM.”
- TX Ring (User -> Kernel): “Send the packet stored at address Y.”
- COMPLETION Ring (Kernel -> User): “I finished sending packet Y. You can reuse that memory.”
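The receive half of this handshake (FILL -> RX -> process -> FILL) is just descriptor recycling, and it can be modeled in ordinary Rust. This is a toy simulation, not the real kernel structures; Rings, deliver_packet, and process_one are illustrative names:

```rust
use std::collections::VecDeque;

/// Toy model of the FILL -> RX -> (process) -> FILL descriptor cycle.
/// Frame "addresses" are offsets into a flat UMEM byte area.
struct Rings {
    fill: VecDeque<u64>, // user -> kernel: free frame addresses
    rx: VecDeque<u64>,   // kernel -> user: filled frame addresses
}

impl Rings {
    fn new(frame_size: u64, frames: u64) -> Self {
        // Seed the FILL ring with every frame in the UMEM.
        let fill = (0..frames).map(|i| i * frame_size).collect();
        Rings { fill, rx: VecDeque::new() }
    }

    /// "Kernel" side: take a free frame, pretend a packet landed in it.
    fn deliver_packet(&mut self) -> bool {
        match self.fill.pop_front() {
            Some(addr) => { self.rx.push_back(addr); true }
            None => false, // FILL ring empty: the packet is dropped!
        }
    }

    /// "User" side: consume one RX descriptor, then recycle the frame.
    fn process_one(&mut self) -> Option<u64> {
        let addr = self.rx.pop_front()?;
        // ... parse the packet bytes at `addr` here ...
        self.fill.push_back(addr); // hand the frame back via FILL
        Some(addr)
    }
}
```

The model makes one failure mode obvious: if your application falls behind and stops refilling the FILL ring, the kernel has nowhere to put arriving packets and silently drops them.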
Rust Implementation
We use aya (a Rust eBPF loader) alongside bindings to libxdp to set up the socket and drive the rings.
// Simplified Setup
let config = SocketConfig::default();
let (mut tx, mut rx) = xsk::Socket::new(ifname, queue_id, umem, &config)?;

loop {
    // 1. Check RX Ring
    let batch = rx.poll(32);
    let n = batch.len();

    for desc in batch {
        // Direct access to raw packet bytes in UMEM
        let raw_bytes = unsafe { umem.read(desc.addr, desc.len) };

        // 2. Process Market Data (Zero Copy!)
        let market_update = parser::parse(raw_bytes);

        // 3. Trigger Strategy
        strategy.on_tick(market_update);
    }

    // 4. Return the consumed buffers to the Fill Ring
    fill_ring.produce(n);
}
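The "Zero Copy" parse in step 2 is nothing magic: it is offset arithmetic over the raw frame bytes sitting in UMEM. A minimal sketch of that idea (udp_dst_port is a hypothetical helper; it assumes untagged Ethernet plus IPv4 with no options, i.e. IHL == 5):

```rust
/// Extract the UDP destination port from a raw Ethernet frame without
/// copying anything -- just bounds-checked offset arithmetic.
/// Assumes untagged Ethernet + IPv4 with a 20-byte header (IHL == 5).
fn udp_dst_port(frame: &[u8]) -> Option<u16> {
    // Ethernet header is 14 bytes; EtherType sits at offset 12.
    let ethertype = u16::from_be_bytes([*frame.get(12)?, *frame.get(13)?]);
    if ethertype != 0x0800 {
        return None; // not IPv4
    }
    // IPv4 protocol field is at offset 14 + 9; 17 means UDP.
    if *frame.get(23)? != 17 {
        return None; // not UDP
    }
    // UDP header follows at 14 + 20; destination port is its 3rd/4th byte.
    Some(u16::from_be_bytes([*frame.get(36)?, *frame.get(37)?]))
}
```

A real feed handler would parse straight into fixed-layout market data structs the same way: the descriptor gives you `(addr, len)`, and the parser reads fields at known offsets without ever memcpy-ing the payload.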
Performance Reality
On a tuned system (Mellanox ConnectX-6, isolcpus enabled):
- PPS (Packets Per Second): 14 Million+ (Single Core).
- Latency: < 3µs wire-to-strategy.
We have removed the OS from the equation. Now, how do we secure the keys in this bare-metal environment?