Total Zero-Copy Serialization with rkyv

Why traditional serialization kills latency and how to implement true zero-copy data loading using rkyv in Rust.

Advanced 50 min read

In the world of web development, JSON is king. It is human-readable, flexible, and ubiquitous. But in the world of high-frequency trading (HFT) and systems engineering, serialization formats like JSON, Avro, or even Protobuf are the enemy.

The Serialization Tax

Consider a standard market data update:

{ "s": "BTC-USD", "p": 45000.50, "q": 1.5, "t": 1638291000 }

When your program receives this:

  1. Allocation: It allocates memory for the string.
  2. Parsing: It scans the bytes one at a time, typically with a state machine.
  3. Conversion: ASCII “45000.50” must be parsed into a float (expensive!).
  4. Layout: Fields are copied into a struct in memory.

This process burns thousands of CPU cycles per message. In a system processing 10 million messages per second, this “Parsing Tax” can consume 80-90% of your CPU time.
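To make the conversion step concrete, here is a minimal sketch that contrasts the two paths using only the standard library. The function names and the fixed-point encoding (8 little-endian bytes) are assumptions made for illustration, not part of any particular feed format.

```rust
/// Text path: parse an ASCII decimal such as "45000.50" into an f64.
/// The parser has to walk every character and handle sign, dot and exponent.
fn parse_text_price(field: &str) -> Option<f64> {
    field.parse::<f64>().ok()
}

/// Binary path: read a fixed-point price stored as 8 little-endian bytes.
/// This is a bounds check plus a single 8-byte load.
fn read_binary_price(bytes: &[u8]) -> Option<u64> {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes.get(..8)?);
    Some(u64::from_le_bytes(raw))
}
```

The text path does work proportional to the length of the field; the binary path does a constant amount of work regardless of the value.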

The Zero-Copy Promise

True Zero-Copy Serialization means the binary format on disk (or wire) is identical to the memory layout of the struct.

  • Deserialization becomes a pointer cast (effectively 0 ns).
  • Access is instant: fields are read in place, with no intermediate copy.
  • Validation is optional when the source is trusted.
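The "pointer cast" idea can be sketched without any library at all. The example below is a hand-rolled illustration, not rkyv's implementation: Tick, AlignedBuf and view_tick are hypothetical names, and the bytes are assumed to be in the machine's native byte order.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical fixed-layout record; `repr(C)` pins field order and padding.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Tick {
    pub price: u64,
    pub quantity: u64,
}

/// 8-byte-aligned storage, standing in for a mapped file or receive buffer.
#[repr(C, align(8))]
pub struct AlignedBuf(pub [u8; 16]);

/// Reinterpret `bytes` as a `Tick` without copying.
/// Returns `None` if the slice is too short or misaligned.
pub fn view_tick(bytes: &[u8]) -> Option<&Tick> {
    if bytes.len() < size_of::<Tick>() {
        return None;
    }
    if (bytes.as_ptr() as usize) % align_of::<Tick>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, and every bit
    // pattern is a valid `Tick` because both fields are plain integers.
    Some(unsafe { &*(bytes.as_ptr() as *const Tick) })
}
```

The returned reference borrows from the buffer, so no field is copied until it is read. Types with invalid bit patterns (bool, enums, references) would need validation before a cast like this is sound.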

Enter rkyv

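rkyv (pronounced “archive”) is the gold standard for zero-copy in Rust. Unlike libraries like bincode (which packs bytes compactly but must copy them into a struct on load), rkyv gives every type an archived counterpart with a well-defined memory representation that can be used directly from the buffer.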

Relative Pointers: The Magic Trick

You might ask: “How can you store a Vec<u8> or String in a zero-copy format? Vectors are pointers to heap memory. If I send you my pointer 0x7ffee..., it points to garbage on your machine.”

rkyv solves this with Relative Pointers. Instead of storing an absolute address (0x8000), it stores an offset (+32 bytes from here).

When you load the archive into memory:

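  1. The root object sits at a known position in the buffer (rkyv 0.7 places it at the end of the buffer, not at offset 0).
  2. The String field says “my data is 64 bytes away from me”. The offset is signed and measured from the field itself.
  3. You follow the offset.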

This makes the data relocatable. It doesn’t matter where in RAM it lands; the internal relationships are preserved.
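The mechanism can be sketched in a few lines of plain Rust. This is a toy format invented for illustration (a payload followed by a 4-byte signed offset), not rkyv's actual wire layout; build_archive and resolve are hypothetical helpers.

```rust
/// Read a little-endian i32 stored at `pos`.
fn read_i32(buf: &[u8], pos: usize) -> Option<i32> {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(buf.get(pos..pos.checked_add(4)?)?);
    Some(i32::from_le_bytes(raw))
}

/// Follow a relative pointer: the i32 at `field_pos` is an offset measured
/// from `field_pos` itself, and the target is `len` bytes long.
pub fn resolve(buf: &[u8], field_pos: usize, len: usize) -> Option<&[u8]> {
    let offset = read_i32(buf, field_pos)? as isize;
    let start = (field_pos as isize).checked_add(offset)?;
    if start < 0 {
        return None;
    }
    let start = start as usize;
    buf.get(start..start.checked_add(len)?)
}

/// Build a toy archive: payload first, then the offset field pointing back at it.
/// Returns the buffer and the position of the offset field.
pub fn build_archive(payload: &[u8]) -> (Vec<u8>, usize) {
    let mut buf = payload.to_vec();
    let field_pos = buf.len();
    let offset = -(payload.len() as i32); // the target lies before the field
    buf.extend_from_slice(&offset.to_le_bytes());
    (buf, field_pos)
}
```

Because resolve only ever adds the stored offset to the field's own position, the same bytes resolve correctly whether they sit at the start of a buffer or are embedded somewhere in the middle of a larger one.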

Hands-On: Zero-Copy Market Data

Let’s build a zero-copy order book event.

1. Dependencies

[dependencies]
rkyv = { version = "0.7", features = ["validation"] }

2. Defining the Struct

We derive Archive, Serialize, and Deserialize. The #[archive(check_bytes)] attribute generates validation logic for the archived type (critical for untrusted input).

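use rkyv::{Archive, Deserialize, Serialize};
// No direct `bytecheck` import is needed: #[archive(check_bytes)] derives
// CheckBytes for the archived type when the "validation" feature is enabled.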

#[derive(Archive, Deserialize, Serialize, Debug, PartialEq)]
#[archive(check_bytes)] // Enables security validation
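// repr(C) pins the layout of MarketEvent itself. The archived type is a separate
// generated struct, so archive_attr(repr(C)) is used to pin its layout as well.
#[repr(C)]
#[archive_attr(repr(C))]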
pub struct MarketEvent {
    pub symbol: [u8; 8], // Fixed size array avoids indirections
    pub timestamp: u64,
    pub price: u64,      // Fixed-point (e.g., satoshis)
    pub quantity: u64,
    pub side: u8,        // 0 = Bid, 1 = Ask
    // Note: avoided String and Vec for the "hot" path
}
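The struct stores price and quantity as fixed-point integers rather than floats. A minimal sketch of that convention follows; the scale of six decimal places is an assumption inferred from the digit grouping used in this lesson's literals (3500_000000), and to_fixed and format_fixed are hypothetical helpers.

```rust
/// Assumed scale: 6 decimal places, so 1 whole unit = 1_000_000 ticks.
pub const SCALE: u64 = 1_000_000;

/// Combine a whole part and a fractional part (in millionths) into one integer.
/// Returns `None` if the fraction is out of range or the result overflows.
pub fn to_fixed(whole: u64, micros: u64) -> Option<u64> {
    if micros >= SCALE {
        return None;
    }
    whole.checked_mul(SCALE)?.checked_add(micros)
}

/// Render a fixed-point value for logs or debugging.
pub fn format_fixed(value: u64) -> String {
    format!("{}.{:06}", value / SCALE, value % SCALE)
}
```

Integer prices compare and add exactly, and a u64 has the same byte representation on every platform of a given endianness, which keeps the archived layout simple.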

3. Serialization (The “Slow” Path)

This happens at the ingress (Feed Handler).

let event = MarketEvent {
    symbol: *b"ETH-USDC",
    timestamp: 1620000000,
    price: 3500_000000,
    quantity: 10_000000,
    side: 1,
};

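use rkyv::ser::Serializer; // brings serialize_value into scope

// AllocSerializer writes into a heap-backed AlignedVec; 256 is the size of its
// scratch space in bytes, not a stack buffer.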
let mut writer = rkyv::ser::serializers::AllocSerializer::<256>::default();
writer.serialize_value(&event).unwrap();
let bytes = writer.into_serializer().into_inner();

4. Deserialization (The “Fast” Path)

This is what the Matching Engine does.

// UNSAFE: Trusted Zero-Copy (Fastest)
// If we trust the source (e.g., our own shared memory ring buffer)
let archived = unsafe { rkyv::archived_root::<MarketEvent>(&bytes) };

println!("Symbol: {:?}", std::str::from_utf8(&archived.symbol));
println!("Price: {}", archived.price);

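Cost: Effectively 0 CPU cycles. It is just a pointer calculation. For bytes from an untrusted source, rkyv::check_archived_root::<MarketEvent>(&bytes) returns the same reference only after validating the buffer, at the cost of one pass over the data.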

Advanced: Shared Memory Ring Buffers

In the ZeroCopy Sentinel, we combine rkyv with a shared memory file (/dev/shm).

  1. Writer serializes events directly into the memory-mapped file.
  2. Reader maps the same file.
  3. Reader receives a signal (or polls a cursor).
  4. Reader accesses archived_root at the specific offset.

This eliminates memcpy between processes. The data written by the Feed Handler is instantly visible to the Strategy Engine.
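The four steps above can be sketched in a single process. Everything here is an assumption made for illustration: Ring, SLOT_SIZE, SLOTS, publish_with and read are hypothetical names, a Vec<u8> stands in for the region mapped from /dev/shm, and the real cross-process version additionally needs an actual memory mapping and a cursor that lives inside the shared region.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Bytes per slot; assumed large enough for one serialized event.
pub const SLOT_SIZE: usize = 40;
/// Number of slots in the ring.
pub const SLOTS: usize = 8;

/// In-process stand-in for a shared memory ring.
pub struct Ring {
    data: Vec<u8>,
    /// Count of records published so far.
    cursor: AtomicU64,
}

impl Ring {
    pub fn new() -> Self {
        Ring { data: vec![0u8; SLOT_SIZE * SLOTS], cursor: AtomicU64::new(0) }
    }

    /// Writer: fill the next slot in place, then publish it by bumping the cursor.
    pub fn publish_with<F: FnOnce(&mut [u8])>(&mut self, fill: F) -> u64 {
        let seq = self.cursor.load(Ordering::Relaxed);
        let start = (seq % SLOTS as u64) as usize * SLOT_SIZE;
        fill(&mut self.data[start..start + SLOT_SIZE]);
        self.cursor.store(seq + 1, Ordering::Release);
        seq
    }

    /// Reader: borrow the slot for `seq` in place, if it has been published
    /// and not yet overwritten by a later lap of the ring.
    pub fn read(&self, seq: u64) -> Option<&[u8]> {
        let published = self.cursor.load(Ordering::Acquire);
        if seq >= published || published - seq > SLOTS as u64 {
            return None;
        }
        let start = (seq % SLOTS as u64) as usize * SLOT_SIZE;
        Some(&self.data[start..start + SLOT_SIZE])
    }
}
```

The writer never hands over a separate buffer; it fills the slot directly, and the reader gets a borrowed slice of the same bytes. In the real setup that slice is what would be passed to archived_root.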

Benchmarks

Format        Deser Time   Allocation   Copying
JSON          4,200 ns     Yes          Yes
Bincode       120 ns       Yes          Yes
Alpaca        60 ns        Yes          Yes
Cap’n Proto   5 ns         No           No
rkyv          < 1 ns       No           No

Summary

  • Avoid Parsing: Parsing is overhead.
  • Relocatable Data: Use relative pointers.
  • Trusted ingress: Validate once at the edge, use unsafe zero-copy internally.

Next, we need a way to pass these zero-copy events between threads without locking. Enter the Disruptor.
