chimeraproject

Performance

Where the runtime currently sits, how the numbers are measured, and the workloads used to measure them.

Note: the numbers on this page are placeholders from development benchmarks on the configuration described below. Treat them as directional. The bench/ tree in the repository contains the harnesses, so you can reproduce them on your own hardware.
// headline numbers
ECHO ROUND-TRIP — RDMA
2.8 µs
p50, 64-byte payload, RoCE v2, single core in poll mode.
ECHO ROUND-TRIP — KERNEL TCP
14 µs
p50, 64-byte payload, kernel sockets baseline, single core.
SUSTAINED THROUGHPUT
390 Gbps
Single-thread RDMA SEND, 1 MiB messages, zero-copy from pool.
SMALL-MSG RATE
14.2M/s
Send + completion notifications, single core in poll mode, 256-byte messages.
BLOCK I/O — IO_URING
2.6M IOPS
4 KiB random read, one loop, NVMe via io_uring, queue depth 64.
EVENT→POLL TRANSITION
< 5 µs
Time from first work arriving to loop reaching poll-mode steady state.
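A few of the headline figures can be cross-checked with simple arithmetic. The sketch below uses only the placeholder values quoted above; the helper names are illustrative and do not come from the bench/ tree. It assumes decimal gigabits (10^9 bits) and counts payload bytes only, ignoring protocol headers.

```python
# Back-of-the-envelope conversions for the headline figures.
# Inputs are the placeholder values quoted on this page; nothing here is measured.

GBIT = 1e9        # decimal gigabit, as used for link rates (assumption)
KIB = 1024
MIB = 1024 ** 2


def payload_gbps(ops_per_sec: float, bytes_per_op: int) -> float:
    """Payload bandwidth in Gbit/s implied by an operation rate and size."""
    return ops_per_sec * bytes_per_op * 8 / GBIT


def ops_per_sec(gbps: float, bytes_per_op: int) -> float:
    """Operation rate implied by a payload bandwidth and operation size."""
    return gbps * GBIT / (bytes_per_op * 8)


small_msg_gbps = payload_gbps(14.2e6, 256)      # small-message rate, 256-byte messages
block_io_gbps = payload_gbps(2.6e6, 4 * KIB)    # 4 KiB random reads
large_msg_rate = ops_per_sec(390, MIB)          # 1 MiB messages at 390 Gbps
ns_per_small_msg = 1e9 / 14.2e6                 # single-core time budget per message

print(f"small-msg payload bandwidth: {small_msg_gbps:.1f} Gbit/s")
print(f"block I/O payload bandwidth: {block_io_gbps:.1f} Gbit/s")
print(f"1 MiB message rate:          {large_msg_rate:,.0f} msg/s")
print(f"budget per small message:    {ns_per_small_msg:.1f} ns")
```

On these inputs the small-message rate carries roughly 29 Gbit/s of payload, block I/O about 85 Gbit/s, the sustained-throughput figure corresponds to around 46,500 messages per second, and the single core has about 70 ns per small message.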
// kernel sockets vs accelerators
echo p50 latency · 64-byte payload · single core · lower is better
kernel tcp      14.0 µs
io_uring tcp     9.6 µs
xlio tcp         4.9 µs
rdma · event     3.6 µs
rdma · poll      2.8 µs
// placeholder values from the development bench rig; expect these to change as the implementation matures.
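To read the comparison as ratios rather than absolute latencies, the following sketch divides the kernel TCP baseline by each path's p50. The values are the placeholders listed above, and the dictionary layout is only an illustration, not a format the harnesses emit.

```python
# p50 echo latency in microseconds, copied from the comparison above.
latency_us = {
    "kernel tcp": 14.0,
    "io_uring tcp": 9.6,
    "xlio tcp": 4.9,
    "rdma · event": 3.6,
    "rdma · poll": 2.8,
}

baseline = latency_us["kernel tcp"]

# Speedup relative to kernel TCP: higher means lower latency than the baseline.
speedup = {name: baseline / value for name, value in latency_us.items()}

for name, value in latency_us.items():
    print(f"{name:<14} {value:5.1f} µs   {speedup[name]:.2f}x vs kernel tcp")
```

On these placeholder values, poll-mode RDMA comes out at about 5x the kernel TCP baseline at p50, and moving RDMA from event mode to poll mode saves roughly 0.8 µs.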
// methodology
configuration · echo-stream / echo-message harnesses, bench/ tree
cpu                   2× AMD EPYC 9354P · 32C / 64T per socket, SMT off in test runs
memory                512 GiB DDR5-4800, NUMA-pinned to the NIC's socket
nic (rdma / xlio)     NVIDIA ConnectX-7 400 GbE · RoCE v2 · OFED 24.x
nic (kernel sockets)  Same NIC, kernel TCP path
kernel                Linux 6.x · CONFIG_PREEMPT_NONE · isolcpus on benchmark cores
nvme                  Samsung PM1743 Gen5 NVMe · io_uring backend; VFIO numbers run separately
governance            CPU governor: performance · turbo locked · C-states limited to C1 · IRQs affinitized away from benchmark cores
measurement           The bench harness reports raw histograms, with p50/p99/p99.9 at minimum. Warmup ≥ 30 s, run ≥ 60 s.
harnesses             echo-stream · echo-message · custom block bench in bench/block
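The measurement rule above (discard the warmup, then report p50/p99/p99.9 over a run of at least 60 s) can be sketched as follows. This is a minimal illustration, not the harness code: it works on raw `(timestamp, latency)` samples rather than histogram buckets, uses the nearest-rank percentile definition, and every name in it is hypothetical.

```python
# Percentiles expressed in per-mille so the rank arithmetic stays in integers.
PERCENTILES = {"p50": 500, "p99": 990, "p99.9": 999}


def percentile_permille(sorted_samples, permille):
    """Nearest-rank percentile: the smallest sample with at least
    permille/1000 of all samples at or below it."""
    n = len(sorted_samples)
    if n == 0:
        raise ValueError("no samples")
    rank = max(1, -(-n * permille // 1000))  # integer ceiling division
    return sorted_samples[rank - 1]


def summarize(samples, warmup_s=30.0, min_run_s=60.0):
    """samples: list of (timestamp_s, latency_us), timestamps relative to run start.
    Drops everything before warmup_s and refuses windows shorter than min_run_s."""
    kept = [(t, lat) for t, lat in samples if t >= warmup_s]
    if not kept or max(t for t, _ in kept) - warmup_s < min_run_s:
        raise ValueError("measured window is shorter than min_run_s")
    latencies = sorted(lat for _, lat in kept)
    return {name: percentile_permille(latencies, pm) for name, pm in PERCENTILES.items()}


# Synthetic demo: 30 s of slow warmup samples, then latencies 1..1000 µs.
demo = [(t / 10, 9999.0 if t < 300 else float(t - 299)) for t in range(1300)]
print(summarize(demo))
```

Working in per-mille avoids floating-point surprises when computing the rank for p99.9, which matters when the tail percentile sits on a sample boundary.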
// what we test for
// reproducibility

Every number on this page comes from a harness in the public repository, so anyone with comparable hardware can reproduce it and anyone without can still check the methodology. For help with configuration, open an issue or ask on Discord.
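Before comparing your own results with this page, it can help to confirm that the host roughly matches the methodology settings. The sketch below reads three standard Linux sysfs files (CPU frequency governor, SMT control, and the isolated-CPU list). It is an assumption-laden helper written for this page, not part of the bench/ tree, and it returns `None` for anything it cannot read instead of guessing.

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")


def read_sysfs(path):
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def check_host(cpu=0, root=CPU_SYSFS):
    """Report whether one benchmark core looks configured like the rig above.
    `root` is overridable so the function can be pointed at a test directory."""
    governor = read_sysfs(root / f"cpu{cpu}" / "cpufreq" / "scaling_governor")
    smt = read_sysfs(root / "smt" / "control")
    isolated = read_sysfs(root / "isolated")
    return {
        # None means "could not determine", not "misconfigured".
        "governor_is_performance": None if governor is None else governor == "performance",
        "smt_is_off": None if smt is None else smt in ("off", "forceoff", "notsupported"),
        "isolated_cpus": isolated,  # e.g. "2-7"; empty string if no CPUs are isolated
    }


if __name__ == "__main__":
    for key, value in check_host().items():
        print(f"{key}: {value}")
```

Turbo, C-state, and IRQ-affinity settings are not covered here; they depend on the platform and are easier to verify with the vendor's own tooling.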
