More comprehensive benchmark results will be available soon, but in the meantime here’s a few examples.
These results are generated from sister project Flowbench which is a network benchmarking tool capable of using libevpl.
RDMA RC 1-thread ping-pong RPC with queue depth 16, 64-byte requests:
A single thread using RDMA RC can achieve over 2 million RPCs per second at average latency of 6.3uS
Flow: Sent: 1.30 GB (1.04 Gbps) [2.04 Mops/s], Recv: 1.30 GB (1.04 Gbps) [2.04 Mops/s] | Latency: Min: 2624ns, Max: 1303897ns, Avg: 6397ns
RDMA RC 64-threads ping-pong RPC with queue depth 16, 64-byte requests:
Scaling up to 64 thread achieves 69 million RPCs per second, avg latency increases to only 6.7uS
Flow: Sent: 45.23 GB (35.81 Gbps) [69.95 Mops/s], Recv: 46.16 GB (36.55 Gbps) [71.29 Mops/s] | Latency: Min: 2313ns, Max: 5889650ns, Avg: 6741ns
RDMA RC 1-thread streaming requests with queue depth 16, 128kb requests:
A single thread streaming 128kb requests at queue depth 16 nearly saturates 400GbE ethernet
Flow: Sent: 489.74 GB (391.67 Gbps) [373.53 Kops/s], Recv: 0.00 B (0.00 bps) [0.00 ops/s]
XLIO TCP 1-thread ping-pong RPC with queue depth 8, 64-byte requests:
A single thread driving a single TCP socket is capable of performing nearly a million 64-byte RPCs with average latency of only 11uS.
Flow: Sent: 588.43 MB (470.65 Mbps) [919.25 Kops/s], Recv: 588.43 MB (470.65 Mbps) [919.25 Kops/s] | Latency: Min: 4587ns, Max: 2990063491ns, Avg: 11165ns
XLIO TCP 16-threads ping-pong RPC with queue depth 8, 64-byte requests:
Sixteen threads increases total RPCs to 6.7 million.
Flow: Sent: 4.36 GB (3.48 Gbps) [6.79 Mops/s], Recv: 4.36 GB (3.48 Gbps) [6.79 Mops/s] | Latency: Min: 6710ns, Max: 2977527053ns, Avg: 19016ns
XLIO TCP 1-thread streaming requests with queue depth 32, 128kb requests:
Single TCP socket, single thread achieves 230 Gbps streaming.
Flow: Sent: 288.52 GB (230.77 Gbps) [220.08 Kops/s], Recv: 0.00 B (0.00 bps) [0.00 ops/s]
XLIO TCP 4-threads streaming requests with queue depth 32, 128kb requests:
Increase to just four sockets, four cores and we can saturate 400GbE.
Flow: Sent: 495.67 GB (396.41 Gbps) [378.04 Kops/s], Recv: 0.00 B (0.00 bps) [0.00 ops/s]