chimeraproject
Performance

pNFS fio performance results

// test environment
topology: 2 storage servers · 1 client · pNFS flex-files over RDMA
storage servers: 2× Chimera diskfs · 16 NVMe each via VFIO · 2×200 GbE each
server roles: node-3 = MDS + DS (metadata + data) · node-1 = DS only
pnfs layout: flex-files, 2 data servers (10.67.25.209 / .211), RDMA netid
client: 2×200 GbE → 400 Gbps / 50 GB/s aggregate link ceiling
client transport: both client stacks over RDMA (RoCE); the Linux NFS mount uses nconnect for parallel connections
server runtime: 80 threads · RDMA (RoCE, port 20049, TOS 104) · 1 GiB huge pages · 8 preallocated slabs
backend: diskfs · metadata-only block cache (1,048,576 blocks) · inode cache 1,048,576; data blocks are not cached, so the random phases hit the NVMe devices
workload: fio · 32 jobs × QD 32 · 100 files × 8 MiB per job · direct=1
phases: 512 KiB sequential layout (warm-up), then 4 KiB randwrite & randread · 10 s ramp + 30 s timed
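
The derived figures in the test environment (link ceiling, working-set size, maximum outstanding requests) follow from simple arithmetic. The sketch below only illustrates that arithmetic and is not part of the benchmark harness. The variable names are illustrative, and it assumes decimal units for link speed (1 Gb/s = 10^9 bit/s) and binary units for file sizes.

```python
# Back-of-the-envelope figures derived from the test environment.
GBIT = 1e9            # bits per second in 1 Gb/s (decimal, as in "200 GbE")
MIB = 1024 ** 2
GIB = 1024 ** 3

links, link_gbps = 2, 200
ceiling_bytes_per_s = links * link_gbps * GBIT / 8   # 400 Gb/s in bytes/s

jobs, iodepth = 32, 32
files_per_job, file_size = 100, 8 * MIB
total_files = jobs * files_per_job
working_set = total_files * file_size
max_outstanding = jobs * iodepth

print(f"client link ceiling : {ceiling_bytes_per_s / 1e9:.0f} GB/s")
print(f"working set         : {working_set / GIB:.0f} GiB in {total_files} files")
print(f"max outstanding I/Os: {max_outstanding}")
```

The 25 GiB working set matches the `io=25.0GiB` that fio reports for the layout phase, and the 3,200 files match the `f=3200` in the fio status line.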
// the two clients, side by side
| workload | Linux kernel NFS | Chimera in-process | ratio |
|---|---|---|---|
| 512 KiB seq write (layout) | 24.7 GB/s | 42.3 GB/s | 1.7× |
| 4 KiB random write, IOPS | 733k | 10.8M | 14.8× |
| 4 KiB random write, bandwidth | 3.00 GB/s | 44.4 GB/s | 14.8× |
| 4 KiB random write, avg latency | 1395 µs | 94 µs | 14.8× |
| 4 KiB random read, IOPS | 1.02M | 10.5M | 10.3× |
| 4 KiB random read, bandwidth | 4.17 GB/s | 43.1 GB/s | 10.3× |
| 4 KiB random read, avg latency | 1002 µs | 97 µs | 10.3× |
| context switches / 30 s run (read) | 21.6M | 840 | |
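
The IOPS, bandwidth, and ratio columns can be rederived from the issued-I/O counts and runtimes in the raw fio reports further down. The following is a minimal sketch of that cross-check. The dictionary layout and function name are illustrative, and the inputs are copied from the `issued rwts`, runtime, and `ctx=` fields of the reports.

```python
BLOCK = 4096  # bytes per 4 KiB request

# phase -> client -> (issued I/Os, runtime in ms), from the fio reports
phases = {
    "rand write": {"linux": (21_994_226, 30_006), "chimera": (325_529_457, 30_002)},
    "rand read":  {"linux": (30_567_165, 30_003), "chimera": (316_039_490, 30_001)},
}

def iops(issued, runtime_ms):
    """Average I/Os per second over the timed window."""
    return issued / (runtime_ms / 1000)

rows = {}
for phase, clients in phases.items():
    linux = iops(*clients["linux"])
    chimera = iops(*clients["chimera"])
    rows[phase] = {
        "linux_gbs": linux * BLOCK / 1e9,      # decimal GB/s, as in the table
        "chimera_gbs": chimera * BLOCK / 1e9,
        "ratio": chimera / linux,
    }
    r = rows[phase]
    print(f"{phase}: {r['linux_gbs']:.2f} GB/s vs {r['chimera_gbs']:.1f} GB/s, "
          f"ratio {r['ratio']:.1f}x")

# Context switches per read I/O, from the ctx= field of each rand_read group.
ctx_per_io = {
    "linux": 21_597_989 / 30_567_165,
    "chimera": 840 / 316_039_490,
}
print(f"ctx switches per read: linux {ctx_per_io['linux']:.2f}, "
      f"chimera {ctx_per_io['chimera']:.1e}")
```

Read this way, the kernel client takes roughly 0.7 context switches per 4 KiB read, while the in-process client takes a few per million.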
// notes on the comparison

Both runs use the same servers, layout, fabric, and fio workload, and both execute on the same physical client machine: same CPU, NICs, and hardware. The only thing that changes is the client software stack, so the results compare two different client implementations on identical hardware rather than two configurations of the same software.

// where the latency goes — client vs block device

The servers export a per-operation block-I/O latency histogram (libevpl, via Prometheus). Aggregated across all 32 NVMe devices on the two servers, that gives the raw device latency for the same 4 KiB random workload. Device latency does not depend on which client drove the I/O, so it sits as a fixed backdrop under both client runs; the fio client percentiles below are what each client added on top.

| 4 KiB random read | p50 | mean | p99 |
|---|---|---|---|
| block device (NVMe) | ~78 µs | ~73 µs | ~192 µs |
| Chimera in-process client | 95 µs | 97 µs | 206 µs |
| Linux kernel NFS client | 848 µs | 1002 µs | 3752 µs |

| 4 KiB random write | p50 | mean | p99 |
|---|---|---|---|
| block device (NVMe) | ~12 µs | ~18 µs | ~100 µs |
| Chimera in-process client | 89 µs | 94 µs | 229 µs |
| Linux kernel NFS client | 611 µs | 1395 µs | 32375 µs |

For reads the Chimera client lands roughly 15 to 25 µs above the device across the whole distribution (p50 +17 µs, mean +24 µs, p99 +14 µs), so a read is close to a device passthrough plus the RDMA/pNFS round trip. For writes the Chimera gap is larger, about 77 µs at p50, because a client write is not a single block op: the server turns it into a logged transaction (intent-log write, data write, tail push), so this figure is the durable write path plus transport, not transport alone.
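
The per-client gaps quoted above are plain differences between the client and device rows of the latency tables. A small sketch of that subtraction follows. The names are illustrative and the device figures are the approximate histogram values from the tables. Percentiles of a sum are not the sum of percentiles, so treat the p50 and p99 gaps as rough indicators rather than an exact decomposition.

```python
# Latencies in microseconds from the latency tables: (p50, mean, p99).
device  = {"read": (78, 73, 192),    "write": (12, 18, 100)}
chimera = {"read": (95, 97, 206),    "write": (89, 94, 229)}
linux   = {"read": (848, 1002, 3752), "write": (611, 1395, 32375)}

def added(client, dev):
    """Latency the client sits above the block device, per statistic."""
    return tuple(c - d for c, d in zip(client, dev))

for op in ("read", "write"):
    c = added(chimera[op], device[op])
    k = added(linux[op], device[op])
    print(f"{op:5s} chimera +{c[0]}/{c[1]}/{c[2]} µs   "
          f"linux +{k[0]}/{k[1]}/{k[2]} µs   (p50/mean/p99)")
```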

The Linux client's millisecond numbers are mostly queueing, not work. Both clients run over RDMA (RoCE), and the Linux mount uses nconnect for parallel connections, so the transport is not the difference. Both runs also use the same fio config — 32 jobs at queue depth 32, up to ~1024 outstanding 4 KiB requests — to keep the comparison equal, but that grossly oversubscribes the in-kernel NFS client, which cannot keep anywhere near that many in flight. Most of each request's time is spent waiting its turn to be processed, not on the device or the wire: the device latencies above (tens of µs) are what the storage actually delivered under that same load.
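
One way to sanity-check the queueing argument is Little's law (mean requests in flight = throughput × mean latency). This check is added here for illustration and is not part of the original analysis. It uses the rounded headline IOPS and mean latency figures, so the results are approximate.

```python
# Little's law: mean requests in flight = throughput * mean latency.
# name -> (IOPS, mean latency in seconds), rounded headline figures.
runs = {
    "linux rand write":   (733e3, 1395e-6),
    "chimera rand write": (10.8e6, 94e-6),
    "linux rand read":    (1.02e6, 1002e-6),
    "chimera rand read":  (10.5e6, 97e-6),
}
OFFERED = 32 * 32  # numjobs * iodepth

in_flight = {name: iops * lat for name, (iops, lat) in runs.items()}

# With the offered depth fixed, mean latency is simply depth / throughput.
implied_latency_us = {
    name: OFFERED / iops * 1e6 for name, (iops, _) in runs.items()
}

for name in runs:
    print(f"{name:20s} ~{in_flight[name]:5.0f} in flight, "
          f"implied mean latency {implied_latency_us[name]:6.0f} µs")
```

All four runs come out at roughly 1,000 requests in flight as fio sees them, so with the offered depth fixed the mean latency is set by throughput alone. Little's law does not say where the requests wait. The device histogram does: the devices answer in tens of microseconds, which places the queue in the client.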

// configs — client

The two fio jobs are identical except for the engine (plus the Chimera engine's own chimera_* options) and the target path: io_uring against the kernel NFS mount at /mnt, versus the Chimera engine against the export at /export.

fio — linux kernel mount chimera-linux.fio
[global]
ioengine=io_uring
thread=1
filename_format=/mnt/f.$jobnum.$filenum
randrepeat=0
group_reporting
direct=1
filesize=8m
nrfiles=100
numjobs=32
iodepth=32

[layout]
rw=write
bs=512k
stonewall

[rand_write]
rw=randwrite
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall

[rand_read]
rw=randread
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall
fio — chimera engine chimera.fio
[global]
ioengine=external:/root/chimera/build/Release/src/fio/libchimera_fio.so
chimera_config=/root/chimera.json
chimera_log=/tmp/chimera.log
chimera_debug=0
thread=1
filename_format=/export/f.$jobnum.$filenum
randrepeat=0
group_reporting
direct=1
filesize=8m
nrfiles=100
numjobs=32
iodepth=32

[layout]
rw=write
bs=512k
stonewall

[rand_write]
rw=randwrite
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall

[rand_read]
rw=randread
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall
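
The claim that the two job files differ only in the engine and target path can be checked mechanically. The sketch below compares the two [global] sections using Python's standard `configparser`. The per-phase sections are the same in both listings and are omitted here. The helper name is illustrative, and reading fio job files with `configparser` is an assumption that works for these simple files, not a general fio parser.

```python
import configparser

LINUX_GLOBAL = """
[global]
ioengine=io_uring
thread=1
filename_format=/mnt/f.$jobnum.$filenum
randrepeat=0
group_reporting
direct=1
filesize=8m
nrfiles=100
numjobs=32
iodepth=32
"""

CHIMERA_GLOBAL = """
[global]
ioengine=external:/root/chimera/build/Release/src/fio/libchimera_fio.so
chimera_config=/root/chimera.json
chimera_log=/tmp/chimera.log
chimera_debug=0
thread=1
filename_format=/export/f.$jobnum.$filenum
randrepeat=0
group_reporting
direct=1
filesize=8m
nrfiles=100
numjobs=32
iodepth=32
"""

def global_options(text):
    """Parse a job file's [global] section into a dict (flags map to None)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # bare flags such as group_reporting
        interpolation=None,    # keep $jobnum / $filenum literal
        delimiters=("=",),     # ':' appears inside the ioengine value
    )
    parser.read_string(text)
    return dict(parser["global"])

linux = global_options(LINUX_GLOBAL)
chimera = global_options(CHIMERA_GLOBAL)

changed = sorted(k for k in linux.keys() & chimera.keys() if linux[k] != chimera[k])
only_chimera = sorted(chimera.keys() - linux.keys())
only_linux = sorted(linux.keys() - chimera.keys())

print("changed:", changed)
print("only in chimera.fio:", only_chimera)
print("only in chimera-linux.fio:", only_linux)
```

Everything that shapes the workload (block sizes, job count, queue depth, file count and size, direct I/O) is shared. The differences are confined to the engine, its own options, and the path.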
// configs — servers

Two diskfs servers, each owning 16 NVMe devices through VFIO. node-3 runs the MDS and advertises both nodes as flex-files data servers; node-1 is a pure data server.

server — MDS + DS (node-3) pnfs.json
{
    "common": {
        "sync_delegation": false,
        "async_delegation": false,
        "rdmacm_tos": 104,
        "huge_pages": true,
        "huge_page_size": "1G",
        "slab_size": "1G",
        "preallocate_slabs": 8,
        "preallocate_threads": 8
    },
    "server": {
        "threads": 80,
        "metrics_port": 9000,
        "rest_http_port": 8080,
        "rdma": true,
        "tcp_flavor": "plain",
        "rdma_hostname": "0.0.0.0",
        "rdma_port": 20049,
        "nfs4_session_slots": 4096,
        "pnfs": {
            "enabled": true,
            "data_servers": [
                { "netid": "rdma", "tcp": "10.67.25.209", "rdma": "10.67.25.209", "backing_path": "/ds1" },
                { "netid": "rdma", "tcp": "10.67.25.211", "rdma": "10.67.25.211", "backing_path": "/ds2" }
            ]
        },
        "vfs": {
            "diskfs": {
                "path": "./build/Release/src/vfs/diskfs/libchimera_vfs_diskfs.so",
                "config": {
                    "initialize": true,
                    "block_cache_blocks": 1048576,
                    "inode_cache_inodes": 1048576,
                    "devices": [
                        { "type": "vfio", "path": "01:00.0" },
                        { "type": "vfio", "path": "03:00.0" },
                        { "type": "vfio", "path": "05:00.0" },
                        { "type": "vfio", "path": "07:00.0" },
                        { "type": "vfio", "path": "41:00.0" },
                        { "type": "vfio", "path": "43:00.0" },
                        { "type": "vfio", "path": "45:00.0" },
                        { "type": "vfio", "path": "47:00.0" },
                        { "type": "vfio", "path": "81:00.0" },
                        { "type": "vfio", "path": "83:00.0" },
                        { "type": "vfio", "path": "85:00.0" },
                        { "type": "vfio", "path": "87:00.0" },
                        { "type": "vfio", "path": "c1:00.0" },
                        { "type": "vfio", "path": "c3:00.0" },
                        { "type": "vfio", "path": "c5:00.0" },
                        { "type": "vfio", "path": "c7:00.0" }
                    ]
                }
            }
        }
    },

    "mounts": {
        "mds": { "module": "diskfs", "path": "/mds", "create": { "mode": "0777" } },
        "ds1": { "module": "nfs",    "path": "10.67.25.209:/ds" },
        "ds2": { "module": "diskfs", "path": "/ds",  "create": { "mode": "0755" } }
    },

    "exports": {
        "/export": { "path": "/mds" },
        "/ds1":    { "path": "/ds" }
    }
}
server — DS only (node-1) chimera.json
{
    "common": {
        "sync_delegation": false,
        "async_delegation": false,
        "rdmacm_tos": 104,
        "huge_pages": true,
        "huge_page_size": "1G",
        "slab_size": "1G",
        "preallocate_slabs": 8,
        "preallocate_threads": 8
    },
    "server": {
        "threads": 80,
        "metrics_port": 9000,
        "rest_http_port": 8080,
        "rdma": true,
        "tcp_flavor": "plain",
        "rdma_hostname": "0.0.0.0",
        "rdma_port": 20049,
        "vfs": {
            "diskfs": {
                "path": "./build/Release/src/vfs/diskfs/libchimera_vfs_diskfs.so",
                "config": {
                    "initialize": true,
                    "block_cache_blocks": 1048576,
                    "inode_cache_inodes": 1048576,
                    "devices": [
                        { "type": "vfio", "path": "01:00.0" },
                        { "type": "vfio", "path": "03:00.0" },
                        { "type": "vfio", "path": "05:00.0" },
                        { "type": "vfio", "path": "07:00.0" },
                        { "type": "vfio", "path": "41:00.0" },
                        { "type": "vfio", "path": "43:00.0" },
                        { "type": "vfio", "path": "45:00.0" },
                        { "type": "vfio", "path": "47:00.0" },
                        { "type": "vfio", "path": "81:00.0" },
                        { "type": "vfio", "path": "83:00.0" },
                        { "type": "vfio", "path": "85:00.0" },
                        { "type": "vfio", "path": "87:00.0" },
                        { "type": "vfio", "path": "c1:00.0" },
                        { "type": "vfio", "path": "c3:00.0" },
                        { "type": "vfio", "path": "c5:00.0" },
                        { "type": "vfio", "path": "c7:00.0" }
                    ]
                }
            }
        }
    },

    "mounts": {
        "ds": { "module": "diskfs", "path": "/" }
    },

    "exports": {
        "/ds": { "path": "/ds" }
    },
    "shares": {
        "diskfs": { "path": "/diskfs" }
    },
    "buckets": {
        "diskfs": { "path": "/diskfs" }
    }
}
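
For a quick consistency check between the prose and the two server configs, the fields the text refers to can be summarised programmatically. The sketch below rebuilds an abbreviated config in Python rather than loading the real files. It uses only keys that appear in the listings above, and the helper names `server_config` and `summarize` are hypothetical.

```python
# The 16 PCI addresses in both listings follow a regular pattern:
# first hex digit in {0, 4, 8, c}, second in {1, 3, 5, 7}.
DEVICES = [
    {"type": "vfio", "path": f"{hi}{lo}:00.0"}
    for hi in "048c"
    for lo in "1357"
]

def server_config(data_servers=None):
    """Abbreviated stand-in for pnfs.json / chimera.json (subset of keys)."""
    server = {
        "threads": 80,
        "rdma": True,
        "rdma_port": 20049,
        "vfs": {"diskfs": {"config": {
            "block_cache_blocks": 1048576,
            "inode_cache_inodes": 1048576,
            "devices": DEVICES,
        }}},
    }
    if data_servers:
        server["pnfs"] = {"enabled": True, "data_servers": data_servers}
    return {"server": server}

def summarize(cfg):
    """Pull out the values the test-environment summary quotes."""
    server = cfg["server"]
    diskfs = server["vfs"]["diskfs"]["config"]
    pnfs = server.get("pnfs", {})
    return {
        "threads": server["threads"],
        "vfio_devices": sum(d["type"] == "vfio" for d in diskfs["devices"]),
        "block_cache_blocks": diskfs["block_cache_blocks"],
        "data_servers": [ds["rdma"] for ds in pnfs.get("data_servers", [])],
    }

mds = summarize(server_config([
    {"netid": "rdma", "rdma": "10.67.25.209", "backing_path": "/ds1"},
    {"netid": "rdma", "rdma": "10.67.25.211", "backing_path": "/ds2"},
]))
ds = summarize(server_config())

print("node-3 (MDS + DS):", mds)
print("node-1 (DS only): ", ds)
print("NVMe devices across both servers:", mds["vfio_devices"] + ds["vfio_devices"])
```

To run the same summary against the real files, pass the result of `json.load` on each config to `summarize`.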
// raw output — fio

The full fio reports for both runs, verbatim.

fio output — linux kernel NFS client 32 jobs × QD 32
layout: (g=0): rw=write, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=io_uring, iodepth=32
...
rand_write: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=32
...
rand_read: (g=2): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=32
...
fio-3.40
Starting 96 threads
layout: Laying out IO files (28 files / total 224MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (30 files / total 240MiB)
layout: Laying out IO files (37 files / total 296MiB)
layout: Laying out IO files (37 files / total 296MiB)
layout: Laying out IO files (27 files / total 216MiB)
layout: Laying out IO files (29 files / total 232MiB)
layout: Laying out IO files (43 files / total 344MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (45 files / total 360MiB)
layout: Laying out IO files (31 files / total 248MiB)
layout: Laying out IO files (27 files / total 216MiB)
layout: Laying out IO files (30 files / total 240MiB)
layout: Laying out IO files (35 files / total 280MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (34 files / total 272MiB)
layout: Laying out IO files (41 files / total 328MiB)
layout: Laying out IO files (33 files / total 264MiB)
layout: Laying out IO files (35 files / total 280MiB)
layout: Laying out IO files (43 files / total 344MiB)
layout: Laying out IO files (31 files / total 248MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (33 files / total 264MiB)
layout: Laying out IO files (31 files / total 248MiB)
layout: Laying out IO files (28 files / total 224MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (34 files / total 272MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (38 files / total 304MiB)
layout: Laying out IO files (39 files / total 312MiB)
Jobs: 32 (f=3199): [_(64),r(5),f(1),r(6),f(1),r(1),f(1),r(17)][100.0%][r=3958MiB/s][r=1013k IOPS][eta 00m:00s]
layout: (groupid=0, jobs=32): err= 0: pid=1312721: Sat Jun  6 14:00:03 2026
  write: IOPS=47.1k, BW=23.0GiB/s (24.7GB/s)(25.0GiB/1086msec); 0 zone resets
    slat (usec): min=2, max=265, avg=23.13, stdev=11.34
    clat (usec): min=217, max=292715, avg=19797.63, stdev=34053.57
     lat (usec): min=220, max=292734, avg=19820.76, stdev=34053.36
    clat percentiles (usec):
     |  1.00th=[   371],  5.00th=[   594], 10.00th=[   807], 20.00th=[  1893],
     | 30.00th=[  2769], 40.00th=[  3523], 50.00th=[ 13042], 60.00th=[ 17957],
     | 70.00th=[ 19268], 80.00th=[ 20841], 90.00th=[ 31327], 95.00th=[104334],
     | 99.00th=[168821], 99.50th=[189793], 99.90th=[231736], 99.95th=[246416],
     | 99.99th=[274727]
   bw (  MiB/s): min=21866, max=21866, per=92.76%, avg=21866.27, stdev= 0.00, samples=32
   iops        : min=43717, max=43717, avg=43717.00, stdev= 0.00, samples=32
  lat (usec)   : 250=0.01%, 500=3.40%, 750=5.29%, 1000=3.72%
  lat (msec)   : 2=8.71%, 4=23.35%, 10=3.94%, 20=25.87%, 50=17.88%
  lat (msec)   : 100=2.57%, 250=5.23%, 500=0.04%
  cpu          : usr=3.82%, sys=1.21%, ctx=55791, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.5%, 16=1.0%, 32=98.1%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,51200,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
rand_write: (groupid=1, jobs=32): err= 0: pid=1312863: Sat Jun  6 14:00:03 2026
  write: IOPS=733k, BW=2863MiB/s (3002MB/s)(83.9GiB/30006msec); 0 zone resets
    slat (nsec): min=291, max=384271, avg=1790.10, stdev=1092.37
    clat (usec): min=3, max=217186, avg=1395.23, stdev=6022.21
     lat (usec): min=60, max=217187, avg=1397.02, stdev=6022.21
    clat percentiles (usec):
     |  1.00th=[   104],  5.00th=[   145], 10.00th=[   174], 20.00th=[   219],
     | 30.00th=[   269], 40.00th=[   363], 50.00th=[   611], 60.00th=[   676],
     | 70.00th=[   742], 80.00th=[   898], 90.00th=[  1205], 95.00th=[  1909],
     | 99.00th=[ 32375], 99.50th=[ 52167], 99.90th=[ 77071], 99.95th=[ 85459],
     | 99.99th=[105382]
   bw (  MiB/s): min= 2212, max= 3603, per=100.00%, avg=2865.08, stdev= 9.17, samples=1920
   iops        : min=566425, max=922543, avg=733457.27, stdev=2346.51, samples=1920
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.75%
  lat (usec)   : 250=26.19%, 500=17.38%, 750=26.99%, 1000=11.54%
  lat (msec)   : 2=12.36%, 4=1.44%, 10=1.34%, 20=0.63%, 50=0.84%
  lat (msec)   : 100=0.53%, 250=0.01%
  cpu          : usr=2.34%, sys=9.30%, ctx=17360075, majf=0, minf=314
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,21994226,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
rand_read: (groupid=2, jobs=32): err= 0: pid=1313325: Sat Jun  6 14:00:03 2026
  read: IOPS=1019k, BW=3980MiB/s (4173MB/s)(117GiB/30003msec)
    slat (nsec): min=240, max=4621.0k, avg=2222.40, stdev=11759.93
    clat (usec): min=10, max=25878, avg=1002.14, stdev=668.47
     lat (usec): min=67, max=25880, avg=1004.37, stdev=668.49
    clat percentiles (usec):
     |  1.00th=[  225],  5.00th=[  322], 10.00th=[  396], 20.00th=[  510],
     | 30.00th=[  619], 40.00th=[  725], 50.00th=[  848], 60.00th=[  988],
     | 70.00th=[ 1139], 80.00th=[ 1352], 90.00th=[ 1729], 95.00th=[ 2212],
     | 99.00th=[ 3752], 99.50th=[ 4424], 99.90th=[ 5342], 99.95th=[ 5604],
     | 99.99th=[ 6521]
   bw (  MiB/s): min= 3784, max= 4350, per=100.00%, avg=3981.21, stdev= 2.99, samples=1920
   iops        : min=968845, max=1113846, avg=1019187.40, stdev=765.14, samples=1920
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.01%, 250=1.78%, 500=17.43%
  lat (usec)   : 750=22.75%, 1000=19.07%
  lat (msec)   : 2=32.30%, 4=5.87%, 10=0.80%, 20=0.01%, 50=0.01%
  cpu          : usr=4.35%, sys=13.28%, ctx=21597989, majf=0, minf=210
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=30567165,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=23.0GiB/s (24.7GB/s), 23.0GiB/s-23.0GiB/s (24.7GB/s-24.7GB/s), io=25.0GiB (26.8GB), run=1086-1086msec

Run status group 1 (all jobs):
  WRITE: bw=2863MiB/s (3002MB/s), 2863MiB/s-2863MiB/s (3002MB/s-3002MB/s), io=83.9GiB (90.1GB), run=30006-30006msec

Run status group 2 (all jobs):
   READ: bw=3980MiB/s (4173MB/s), 3980MiB/s-3980MiB/s (4173MB/s-4173MB/s), io=117GiB (125GB), run=30003-30003msec
fio output — chimera in-process client 32 jobs × QD 32
layout: (g=0): rw=write, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=chimera, iodepth=32
...
rand_write: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=chimera, iodepth=32
...
rand_read: (g=2): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=chimera, iodepth=32
...
fio-3.40
Starting 96 threads
Jobs: 32 (f=3200): [_(64),r(32)][100.0%][r=39.9GiB/s][r=10.5M IOPS][eta 00m:00s]
layout: (groupid=0, jobs=32): err= 0: pid=1308608: Sat Jun  6 13:50:55 2026
  write: IOPS=80.8k, BW=39.4GiB/s (42.3GB/s)(25.0GiB/634msec); 0 zone resets
    slat (nsec): min=1152, max=145379, avg=15370.22, stdev=8937.85
    clat (usec): min=88, max=108608, avg=10248.51, stdev=12388.82
     lat (usec): min=92, max=108642, avg=10263.88, stdev=12390.89
    clat percentiles (usec):
     |  1.00th=[   103],  5.00th=[   285], 10.00th=[   498], 20.00th=[   922],
     | 30.00th=[  1598], 40.00th=[  2540], 50.00th=[  4424], 60.00th=[  9634],
     | 70.00th=[ 16057], 80.00th=[ 19268], 90.00th=[ 23200], 95.00th=[ 29230],
     | 99.00th=[ 56886], 99.50th=[ 78119], 99.90th=[ 94897], 99.95th=[ 99091],
     | 99.99th=[105382]
   bw (  MiB/s): min=32894, max=32894, per=81.47%, avg=32894.62, stdev= 0.00, samples=26
   iops        : min=65778, max=65778, avg=65778.00, stdev= 0.00, samples=26
  lat (usec)   : 100=0.73%, 250=3.69%, 500=5.64%, 750=6.00%, 1000=5.16%
  lat (msec)   : 2=13.87%, 4=13.13%, 10=12.12%, 20=21.64%, 50=16.73%
  lat (msec)   : 100=1.24%, 250=0.04%
  cpu          : usr=84.89%, sys=0.46%, ctx=1750, majf=0, minf=6956
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.5%, 16=1.0%, 32=98.1%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,51200,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
rand_write: (groupid=1, jobs=32): err= 0: pid=1309110: Sat Jun  6 13:50:55 2026
  write: IOPS=10.8M, BW=41.4GiB/s (44.4GB/s)(1242GiB/30002msec); 0 zone resets
    slat (nsec): min=200, max=2557.6k, avg=388.83, stdev=239.00
    clat (usec): min=21, max=10978, avg=93.78, stdev=48.33
     lat (usec): min=21, max=10978, avg=94.17, stdev=48.33
    clat percentiles (usec):
     |  1.00th=[   52],  5.00th=[   62], 10.00th=[   67], 20.00th=[   74],
     | 30.00th=[   79], 40.00th=[   84], 50.00th=[   89], 60.00th=[   94],
     | 70.00th=[  101], 80.00th=[  110], 90.00th=[  123], 95.00th=[  137],
     | 99.00th=[  229], 99.50th=[  281], 99.90th=[  416], 99.95th=[  553],
     | 99.99th=[  906]
   bw (  MiB/s): min=40846, max=43462, per=100.00%, avg=42396.65, stdev=17.30, samples=1916
   iops        : min=10456735, max=11126356, avg=10853537.66, stdev=4428.85, samples=1916
  lat (usec)   : 50=0.62%, 100=68.65%, 250=29.95%, 500=0.71%, 750=0.04%
  lat (usec)   : 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=98.83%, sys=0.93%, ctx=267082, majf=0, minf=3406
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,325529457,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
rand_read: (groupid=2, jobs=32): err= 0: pid=1309251: Sat Jun  6 13:50:55 2026
  read: IOPS=10.5M, BW=40.1GiB/s (43.1GB/s)(1203GiB/30001msec)
    slat (nsec): min=180, max=175275, avg=367.51, stdev=205.67
    clat (usec): min=9, max=20342, avg=96.64, stdev=44.73
     lat (usec): min=9, max=20343, avg=97.01, stdev=44.74
    clat percentiles (usec):
     |  1.00th=[   65],  5.00th=[   70], 10.00th=[   74], 20.00th=[   79],
     | 30.00th=[   87], 40.00th=[   92], 50.00th=[   95], 60.00th=[   98],
     | 70.00th=[  101], 80.00th=[  106], 90.00th=[  116], 95.00th=[  128],
     | 99.00th=[  206], 99.50th=[  260], 99.90th=[  383], 99.95th=[  478],
     | 99.99th=[  799]
   bw (  MiB/s): min=40011, max=41733, per=100.00%, avg=41136.92, stdev=11.23, samples=1888
   iops        : min=10262178, max=10703686, avg=10550706.22, stdev=2880.78, samples=1888
  lat (usec)   : 10=0.01%, 20=0.07%, 50=0.27%, 100=66.39%, 250=32.71%
  lat (usec)   : 500=0.52%, 750=0.03%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=99.61%, sys=0.28%, ctx=840, majf=0, minf=204
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=316039490,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=39.4GiB/s (42.3GB/s), 39.4GiB/s-39.4GiB/s (42.3GB/s-42.3GB/s), io=25.0GiB (26.8GB), run=634-634msec

Run status group 1 (all jobs):
  WRITE: bw=41.4GiB/s (44.4GB/s), 41.4GiB/s-41.4GiB/s (44.4GB/s-44.4GB/s), io=1242GiB (1333GB), run=30002-30002msec

Run status group 2 (all jobs):
   READ: bw=40.1GiB/s (43.1GB/s), 40.1GiB/s-40.1GiB/s (43.1GB/s-43.1GB/s), io=1203GiB (1292GB), run=30001-30001msec
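
The headline IOPS in these reports can be cross-checked against the issued-I/O counts and runtimes in the same output. The following sketch parses two excerpts copied from the reports above. The regular expressions are written for this fio 3.40 text layout and are an assumption, not a general fio parser. For machine-readable results, fio's JSON output is the safer route.

```python
import re

SUMMARY = re.compile(r"(read|write): IOPS=([\d.]+)([kM]?), BW=.*?/(\d+)msec\)")
ISSUED = re.compile(r"issued rwts: total=(\d+),(\d+),")
SCALE = {"": 1, "k": 1e3, "M": 1e6}

def parse_group(text):
    """Return (direction, reported IOPS, IOPS recomputed from issued/runtime)."""
    s = SUMMARY.search(text)
    i = ISSUED.search(text)
    direction = s.group(1)
    reported = float(s.group(2)) * SCALE[s.group(3)]
    runtime_s = int(s.group(4)) / 1000
    issued = int(i.group(1) if direction == "read" else i.group(2))
    return direction, reported, issued / runtime_s

LINUX_RAND_WRITE = """
  write: IOPS=733k, BW=2863MiB/s (3002MB/s)(83.9GiB/30006msec); 0 zone resets
     issued rwts: total=0,21994226,0,0 short=0,0,0,0 dropped=0,0,0,0
"""

CHIMERA_RAND_READ = """
  read: IOPS=10.5M, BW=40.1GiB/s (43.1GB/s)(1203GiB/30001msec)
     issued rwts: total=316039490,0,0,0 short=0,0,0,0 dropped=0,0,0,0
"""

results = {
    "linux rand_write": parse_group(LINUX_RAND_WRITE),
    "chimera rand_read": parse_group(CHIMERA_RAND_READ),
}
for name, (direction, reported, computed) in results.items():
    print(f"{name}: {direction} reported {reported:,.0f} IOPS, "
          f"recomputed {computed:,.0f} IOPS")
```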
// raw output — client top

Client-side top during each run. Under the Linux mount the CPU is spread across nfsiod / rpciod kworkers; under Chimera it is a single fio process busy-polling, with no kernel I/O threads. Headers and busy rows are verbatim; the long tail of idle kernel threads is elided, as marked.

top — during linux NFS run kernel nfsiod / rpciod path
top - 14:05:16 up 4 days, 22:21,  3 users,  load average: 5.84, 13.62, 15.86
Tasks: 3892 total,  47 running, 3845 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us, 13.8 sy,  0.0 ni, 79.0 id,  7.1 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 386429.6 total, 231825.9 free, 148122.5 used,   9226.2 buff/cache
MiB Swap:   8192.0 total,   8153.7 free,     38.3 used. 238307.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1315100 root      20   0 2617360  44360  33984 S  1344   0.0   6:27.45 fio
1312849 root      20   0       0      0      0 R  32.0   0.0   0:19.89 kworker/u1551:12+rpciod
1315946 root      20   0       0      0      0 R  32.0   0.0   0:01.10 kworker/u1551:0+nfsiod
1316194 root      20   0       0      0      0 I  32.0   0.0   0:00.91 kworker/u1551:14-nfsiod
1316342 root      20   0       0      0      0 I  32.0   0.0   0:00.70 kworker/u1551:21-nfsiod
1311934 root      20   0       0      0      0 I  28.0   0.0   0:08.12 kworker/u1553:9-nfsiod
1313121 root      20   0       0      0      0 R  28.0   0.0   0:19.54 kworker/u1551:28+nfsiod
1313739 root      20   0       0      0      0 I  28.0   0.0   0:06.67 kworker/u1551:48-nfsiod
1315669 root      20   0       0      0      0 I  28.0   0.0   0:01.13 kworker/u1553:1-nfsiod
1316086 root      20   0       0      0      0 I  28.0   0.0   0:00.80 kworker/u1551:2-nfsiod
1316090 root      20   0       0      0      0 I  28.0   0.0   0:01.03 kworker/u1551:3-nfsiod
1316132 root      20   0       0      0      0 R  28.0   0.0   0:01.02 kworker/u1553:6+rpciod
1316137 root      20   0       0      0      0 I  28.0   0.0   0:01.05 kworker/u1551:5-nfsiod
1316185 root      20   0       0      0      0 I  28.0   0.0   0:00.78 kworker/u1551:10-nfsiod
1316323 root      20   0       0      0      0 I  28.0   0.0   0:00.93 kworker/u1551:17-nfsiod
1316327 root      20   0       0      0      0 I  28.0   0.0   0:00.95 kworker/u1551:18-nfsiod
 955127 root      20   0       0      0      0 I  24.0   0.0   0:08.66 kworker/u1553:2-nfsiod
1283422 root      20   0       0      0      0 I  24.0   0.0   0:07.86 kworker/u1553:0-nfsiod
1313431 root      20   0       0      0      0 I  24.0   0.0   0:14.90 kworker/u1549:32-nfsiod
            ... (additional nfsiod / rpciod kworkers at 20-24% CPU elided) ...
%Cpu summary: 0.2% user, 13.8% system, 7.1% iowait — data movement is spread across
dozens of kernel nfsiod / rpciod worker threads rather than in fio itself.
top — during chimera run userspace poll loop
top - 14:05:54 up 4 days, 22:21,  3 users,  load average: 19.64, 16.43, 16.73
Tasks: 4006 total,   1 running, 4005 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.3 us,  0.1 sy,  0.0 ni, 91.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 386429.6 total, 231614.9 free, 148333.2 used,   9225.2 buff/cache
MiB Swap:   8192.0 total,   8153.7 free,     38.3 used. 238096.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1316608 root      20   0   81.8g 527512  44624 S  3128   0.1   9:34.47 fio
1314410 root      20   0   16608  11672   4108 R  20.0   0.0   0:09.33 top
      1 root      20   0   25604  16452  11068 S   0.0   0.0   0:15.94 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:03.01 kthreadd
            ... (all remaining tasks are idle kernel threads at 0.0% CPU) ...
%Cpu summary: 8.3% user, 0.1% system, 0% iowait — essentially all the work is the
single fio process busy-polling RDMA at 3128% CPU (~31 cores). No kernel I/O threads.