Both runs use the same servers, layout, fabric, and fio workload, and both execute on the same physical client machine (same CPU, NICs, and hardware). The only thing that changes is the client software stack, so this compares two client implementations on identical hardware, not the same software under two setups.

With the Linux kernel client, I/O is handed off to the nfsiod / rpciod worker pool, and the read run shows 21.6M context switches with many kworkers at 20–32% CPU. The limit here is per-RPC kernel overhead and shared-state lock contention, not the network or the servers: the client reaches only ~4 GB/s, about 8% of the 400 GbE link.

The servers export a per-operation block-I/O latency histogram (libevpl, via Prometheus). Aggregated across all 32 NVMe devices on the two servers, it gives the raw device latency for the same 4 KiB random workload. Device latency does not depend on which client drove the I/O, so it sits as a fixed backdrop under both client runs. The gap between each client's fio percentiles (in the reports below) and the device percentiles is what that client added on top.
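A Prometheus histogram is exported as cumulative bucket counts keyed by an upper bound (`le`). Aggregating across devices is therefore a per-bucket sum, and a percentile is read off the merged buckets by linear interpolation. This mirrors the approach of PromQL's `histogram_quantile`. The sketch below assumes each device's histogram has already been scraped into a Python dict of `le -> cumulative count`. The bucket bounds and counts are made up for illustration and are not libevpl's actual metric buckets.

```python
import math
from collections import defaultdict

def merge_histograms(per_device):
    """Sum cumulative bucket counts across devices that share bucket bounds."""
    merged = defaultdict(int)
    for hist in per_device:
        for le, count in hist.items():
            merged[le] += count
    return dict(merged)

def histogram_quantile(q, buckets):
    """Estimate quantile q in [0, 1] from cumulative buckets {upper_bound: count}.

    Interpolates linearly within the bucket holding the target rank and treats
    0 as the lower edge of the first bucket. If the rank lands in the +Inf
    bucket, the largest finite bound is returned.
    """
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for le in bounds:
        count = buckets[le]
        if count >= rank:
            if math.isinf(le):
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return float(le)
            return prev_bound + (le - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = le, count
    return prev_bound

# Two hypothetical devices, latency buckets in microseconds (illustrative only).
devices = [
    {25: 10, 50: 60, 100: 95, math.inf: 100},
    {25: 20, 50: 70, 100: 98, math.inf: 100},
]
merged = merge_histograms(devices)
p50 = histogram_quantile(0.50, merged)
p99 = histogram_quantile(0.99, merged)
```

Summing buckets before interpolating matters. Averaging per-device percentiles would not give the percentile of the combined population.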
For reads, the Chimera client lands about 20 µs above the device across the whole distribution (p50 +17 µs, p99 +14 µs). A read is close to a device passthrough plus the RDMA/pNFS round trip. For writes the Chimera gap is larger, about 77 µs at p50, because a client write is not a single block op. The server turns it into a logged transaction (intent-log write, data write, tail push), so this figure is the durable write path plus transport, not transport alone.
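Each per-percentile gap is fio's clat percentile minus the matching device percentile. Below is a minimal sketch of that step, which parses fio's percentile block as printed in the reports. The device values are not measurements. They are back-computed from the gaps quoted above (p50 +17 µs, p99 +14 µs), purely to show the arithmetic.

```python
import re

# Matches entries like "50.00th=[ 95]" in fio's "clat percentiles (usec)" block.
_PCT = re.compile(r"(\d+\.\d+)th=\[\s*(\d+)\]")

def parse_fio_percentiles(block):
    """Return {percentile: latency} from a fio percentile block (units as printed)."""
    return {float(p): int(v) for p, v in _PCT.findall(block)}

def added_latency(client, device):
    """Client latency minus device latency, for percentiles present in both."""
    return {p: client[p] - device[p] for p in sorted(client.keys() & device.keys())}

# Chimera rand_read clat percentiles (usec), copied from the report.
CHIMERA_READ = """
 | 1.00th=[ 65], 5.00th=[ 70], 10.00th=[ 74], 20.00th=[ 79],
 | 30.00th=[ 87], 40.00th=[ 92], 50.00th=[ 95], 60.00th=[ 98],
 | 70.00th=[ 101], 80.00th=[ 106], 90.00th=[ 116], 95.00th=[ 128],
 | 99.00th=[ 206], 99.50th=[ 260], 99.90th=[ 383], 99.95th=[ 478],
 | 99.99th=[ 799]
"""

client = parse_fio_percentiles(CHIMERA_READ)
# Hypothetical device percentiles, back-computed from the quoted gaps (not measured).
device = {50.0: 95 - 17, 99.0: 206 - 14}
gaps = added_latency(client, device)
print(gaps)
```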
The Linux client's millisecond numbers are mostly queueing, not work. Both clients run over RDMA (RoCE), and the Linux mount uses nconnect for parallel connections, so the transport is not the difference. Both runs also use the same fio job parameters (32 jobs at queue depth 32, so up to 32 × 32 = 1024 outstanding 4 KiB requests) to keep the comparison equal. That grossly oversubscribes the in-kernel NFS client, which cannot service anywhere near that many requests concurrently. Most of each request's time is spent waiting its turn to be processed, not on the device or the wire: the device latencies above (tens of µs) are what the storage actually delivered under that same load.
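Little's law (L = λW) makes the queueing argument concrete. With a fixed number of requests outstanding, mean latency is simply outstanding requests divided by throughput, so the Linux client's millisecond latency follows directly from its IOPS ceiling. The back-of-envelope sketch below assumes steady state with fio keeping all 1024 slots full (the reports show 100% of I/O at depth 32). It uses the average IOPS and mean `lat` values from the fio reports.

```python
OUTSTANDING = 32 * 32  # numjobs x iodepth from the fio job files

# (average IOPS, measured mean total latency "lat avg" in usec) from the reports.
RUNS = {
    "linux rand_write": (733_457.27, 1397.02),
    "linux rand_read": (1_019_187.40, 1004.37),
    "chimera rand_write": (10_853_537.66, 94.17),
    "chimera rand_read": (10_550_706.22, 97.01),
}

def littles_law_latency_us(outstanding, iops):
    """Mean time in system W = L / lambda, converted to microseconds."""
    return outstanding / iops * 1e6

predicted = {name: littles_law_latency_us(OUTSTANDING, iops)
             for name, (iops, _) in RUNS.items()}

for name, (iops, measured) in RUNS.items():
    print(f"{name:20s} predicted {predicted[name]:8.1f} us  measured {measured:8.1f} us")
```

Each prediction lands within about 0.2% of fio's measured mean. This is consistent with latency being set by how long requests wait in a full queue at each client's throughput.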
The two fio jobs are identical except for the I/O engine (plus the Chimera engine's own chimera_config, chimera_log, and chimera_debug options) and the target path: io_uring against the kernel NFS mount at /mnt, versus the Chimera engine against the export at /export.
[global]
ioengine=io_uring
thread=1
filename_format=/mnt/f.$jobnum.$filenum
randrepeat=0
group_reporting
direct=1
filesize=8m
nrfiles=100
numjobs=32
iodepth=32

[layout]
rw=write
bs=512k
stonewall

[rand_write]
rw=randwrite
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall

[rand_read]
rw=randread
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall
[global]
ioengine=external:/root/chimera/build/Release/src/fio/libchimera_fio.so
chimera_config=/root/chimera.json
chimera_log=/tmp/chimera.log
chimera_debug=0
thread=1
filename_format=/export/f.$jobnum.$filenum
randrepeat=0
group_reporting
direct=1
filesize=8m
nrfiles=100
numjobs=32
iodepth=32

[layout]
rw=write
bs=512k
stonewall

[rand_write]
rw=randwrite
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall

[rand_read]
rw=randread
bs=4k
time_based
ramp_time=10
runtime=30
startdelay=0
stonewall
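Two numbers fall straight out of these job files. The first is total outstanding I/O: numjobs × iodepth = 1024. The second is the working set written by the [layout] phase: numjobs × nrfiles × filesize = 32 × 100 × 8 MiB = 25 GiB, which matches the 25.0GiB both layout runs report. The small sketch below reads a fio job file with Python's configparser and assumes fio's default base-2 size suffixes (kb_base=1024). fio's own parser accepts far more syntax than this, so treat it as an illustration only.

```python
import configparser

JOB_FILE = """
[global]
ioengine=io_uring
thread=1
filename_format=/mnt/f.$jobnum.$filenum
randrepeat=0
group_reporting
direct=1
filesize=8m
nrfiles=100
numjobs=32
iodepth=32

[layout]
rw=write
bs=512k
stonewall
"""

_SUFFIX = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}

def parse_size(text):
    """Parse sizes like '8m' or '512k' as base-2 units (assumed kb_base=1024)."""
    text = text.strip().lower()
    if text and text[-1] in _SUFFIX:
        return int(text[:-1]) * _SUFFIX[text[-1]]
    return int(text)

def job_totals(job_text):
    """Return (outstanding I/Os, working-set bytes) from the [global] section."""
    # allow_no_value accepts bare flags such as "group_reporting" and "stonewall";
    # interpolation=None disables configparser's '%' interpolation entirely.
    cfg = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    cfg.read_string(job_text)
    g = cfg["global"]
    numjobs = int(g["numjobs"])
    outstanding = numjobs * int(g["iodepth"])
    working_set = numjobs * int(g["nrfiles"]) * parse_size(g["filesize"])
    return outstanding, working_set

outstanding, working_set = job_totals(JOB_FILE)
print(outstanding, working_set / (1 << 30), "GiB")
```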
Two diskfs servers, each owning 16 NVMe devices through VFIO. node-3 runs the MDS and advertises both nodes as flex-files data servers; node-1 is a pure data server.
node-3 (MDS and data server):

{
  "common": {
    "sync_delegation": false,
    "async_delegation": false,
    "rdmacm_tos": 104,
    "huge_pages": true,
    "huge_page_size": "1G",
    "slab_size": "1G",
    "preallocate_slabs": 8,
    "preallocate_threads": 8
  },
  "server": {
    "threads": 80,
    "metrics_port": 9000,
    "rest_http_port": 8080,
    "rdma": true,
    "tcp_flavor": "plain",
    "rdma_hostname": "0.0.0.0",
    "rdma_port": 20049,
    "nfs4_session_slots": 4096,
    "pnfs": {
      "enabled": true,
      "data_servers": [
        { "netid": "rdma", "tcp": "10.67.25.209", "rdma": "10.67.25.209", "backing_path": "/ds1" },
        { "netid": "rdma", "tcp": "10.67.25.211", "rdma": "10.67.25.211", "backing_path": "/ds2" }
      ]
    },
    "vfs": {
      "diskfs": {
        "path": "./build/Release/src/vfs/diskfs/libchimera_vfs_diskfs.so",
        "config": {
          "initialize": true,
          "block_cache_blocks": 1048576,
          "inode_cache_inodes": 1048576,
          "devices": [
            { "type": "vfio", "path": "01:00.0" },
            { "type": "vfio", "path": "03:00.0" },
            { "type": "vfio", "path": "05:00.0" },
            { "type": "vfio", "path": "07:00.0" },
            { "type": "vfio", "path": "41:00.0" },
            { "type": "vfio", "path": "43:00.0" },
            { "type": "vfio", "path": "45:00.0" },
            { "type": "vfio", "path": "47:00.0" },
            { "type": "vfio", "path": "81:00.0" },
            { "type": "vfio", "path": "83:00.0" },
            { "type": "vfio", "path": "85:00.0" },
            { "type": "vfio", "path": "87:00.0" },
            { "type": "vfio", "path": "c1:00.0" },
            { "type": "vfio", "path": "c3:00.0" },
            { "type": "vfio", "path": "c5:00.0" },
            { "type": "vfio", "path": "c7:00.0" }
          ]
        }
      }
    }
  },
  "mounts": {
    "mds": { "module": "diskfs", "path": "/mds", "create": { "mode": "0777" } },
    "ds1": { "module": "nfs", "path": "10.67.25.209:/ds" },
    "ds2": { "module": "diskfs", "path": "/ds", "create": { "mode": "0755" } }
  },
  "exports": {
    "/export": { "path": "/mds" },
    "/ds1": { "path": "/ds" }
  }
}

node-1 (pure data server):

{
  "common": {
    "sync_delegation": false,
    "async_delegation": false,
    "rdmacm_tos": 104,
    "huge_pages": true,
    "huge_page_size": "1G",
    "slab_size": "1G",
    "preallocate_slabs": 8,
    "preallocate_threads": 8
  },
  "server": {
    "threads": 80,
    "metrics_port": 9000,
    "rest_http_port": 8080,
    "rdma": true,
    "tcp_flavor": "plain",
    "rdma_hostname": "0.0.0.0",
    "rdma_port": 20049,
    "vfs": {
      "diskfs": {
        "path": "./build/Release/src/vfs/diskfs/libchimera_vfs_diskfs.so",
        "config": {
          "initialize": true,
          "block_cache_blocks": 1048576,
          "inode_cache_inodes": 1048576,
          "devices": [
            { "type": "vfio", "path": "01:00.0" },
            { "type": "vfio", "path": "03:00.0" },
            { "type": "vfio", "path": "05:00.0" },
            { "type": "vfio", "path": "07:00.0" },
            { "type": "vfio", "path": "41:00.0" },
            { "type": "vfio", "path": "43:00.0" },
            { "type": "vfio", "path": "45:00.0" },
            { "type": "vfio", "path": "47:00.0" },
            { "type": "vfio", "path": "81:00.0" },
            { "type": "vfio", "path": "83:00.0" },
            { "type": "vfio", "path": "85:00.0" },
            { "type": "vfio", "path": "87:00.0" },
            { "type": "vfio", "path": "c1:00.0" },
            { "type": "vfio", "path": "c3:00.0" },
            { "type": "vfio", "path": "c5:00.0" },
            { "type": "vfio", "path": "c7:00.0" }
          ]
        }
      }
    }
  },
  "mounts": {
    "ds": { "module": "diskfs", "path": "/" }
  },
  "exports": {
    "/ds": { "path": "/ds" }
  },
  "shares": {
    "diskfs": { "path": "/diskfs" }
  },
  "buckets": {
    "diskfs": { "path": "/diskfs" }
  }
}

The full fio reports for both runs, verbatim.

Linux kernel NFS client (io_uring engine against /mnt):
layout: (g=0): rw=write, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=io_uring, iodepth=32
...
rand_write: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=32
...
rand_read: (g=2): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=32
...
fio-3.40
Starting 96 threads
layout: Laying out IO files (28 files / total 224MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (30 files / total 240MiB)
layout: Laying out IO files (37 files / total 296MiB)
layout: Laying out IO files (37 files / total 296MiB)
layout: Laying out IO files (27 files / total 216MiB)
layout: Laying out IO files (29 files / total 232MiB)
layout: Laying out IO files (43 files / total 344MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (45 files / total 360MiB)
layout: Laying out IO files (31 files / total 248MiB)
layout: Laying out IO files (27 files / total 216MiB)
layout: Laying out IO files (30 files / total 240MiB)
layout: Laying out IO files (35 files / total 280MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (34 files / total 272MiB)
layout: Laying out IO files (41 files / total 328MiB)
layout: Laying out IO files (33 files / total 264MiB)
layout: Laying out IO files (35 files / total 280MiB)
layout: Laying out IO files (43 files / total 344MiB)
layout: Laying out IO files (31 files / total 248MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (33 files / total 264MiB)
layout: Laying out IO files (31 files / total 248MiB)
layout: Laying out IO files (28 files / total 224MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (34 files / total 272MiB)
layout: Laying out IO files (39 files / total 312MiB)
layout: Laying out IO files (40 files / total 320MiB)
layout: Laying out IO files (38 files / total 304MiB)
layout: Laying out IO files (39 files / total 312MiB)
Jobs: 32 (f=3199): [_(64),r(5),f(1),r(6),f(1),r(1),f(1),r(17)][100.0%][r=3958MiB/s][r=1013k IOPS][eta 00m:00s]
layout: (groupid=0, jobs=32): err= 0: pid=1312721: Sat Jun 6 14:00:03 2026
write: IOPS=47.1k, BW=23.0GiB/s (24.7GB/s)(25.0GiB/1086msec); 0 zone resets
slat (usec): min=2, max=265, avg=23.13, stdev=11.34
clat (usec): min=217, max=292715, avg=19797.63, stdev=34053.57
lat (usec): min=220, max=292734, avg=19820.76, stdev=34053.36
clat percentiles (usec):
| 1.00th=[ 371], 5.00th=[ 594], 10.00th=[ 807], 20.00th=[ 1893],
| 30.00th=[ 2769], 40.00th=[ 3523], 50.00th=[ 13042], 60.00th=[ 17957],
| 70.00th=[ 19268], 80.00th=[ 20841], 90.00th=[ 31327], 95.00th=[104334],
| 99.00th=[168821], 99.50th=[189793], 99.90th=[231736], 99.95th=[246416],
| 99.99th=[274727]
bw ( MiB/s): min=21866, max=21866, per=92.76%, avg=21866.27, stdev= 0.00, samples=32
iops : min=43717, max=43717, avg=43717.00, stdev= 0.00, samples=32
lat (usec) : 250=0.01%, 500=3.40%, 750=5.29%, 1000=3.72%
lat (msec) : 2=8.71%, 4=23.35%, 10=3.94%, 20=25.87%, 50=17.88%
lat (msec) : 100=2.57%, 250=5.23%, 500=0.04%
cpu : usr=3.82%, sys=1.21%, ctx=55791, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.5%, 16=1.0%, 32=98.1%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,51200,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
rand_write: (groupid=1, jobs=32): err= 0: pid=1312863: Sat Jun 6 14:00:03 2026
write: IOPS=733k, BW=2863MiB/s (3002MB/s)(83.9GiB/30006msec); 0 zone resets
slat (nsec): min=291, max=384271, avg=1790.10, stdev=1092.37
clat (usec): min=3, max=217186, avg=1395.23, stdev=6022.21
lat (usec): min=60, max=217187, avg=1397.02, stdev=6022.21
clat percentiles (usec):
| 1.00th=[ 104], 5.00th=[ 145], 10.00th=[ 174], 20.00th=[ 219],
| 30.00th=[ 269], 40.00th=[ 363], 50.00th=[ 611], 60.00th=[ 676],
| 70.00th=[ 742], 80.00th=[ 898], 90.00th=[ 1205], 95.00th=[ 1909],
| 99.00th=[ 32375], 99.50th=[ 52167], 99.90th=[ 77071], 99.95th=[ 85459],
| 99.99th=[105382]
bw ( MiB/s): min= 2212, max= 3603, per=100.00%, avg=2865.08, stdev= 9.17, samples=1920
iops : min=566425, max=922543, avg=733457.27, stdev=2346.51, samples=1920
lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.75%
lat (usec) : 250=26.19%, 500=17.38%, 750=26.99%, 1000=11.54%
lat (msec) : 2=12.36%, 4=1.44%, 10=1.34%, 20=0.63%, 50=0.84%
lat (msec) : 100=0.53%, 250=0.01%
cpu : usr=2.34%, sys=9.30%, ctx=17360075, majf=0, minf=314
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,21994226,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
rand_read: (groupid=2, jobs=32): err= 0: pid=1313325: Sat Jun 6 14:00:03 2026
read: IOPS=1019k, BW=3980MiB/s (4173MB/s)(117GiB/30003msec)
slat (nsec): min=240, max=4621.0k, avg=2222.40, stdev=11759.93
clat (usec): min=10, max=25878, avg=1002.14, stdev=668.47
lat (usec): min=67, max=25880, avg=1004.37, stdev=668.49
clat percentiles (usec):
| 1.00th=[ 225], 5.00th=[ 322], 10.00th=[ 396], 20.00th=[ 510],
| 30.00th=[ 619], 40.00th=[ 725], 50.00th=[ 848], 60.00th=[ 988],
| 70.00th=[ 1139], 80.00th=[ 1352], 90.00th=[ 1729], 95.00th=[ 2212],
| 99.00th=[ 3752], 99.50th=[ 4424], 99.90th=[ 5342], 99.95th=[ 5604],
| 99.99th=[ 6521]
bw ( MiB/s): min= 3784, max= 4350, per=100.00%, avg=3981.21, stdev= 2.99, samples=1920
iops : min=968845, max=1113846, avg=1019187.40, stdev=765.14, samples=1920
lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=1.78%, 500=17.43%
lat (usec) : 750=22.75%, 1000=19.07%
lat (msec) : 2=32.30%, 4=5.87%, 10=0.80%, 20=0.01%, 50=0.01%
cpu : usr=4.35%, sys=13.28%, ctx=21597989, majf=0, minf=210
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=30567165,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=23.0GiB/s (24.7GB/s), 23.0GiB/s-23.0GiB/s (24.7GB/s-24.7GB/s), io=25.0GiB (26.8GB), run=1086-1086msec
Run status group 1 (all jobs):
WRITE: bw=2863MiB/s (3002MB/s), 2863MiB/s-2863MiB/s (3002MB/s-3002MB/s), io=83.9GiB (90.1GB), run=30006-30006msec
Run status group 2 (all jobs):
READ: bw=3980MiB/s (4173MB/s), 3980MiB/s-3980MiB/s (4173MB/s-4173MB/s), io=117GiB (125GB), run=30003-30003msec

Chimera client (chimera engine against /export):

layout: (g=0): rw=write, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=chimera, iodepth=32
...
rand_write: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=chimera, iodepth=32
...
randread: (g=2): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=chimera, iodepth=32
...
fio-3.40
Starting 96 threads
Jobs: 32 (f=3200): [_(64),r(32)][100.0%][r=39.9GiB/s][r=10.5M IOPS][eta 00m:00s]
layout: (groupid=0, jobs=32): err= 0: pid=1308608: Sat Jun 6 13:50:55 2026
write: IOPS=80.8k, BW=39.4GiB/s (42.3GB/s)(25.0GiB/634msec); 0 zone resets
slat (nsec): min=1152, max=145379, avg=15370.22, stdev=8937.85
clat (usec): min=88, max=108608, avg=10248.51, stdev=12388.82
lat (usec): min=92, max=108642, avg=10263.88, stdev=12390.89
clat percentiles (usec):
| 1.00th=[ 103], 5.00th=[ 285], 10.00th=[ 498], 20.00th=[ 922],
| 30.00th=[ 1598], 40.00th=[ 2540], 50.00th=[ 4424], 60.00th=[ 9634],
| 70.00th=[ 16057], 80.00th=[ 19268], 90.00th=[ 23200], 95.00th=[ 29230],
| 99.00th=[ 56886], 99.50th=[ 78119], 99.90th=[ 94897], 99.95th=[ 99091],
| 99.99th=[105382]
bw ( MiB/s): min=32894, max=32894, per=81.47%, avg=32894.62, stdev= 0.00, samples=26
iops : min=65778, max=65778, avg=65778.00, stdev= 0.00, samples=26
lat (usec) : 100=0.73%, 250=3.69%, 500=5.64%, 750=6.00%, 1000=5.16%
lat (msec) : 2=13.87%, 4=13.13%, 10=12.12%, 20=21.64%, 50=16.73%
lat (msec) : 100=1.24%, 250=0.04%
cpu : usr=84.89%, sys=0.46%, ctx=1750, majf=0, minf=6956
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.5%, 16=1.0%, 32=98.1%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,51200,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
rand_write: (groupid=1, jobs=32): err= 0: pid=1309110: Sat Jun 6 13:50:55 2026
write: IOPS=10.8M, BW=41.4GiB/s (44.4GB/s)(1242GiB/30002msec); 0 zone resets
slat (nsec): min=200, max=2557.6k, avg=388.83, stdev=239.00
clat (usec): min=21, max=10978, avg=93.78, stdev=48.33
lat (usec): min=21, max=10978, avg=94.17, stdev=48.33
clat percentiles (usec):
| 1.00th=[ 52], 5.00th=[ 62], 10.00th=[ 67], 20.00th=[ 74],
| 30.00th=[ 79], 40.00th=[ 84], 50.00th=[ 89], 60.00th=[ 94],
| 70.00th=[ 101], 80.00th=[ 110], 90.00th=[ 123], 95.00th=[ 137],
| 99.00th=[ 229], 99.50th=[ 281], 99.90th=[ 416], 99.95th=[ 553],
| 99.99th=[ 906]
bw ( MiB/s): min=40846, max=43462, per=100.00%, avg=42396.65, stdev=17.30, samples=1916
iops : min=10456735, max=11126356, avg=10853537.66, stdev=4428.85, samples=1916
lat (usec) : 50=0.62%, 100=68.65%, 250=29.95%, 500=0.71%, 750=0.04%
lat (usec) : 1000=0.02%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
cpu : usr=98.83%, sys=0.93%, ctx=267082, majf=0, minf=3406
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,325529457,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
rand_read: (groupid=2, jobs=32): err= 0: pid=1309251: Sat Jun 6 13:50:55 2026
read: IOPS=10.5M, BW=40.1GiB/s (43.1GB/s)(1203GiB/30001msec)
slat (nsec): min=180, max=175275, avg=367.51, stdev=205.67
clat (usec): min=9, max=20342, avg=96.64, stdev=44.73
lat (usec): min=9, max=20343, avg=97.01, stdev=44.74
clat percentiles (usec):
| 1.00th=[ 65], 5.00th=[ 70], 10.00th=[ 74], 20.00th=[ 79],
| 30.00th=[ 87], 40.00th=[ 92], 50.00th=[ 95], 60.00th=[ 98],
| 70.00th=[ 101], 80.00th=[ 106], 90.00th=[ 116], 95.00th=[ 128],
| 99.00th=[ 206], 99.50th=[ 260], 99.90th=[ 383], 99.95th=[ 478],
| 99.99th=[ 799]
bw ( MiB/s): min=40011, max=41733, per=100.00%, avg=41136.92, stdev=11.23, samples=1888
iops : min=10262178, max=10703686, avg=10550706.22, stdev=2880.78, samples=1888
lat (usec) : 10=0.01%, 20=0.07%, 50=0.27%, 100=66.39%, 250=32.71%
lat (usec) : 500=0.52%, 750=0.03%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=99.61%, sys=0.28%, ctx=840, majf=0, minf=204
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=316039490,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=39.4GiB/s (42.3GB/s), 39.4GiB/s-39.4GiB/s (42.3GB/s-42.3GB/s), io=25.0GiB (26.8GB), run=634-634msec
Run status group 1 (all jobs):
WRITE: bw=41.4GiB/s (44.4GB/s), 41.4GiB/s-41.4GiB/s (44.4GB/s-44.4GB/s), io=1242GiB (1333GB), run=30002-30002msec
Run status group 2 (all jobs):
READ: bw=40.1GiB/s (43.1GB/s), 40.1GiB/s-40.1GiB/s (43.1GB/s-43.1GB/s), io=1203GiB (1292GB), run=30001-30001msec

top
Client-side top during each run. Under the Linux mount the CPU is spread
across nfsiod / rpciod kworkers; under Chimera it is a single
fio process busy-polling, with no kernel I/O threads. Headers and the busiest rows
are verbatim; the remaining rows are elided, as marked.
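One way to put a number on how the Linux client's CPU is spread is to group top rows by command. Kworker names such as kworker/u1551:12+rpciod fold into their workqueue, which is the text after the first '+' or '-'. The minimal sketch below assumes rows in the default top column order used in these snapshots. It only sees the rows that were captured, so elided rows are not counted. The sample rows are copied from the Linux snapshot.

```python
import re
from collections import defaultdict

# "kworker/u1551:12+rpciod" -> "rpciod"; "kworker/u1551:14-nfsiod" -> "nfsiod"
_KWORKER = re.compile(r"^kworker/[^+-]*[+-](.+)$")

def cpu_by_group(rows):
    """Sum top's %CPU column per command, grouping kworkers by workqueue name.

    Assumes default columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    totals = defaultdict(float)
    for row in rows:
        fields = row.split()
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip headers, summaries, and elision markers
        cpu, command = float(fields[8]), fields[11]
        match = _KWORKER.match(command)
        if match:
            command = match.group(1)
        elif command.startswith("kworker/"):
            command = "kworker"  # kworker with no workqueue suffix
        totals[command] += cpu
    return dict(totals)

SAMPLE = """
1315100 root 20 0 2617360 44360 33984 S 1344 0.0 6:27.45 fio
1312849 root 20 0 0 0 0 R 32.0 0.0 0:19.89 kworker/u1551:12+rpciod
1315946 root 20 0 0 0 0 R 32.0 0.0 0:01.10 kworker/u1551:0+nfsiod
1316194 root 20 0 0 0 0 I 32.0 0.0 0:00.91 kworker/u1551:14-nfsiod
1316132 root 20 0 0 0 0 R 28.0 0.0 0:01.02 kworker/u1553:6+rpciod
""".strip().splitlines()

totals = cpu_by_group(SAMPLE)
print(totals)
```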
Linux kernel NFS client:
top - 14:05:16 up 4 days, 22:21, 3 users, load average: 5.84, 13.62, 15.86
Tasks: 3892 total, 47 running, 3845 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 13.8 sy, 0.0 ni, 79.0 id, 7.1 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 386429.6 total, 231825.9 free, 148122.5 used, 9226.2 buff/cache
MiB Swap: 8192.0 total, 8153.7 free, 38.3 used. 238307.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1315100 root 20 0 2617360 44360 33984 S 1344 0.0 6:27.45 fio
1312849 root 20 0 0 0 0 R 32.0 0.0 0:19.89 kworker/u1551:12+rpciod
1315946 root 20 0 0 0 0 R 32.0 0.0 0:01.10 kworker/u1551:0+nfsiod
1316194 root 20 0 0 0 0 I 32.0 0.0 0:00.91 kworker/u1551:14-nfsiod
1316342 root 20 0 0 0 0 I 32.0 0.0 0:00.70 kworker/u1551:21-nfsiod
1311934 root 20 0 0 0 0 I 28.0 0.0 0:08.12 kworker/u1553:9-nfsiod
1313121 root 20 0 0 0 0 R 28.0 0.0 0:19.54 kworker/u1551:28+nfsiod
1313739 root 20 0 0 0 0 I 28.0 0.0 0:06.67 kworker/u1551:48-nfsiod
1315669 root 20 0 0 0 0 I 28.0 0.0 0:01.13 kworker/u1553:1-nfsiod
1316086 root 20 0 0 0 0 I 28.0 0.0 0:00.80 kworker/u1551:2-nfsiod
1316090 root 20 0 0 0 0 I 28.0 0.0 0:01.03 kworker/u1551:3-nfsiod
1316132 root 20 0 0 0 0 R 28.0 0.0 0:01.02 kworker/u1553:6+rpciod
1316137 root 20 0 0 0 0 I 28.0 0.0 0:01.05 kworker/u1551:5-nfsiod
1316185 root 20 0 0 0 0 I 28.0 0.0 0:00.78 kworker/u1551:10-nfsiod
1316323 root 20 0 0 0 0 I 28.0 0.0 0:00.93 kworker/u1551:17-nfsiod
1316327 root 20 0 0 0 0 I 28.0 0.0 0:00.95 kworker/u1551:18-nfsiod
955127 root 20 0 0 0 0 I 24.0 0.0 0:08.66 kworker/u1553:2-nfsiod
1283422 root 20 0 0 0 0 I 24.0 0.0 0:07.86 kworker/u1553:0-nfsiod
1313431 root 20 0 0 0 0 I 24.0 0.0 0:14.90 kworker/u1549:32-nfsiod
... (additional nfsiod / rpciod kworkers at 20-24% CPU elided) ...
%Cpu summary: 0.2% user, 13.8% system, 7.1% iowait. Data movement is spread across
dozens of kernel nfsiod / rpciod worker threads rather than happening in fio itself.

Chimera client:
top - 14:05:54 up 4 days, 22:21, 3 users, load average: 19.64, 16.43, 16.73
Tasks: 4006 total, 1 running, 4005 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.3 us, 0.1 sy, 0.0 ni, 91.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 386429.6 total, 231614.9 free, 148333.2 used, 9225.2 buff/cache
MiB Swap: 8192.0 total, 8153.7 free, 38.3 used. 238096.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1316608 root 20 0 81.8g 527512 44624 S 3128 0.1 9:34.47 fio
1314410 root 20 0 16608 11672 4108 R 20.0 0.0 0:09.33 top
1 root 20 0 25604 16452 11068 S 0.0 0.0 0:15.94 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:03.01 kthreadd
... (all remaining tasks are idle kernel threads at 0.0% CPU) ...
%Cpu summary: 8.3% user, 0.1% system, 0% iowait. Essentially all the work happens in the
single fio process, busy-polling RDMA at 3128% CPU (~31 cores), with no kernel I/O threads.
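The headline comparisons can be recomputed from the fio reports. The small sketch below uses each run's average IOPS, fio's decimal-unit bandwidth, its cpu ctx count, and its issued I/O total. It assumes 400 GbE means a raw line rate of 400 × 10⁹ bit/s = 50 GB/s, ignoring RoCE and NFS framing overhead. fio's ctx counts only fio's own threads, so the per-I/O figure is a rough indicator rather than a system-wide total.

```python
LINK_BYTES_PER_S = 400e9 / 8  # assumed raw 400 GbE line rate, no protocol overhead

# (average IOPS, bandwidth in bytes/s, fio cpu ctx, issued I/Os) per run.
RUNS = {
    "linux rand_read": (1_019_187.40, 4173e6, 21_597_989, 30_567_165),
    "linux rand_write": (733_457.27, 3002e6, 17_360_075, 21_994_226),
    "chimera rand_read": (10_550_706.22, 43.1e9, 840, 316_039_490),
    "chimera rand_write": (10_853_537.66, 44.4e9, 267_082, 325_529_457),
}

def summarize(runs):
    """Link utilization (%) and fio context switches per I/O for each run."""
    return {
        name: {"link_pct": 100 * bw / LINK_BYTES_PER_S, "ctx_per_io": ctx / ios}
        for name, (_iops, bw, ctx, ios) in runs.items()
    }

summary = summarize(RUNS)
read_speedup = RUNS["chimera rand_read"][0] / RUNS["linux rand_read"][0]
write_speedup = RUNS["chimera rand_write"][0] / RUNS["linux rand_write"][0]

for name, stats in summary.items():
    print(f"{name:20s} link {stats['link_pct']:5.1f}%  ctx/IO {stats['ctx_per_io']:.2e}")
print(f"read speedup {read_speedup:.1f}x, write speedup {write_speedup:.1f}x")
```

On these inputs, the Linux read run works out to about 8.3% of the link, which is the ~8% quoted earlier. The Chimera read run works out to about 86%. Chimera delivers roughly 10.4× the Linux client's read IOPS and 14.8× its write IOPS.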