NVMe performance improvements in master
Matthew Dillon
dillon at apollo.backplane.com
Sat Jul 16 23:47:45 PDT 2016
I've made significant progress on NVMe performance. On a brand-new
server (2 x Xeon 2620-v4, 16-core/32-thread, 128GB RAM) with PCIe-3
slots, testing two Samsung NVMe cards and one Intel NVMe card, I was
able to achieve 931,227+ IOPS with highly parallelized 4K random reads
from a urandom-filled partition (i.e. no compression, no dummy I/O full
of zeros). And the system is 75% idle while it's running.
>>> yes, you heard me, that's 931K IOPS <<<
I've compiled some before-and-after statistics here:
http://apollo.backplane.com/DFlyMisc/nvme_sys03.txt
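For a feel of the kind of load involved, here is a rough sketch of a
parallel 4K random-read test against a raw partition: many threads,
each doing 4K preads at random offsets. The device path, thread count,
and iteration counts below are placeholders, not the exact harness
behind the numbers above.

/* Sketch of a parallel 4K random-read benchmark (placeholder values) */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS  64                    /* degree of parallelism (placeholder) */
#define BLKSIZE   4096                  /* 4K random reads */
#define NREADS    100000                /* reads per thread (placeholder) */
#define PARTSIZE  (64ULL << 30)         /* assumed 64GB test partition */

static const char *devpath = "/dev/da0s1d";   /* placeholder raw partition */

static void *
worker(void *arg)
{
    char *buf;
    int fd = open(devpath, O_RDONLY);
    int i;

    if (fd < 0) {
        perror("open");
        return NULL;
    }
    if (posix_memalign((void **)&buf, BLKSIZE, BLKSIZE) != 0)
        return NULL;
    srandom((unsigned)(uintptr_t)arg);
    for (i = 0; i < NREADS; ++i) {
        /* random 4K-aligned offset within the partition */
        off_t off = (off_t)(random() % (PARTSIZE / BLKSIZE)) * BLKSIZE;

        if (pread(fd, buf, BLKSIZE, off) != BLKSIZE)
            perror("pread");
    }
    free(buf);
    close(fd);
    return NULL;
}

int
main(void)
{
    pthread_t td[NTHREADS];
    long i;

    for (i = 0; i < NTHREADS; ++i)
        pthread_create(&td[i], NULL, worker, (void *)(uintptr_t)i);
    for (i = 0; i < NTHREADS; ++i)
        pthread_join(td[i], NULL);
    /* IOPS ~= NTHREADS * NREADS / wall-clock seconds (time(1) the run) */
    return 0;
}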
Progress has been made in the pbuf subsystem (used by physio) and in
the MMU page invalidation subsystem. Additional work will be needed to
achieve these results through a filesystem. The remaining roadblocks
to getting this stupendously huge level of performance through our
filesystems are as follows:
(1) Filesystem data check, de-duplication, and compression overheads.
(2) Kernel_pmap updates requiring SMP invalidations (an IPI to all cpus).
(3) Lock contention in the filesystem and buffer cache path.
(4) Hardware-level cache coherency load from atomic ops (see the sketch
    just below).
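To give a feel for (4), here is a trivial, purely hypothetical
user-space microbenchmark (not kernel code): many threads bumping a
single shared counter with atomic ops bounce its cache line between
cpus, while padded per-thread counters stay local.

/* Hypothetical illustration of cache-line bouncing from atomic ops */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 16
#define NLOOPS   10000000L

static atomic_long shared;              /* every cpu fights over this cache line */
static long local[NTHREADS][8];         /* padded: one cache line per thread */

static void *
worker(void *arg)
{
    long id = (long)(intptr_t)arg;
    long i;

    for (i = 0; i < NLOOPS; ++i) {
        atomic_fetch_add(&shared, 1);   /* coherency traffic on every increment */
        local[id][0]++;                 /* stays in this cpu's cache */
    }
    return NULL;
}

int
main(void)
{
    pthread_t td[NTHREADS];
    struct timespec t0, t1;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < NTHREADS; ++i)
        pthread_create(&td[i], NULL, worker, (void *)(intptr_t)i);
    for (i = 0; i < NTHREADS; ++i)
        pthread_join(td[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("%.2f sec\n", (t1.tv_sec - t0.tv_sec) +
                         (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}

Comment out the atomic_fetch_add() line and re-run; the difference in
wall time is essentially the coherency load item (4) refers to.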
Though, in fact, the filesystem will generally not be doing 4K I/Os.
Most of these roadblocks, all except (1), drop away with 32K and 64K
I/Os.
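Some back-of-the-envelope arithmetic on why, using the 931K figure
above:

    931,227 IOPS x 4096 bytes    ~= 3.8 GBytes/sec
    3.8 GBytes/sec / 65536 bytes ~= 58,000 IOPS at 64K

The same bandwidth takes roughly 1/16 the number of I/O transactions,
so the per-I/O costs in (2), (3), and (4) are paid 16 times less often.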
-Matt
Matthew Dillon
<dillon at backplane.com>