Initial NVME driver for DragonFly committed to master

Matthew Dillon dillon at backplane.com
Tue Jun 7 21:32:34 PDT 2016


Further work on the nvme driver and more performance tests.  I got a couple
more cards today, but it turns out only one was actually an NVME card (the
Intel 750).  The card itself is pretty horrible compared to the Samsungs...
just a really bad firmware implementation.  It slows down drastically when
large blocks (64K and 128K) are used, or if a lot of queues are allocated,
or if commands are queued to multiple queues simultaneously.  It takes a
lot to actually get it to perform well.

So I definitely recommend the Samsungs over the Intel 750, at least for now.

The nvme(4) manual page has been updated with some information on BIOS
configuration and brands.  Really there are only two brands readily
available at a reasonable price... Samsung and Intel.  And at the moment
Samsung has far better firmware.  The rebrands I bought turned out not to
be NVME cards (they were M.2 cards with an integrated AHCI controller).  The
Plextor did horribly.  The Kingston was a bit better.  But I would not
recommend either.

Note that most BIOSes cannot boot from NVME cards, and if they can it's
probably via UEFI, which is a pain to set up for DragonFly.

In any case, I put up some new stats in sys02.txt below:

http://apollo.backplane.com/DFlyMisc/nvme_sys01.txt

http://apollo.backplane.com/DFlyMisc/nvme_sys02.txt

The sys02.txt tests run all three cards simultaneously.  Generally speaking
I maxed out at around 535,000 IOPS in the 512 byte random seek test and the
4096 byte random seek test, and I maxed out at around 4.5 GBytes/sec
reading on the bandwidth test using 32KB buffers (out of deference for the
idiotic Intel firmware).  Also, just for the fun of it, at the end I threw
4 SSDs into the hot-swap bays and ran tests with them + the nvme cards all
together.  But aggregate bandwidth did not improve and aggregate IOPS
dropped slightly :-).
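
For a sense of scale, an IOPS figure converts to bandwidth by multiplying
by the transfer size.  A quick sketch of that arithmetic follows.  The
function name is made up for illustration, and MBytes are taken as decimal
(10^6 bytes), which is an assumption and not something the test output
states:

```python
def iops_to_mbytes_per_sec(iops, block_bytes):
    """Convert an IOPS figure to MBytes/sec (decimal MBytes assumed)."""
    return iops * block_bytes / 1_000_000


if __name__ == "__main__":
    # ~535,000 IOPS was the ceiling for both block sizes in the tests above.
    for block in (512, 4096):
        mb = iops_to_mbytes_per_sec(535_000, block)
        print(f"{block:5d}-byte reads: {mb:.2f} MBytes/sec")
```

The same IOPS ceiling at both block sizes means the 4096 byte test moves
eight times the data of the 512 byte test, yet still stays well under the
4.5 GBytes/sec seen with 32KB buffers.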

The tests were performed on a 3.4 GHz Haswell Xeon, 4-core/8-thread, with
16GB of RAM.  The data sets were written using /dev/urandom as a source
(i.e. meant to be incompressible).

These tests bring up some interesting problems that I'll have to solve for
HAMMER and HAMMER2.  These filesystems crc the meta-data and data blocks.
The generic crc code can only do around 200 MBytes/sec per cpu core and the
multi-table iscsi_crc code can only do 500 MBytes/sec per cpu core.  That's
a problem when the underlying storage has 1.5 GBytes/sec of bandwidth.

-Matt