kernel work week of 3-Feb-2010 HEADS UP

Matthew Dillon dillon at apollo.backplane.com
Wed Feb 3 15:20:57 PST 2010


:	SATA SSDs usually have fairly large erase blocks which once all the
:blocks have been touched reduces write performance a lot (often it becomes
:worse than hard disc write performance), PCI SSDs are apparently better in
:this respect but I've yet to see any in the flesh.

    I think major improvements have been made in SSD firmware in the last
    year.  I got a couple of Intel 40G SSDs (the latest ones, but on the
    low end of the capacity scale) and I also got one of the 120G OCZ
    Colossus drives, which has 128M of RAM cache and is advertised as
    having 200+ MB/sec of read bandwidth.  Clearly there are going to be
    a lot of choices, and SATA SSDs are commodity hardware now.  The
    higher-performing devices, such as direct PCIe attachments, exist
    more for commercial database use than for generic server operations
    due to their premium pricing.

    Of course, we have to remember here that when serving large data sets
    from a normal HD, read performance is dismal... usually less than
    10MB/sec per physical drive and often worse due to seeking.  Even
    older SSDs would beat the hell out of a modern HD in that regard.

    I'll have a better idea once I am able to start testing with these
    babies.

    Swap will also be read and written in 4K or larger chunks, not
    itsy-bitsy 512-byte chunks, and I expect that will make a
    difference.  For clean data caching we can easily cluster the backing
    store to the buffer cache buffer size, which is 8K or 16K for UFS
    and 16K for HAMMER.
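
    To illustrate the clustering idea, here is a rough user-space sketch
    of aligning a small clean-cache write to the buffer cache buffer
    size.  The SWBUF_SIZE constant and the swcache_cluster() helper are
    made-up names for illustration, not actual kernel code:

        #include <stdint.h>
        #include <stdio.h>

        #define SWBUF_SIZE      16384   /* assumed buffer size (HAMMER) */

        /*
         * Compute the aligned, clustered extent that would actually be
         * written to the swap device for a small dirty range.
         */
        static void
        swcache_cluster(uint64_t off, uint64_t len,
                        uint64_t *clust_off, uint64_t *clust_len)
        {
                *clust_off = off & ~(uint64_t)(SWBUF_SIZE - 1);
                *clust_len = ((off + len + SWBUF_SIZE - 1) &
                              ~(uint64_t)(SWBUF_SIZE - 1)) - *clust_off;
        }

        int
        main(void)
        {
                uint64_t coff, clen;

                /* a 512-byte write at offset 10000 becomes one 16K buffer */
                swcache_cluster(10000, 512, &coff, &clen);
                printf("write %ju bytes at offset %ju\n",
                       (uintmax_t)clen, (uintmax_t)coff);
                return (0);
        }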

:	SSDs still have fairly low lifespans in terms of writes per cell,
:it seems to me that using them as caches would tend to hit that limit
:faster than most other uses.
:
:-- 
:Steve O'Hara-Smith                          |   Directable Mirror Arrays

    Yes, there are definitely management issues, but wear life scales
    linearly with the amount of SSD storage (in terms of how much data
    you can write) and it is fairly easy to calculate.

    A cheap Intel 40G with 10,000 cycles would have a wear life of 400TB.
    That's 100GB/day for 10 years, or 1.2MB/sec continuously for 10 years.

    I'm just going by the Wiki.  The Wiki says MLC flash can do 1,000 to
    10,000 write cycles per cell.  So if it's 1,000 that would be
    10GB/day for a 40G SSD over the same 10 years.

    For a larger SSD the total write capability scales linearly with its
    size, since there is more flash to work with.
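
    The arithmetic is simple enough to spell out.  A throwaway program
    that reproduces the back-of-the-envelope numbers (decimal units; the
    10,000-cycle figure is an assumption, and the figures above are
    rounded down a bit):

        #include <stdio.h>

        int
        main(void)
        {
                double capacity_gb = 40.0;      /* Intel 40G */
                double cycles = 10000.0;        /* assumed MLC erase cycles */
                double years = 10.0;

                double wear_tb = capacity_gb * cycles / 1000.0;
                double gb_day  = capacity_gb * cycles / (years * 365.0);
                double mb_sec  = gb_day * 1000.0 / 86400.0;

                /* prints ~400 TB, ~110 GB/day, ~1.3 MB/sec */
                printf("%.0f TB total, %.0f GB/day, %.1f MB/sec over %g years\n",
                       wear_tb, gb_day, mb_sec, years);
                return (0);
        }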

    The main issue from an implementation standpoint is to try to burst
    in a bunch of data, potentially exceeding 10GB/day, but then back
    off and reduce the allowed write bandwidth as a means of trying to
    'track' the working data set without wearing the drive out too
    quickly.  You would want to limit yourself to e.g. 10GB/day (or
    whatever) as an average over a period of a week.
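
    One way to express that policy is a simple token bucket: refill at
    the long-term average rate, let the bucket hold a week's worth of
    budget so bursts are possible, and stop caching new data when the
    bucket runs dry.  This is only a sketch of the idea; the swcache_*
    names and constants are hypothetical, not actual kernel interfaces:

        #include <stdint.h>
        #include <stdio.h>
        #include <time.h>

        #define DAILY_BUDGET (10ULL * 1000 * 1000 * 1000)  /* ~10GB/day */
        #define BURST_DAYS   7                             /* week average */
        #define BUCKET_MAX   (DAILY_BUDGET * BURST_DAYS)

        static uint64_t tokens = BUCKET_MAX;    /* start with full burst */
        static time_t   last_refill;

        /* Refill the bucket at the long-term average write rate. */
        static void
        swcache_refill(time_t now)
        {
                uint64_t add;

                if (last_refill == 0)
                        last_refill = now;
                add = (uint64_t)(now - last_refill) * DAILY_BUDGET / 86400;
                tokens = (tokens + add > BUCKET_MAX) ? BUCKET_MAX : tokens + add;
                last_refill = now;
        }

        /* Return non-zero if caching 'bytes' of new data is allowed now. */
        static int
        swcache_may_write(uint64_t bytes)
        {
                swcache_refill(time(NULL));
                if (tokens < bytes)
                        return (0);     /* back off, let the budget recover */
                tokens -= bytes;
                return (1);
        }

        int
        main(void)
        {
                /* a 1GB burst is fine while the weekly budget lasts */
                printf("1GB write %s\n",
                    swcache_may_write(1000ULL * 1000 * 1000) ?
                    "allowed" : "deferred");
                return (0);
        }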

    I'd say it is already cost-effective if we can eke 10 years of life
    out of a 40G SSD in a server environment.  I don't think there would
    be an issue at all for meta-data caching, and with reasonable
    bandwidth and bursting limits (for writing) I don't think there would
    be an issue for bulk-data caching either.

    And since the drive would just be used for swap space it is
    effectively a junk drive which can be replaced at any time with no
    effort.  We could even hot-swap it if we implement swapoff (though
    for now that isn't in the cards).

    One of the things I will do when I get these SSDs in and get the
    basics working is intentionally wear out one of the Intel 40G drives
    to see how long it can actually go.  That will be fun :-)

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>
