HEADS UP - Final HAMMER on-disk structural changes being made today

Matthew Dillon dillon at apollo.backplane.com
Mon May 5 23:41:30 PDT 2008


:Matt,
:
:Thanks for your insight.
:...
:That said, I wouldn't advocate any change targeting the performance of
:NAND-based disks unless it would also improve performance on magnetic
:media as well as any future flash-like/solid-state media.  SSDs are
:finally at a point where they are moving into the enterprise* to support
:read-query-heavy databases and the like, but it would be sheer folly to
:think of NAND as the endgame in that department.
:
:*Mobile too, I guess, MacBook Air and Lenovo x300

    I don't think it's possible to have a filesystem that can target both
    equally well.  Locality of reference for traditional disk media is
    very different from the random-access nature of flash.

    I do think it's possible to improve the performance of a filesystem
    built around the concept of a hard drive on flash media, and some
    filesystems will be more amenable to that than others.

:Don't most filesystem implementations call their DO log a "Journal" or so?
::)

    Sure, you can call it a journal, a two-way journal, a reversible
    journal, etc.  They're fairly generic terms that don't say much
    about what is actually being written to the disk, other than that
    whatever it is will be written linearly.

:I assume that laying out DO records would be exceedingly similar to UNDO
:record logging, with most of the code being common (I'm adding a review
:of those bits of HAMMER to my todo list), and that the added complexity
:would lie in assembling/collating/coalescing/etc. the metadata bits in
:memory, but more than that, in intelligently flushing those blocks to
:disk (expediently enough to avoid creating too much memory pressure)
:and, of course, in the recovery code, since you can't just assume your
:metadata is proper anymore.  Are there potentially portions of the
:filesystem (or future features) that DO logging would simplify?
:Couldn't a DO log effectively be NOP'd (instead of bounded) for
:low-memory situations?

    The log is required to ensure that the filesystem can recover after
    a crash, but it doesn't have to be very large.  The filesystem's
    flushes would simply be less efficient in a low-memory situation.
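
    As a rough sketch (the function names here are invented, not
    HAMMER's), a bounded log reduces to a circular FIFO where the
    writer stalls until the flusher retires old records:

	/*
	 * Sketch of reserving space in a fixed-size circular log.
	 * When the log is nearly full the caller waits for a flush
	 * cycle to retire older records -- less efficient, but still
	 * correct, which is why the log can stay small.
	 */
	static int
	log_reserve(struct logctl *lc, size_t bytes)
	{
		while (log_space_free(lc) < bytes)
			log_wait_for_flush(lc);	/* stall until space frees */
		lc->head = (lc->head + bytes) % lc->log_size;
		return(0);
	}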

:I don't know if it should be pointed out or not, but seemingly the
:biggest problem(s) with ZFS in practice all boil down to excess kernel
:memory pressure related to the ARC cache, especially on write-heavy
:workloads.
:
:Sam 

    HAMMER doesn't use huge blocks like that.  The main pressure on memory
    for HAMMER is file data in the buffer cache having to be shoved over
    to the backend flusher.  I should have that fixed before release...
    since file data doesn't have to be logged, I can shove it out to the
    media directly as long as I synchronize the flush of the inode's
    meta-data with the meta-data involved in allocating the file blocks.
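
    The ordering is the important part.  Roughly (function names are
    hypothetical, for illustration only):

	/*
	 * Hypothetical flush ordering for unlogged file data: write the
	 * data directly to its allocated blocks, wait for the I/O to
	 * complete, then commit the inode meta-data and the
	 * block-allocation meta-data in the same logged flush, so
	 * recovery never sees an inode pointing at unwritten blocks.
	 */
	static int
	flush_inode(struct inode *ip)
	{
		int error;

		error = write_file_data(ip);		/* direct to media */
		if (error == 0)
			error = wait_for_data_io(ip);	/* ordering barrier */
		if (error == 0)
			error = log_commit_metadata(ip); /* inode + allocation */
		return(error);
	}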

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>
