HEADS UP - HAMMER work

Matthew Dillon dillon at apollo.backplane.com
Thu Nov 13 10:02:25 PST 2008


:Do you want to share it for the technically interested, if just to witness how file system development is working? :)
:
:cheers
:  simon

    Sigh.  OK, fine.  Here it is.  This is subject to change.  The whole
    purpose of using a semi-sorted directory hash encoding is that many
    standard system functions, like ls -la, sort the contents of the
    directory.  For very large directories (e.g. hundreds of thousands of
    files, enough to blow out the system caches), if there is no locality
    of reference for sorted lookups, each lookup will essentially have to
    do a random seek.  In HAMMER these are still direct seeks since we have
    the B-Tree.  (In UFS the operation devolves to O(N^3) because the
    entire directory must be scanned for each lookup if the caches are
    blown out).

    With a semi-sorted hash key the random seeks HAMMER has to do can be
    reduced by up to a factor of 100.

    64-bit directory hash encoding (for short filenames, out-of-bounds
    indices just store a 0):

    aaaaa	name[0] & 0x1F
    bbbbb	name[1] & 0x1F
    ccccc	name[2] & 0x1F
    mmmmmm	crc32(name + 3, len - 5) and some xor magic -> 6 bits
    yyyyy	name[len-2] & 0x1F
    zzzzz	name[len-1] & 0x1F
    h[31]	crc32(name, len) (entire filename)

    0aaaaabbbbbccccc mmmmmmyyyyyzzzzz hhhhhhhhhhhhhhhh hhhhhhhhhhhhhhh0
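
    To make the layout concrete, here is a minimal user-space sketch of
    the encoding in C.  This is not the actual HAMMER code: dirhash() is
    just a name for the sketch, zlib's crc32() stands in for the kernel
    routine, and the fold that reduces the middle crc to 6 bits is a guess,
    since the exact "xor magic" isn't spelled out above.

    #include <stdint.h>
    #include <stddef.h>
    #include <zlib.h>       /* crc32(); stand-in for the kernel routine */

    uint64_t
    dirhash(const char *name, size_t len)
    {
            uint32_t crc = crc32(0, (const unsigned char *)name, (uInt)len);
            uint64_t key = 0;
            uint32_t m;

            /* folded 5-bit domains; out-of-bounds indices just store 0 */
            if (len > 0)
                    key |= (uint64_t)(name[0] & 0x1F) << 58;
            if (len > 1)
                    key |= (uint64_t)(name[1] & 0x1F) << 53;
            if (len > 2)
                    key |= (uint64_t)(name[2] & 0x1F) << 48;

            /* 6 'm' bits from a crc over the middle of the name */
            if (len >= 6) {
                    m = crc32(0, (const unsigned char *)name + 3,
                        (uInt)(len - 5));
                    /* assumed fold; the real "xor magic" is unspecified */
                    m = (m ^ (m >> 6) ^ (m >> 12) ^ (m >> 18)) & 0x3F;
                    key |= (uint64_t)m << 42;
            }

            /* folded 5-bit domains for the last two characters */
            if (len > 1)
                    key |= (uint64_t)(name[len - 2] & 0x1F) << 37;
            if (len > 0)
                    key |= (uint64_t)(name[len - 1] & 0x1F) << 32;

            /* 31 'h' bits from the crc of the whole name; the lsb stays 0 */
            key |= (uint64_t)(crc & 0x7FFFFFFFU) << 1;
            return (key);
    }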

    The lsb is 0 to provide an iteration space.  The iteration limit is set
    to 2^24 (for dealing with collisions).  That means the absolute
    guarantee is 16 million files per directory but, realistically, one
    could probably put in excess of 2 billion files in a directory before
    hitting the iteration limit.
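
    A hedged sketch of what a lookup over that iteration space might look
    like, assuming a generic ordered B-Tree API.  btree_first() and
    btree_next(), along with struct dirrec, are hypothetical stand-ins
    here, not the real HAMMER interfaces:

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    uint64_t dirhash(const char *name, size_t len);     /* sketch above */

    /* hypothetical record type standing in for a B-Tree leaf element */
    struct dirrec {
            uint64_t    key;        /* 64-bit directory hash key */
            size_t      namelen;
            const char *name;
    };

    struct dirrec *btree_first(void *dir, uint64_t key);  /* first >= key */
    struct dirrec *btree_next(void *dir, struct dirrec *rec);

    struct dirrec *
    dir_lookup(void *dir, const char *name, size_t len)
    {
            uint64_t base = dirhash(name, len);     /* lsb is 0 */
            uint64_t limit = base + (1ULL << 24);   /* iteration space */
            struct dirrec *rec;

            for (rec = btree_first(dir, base);
                 rec != NULL && rec->key < limit;
                 rec = btree_next(dir, rec)) {
                    /* collisions share the space; verify the actual name */
                    if (rec->namelen == len &&
                        memcmp(rec->name, name, len) == 0)
                            return (rec);
            }
            return (NULL);          /* no such file */
    }

    Because the space is sparse, the scan usually touches only one or two
    records, matching the behavior described below.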

    * semi-sorted hash key with folded ascii domains (i.e. just mask the
      low 5 bits): lower case, upper case, digits, and punctuation are
      folded together.  This applies to the first three characters and the
      last two characters.

    * middle chars are broken down into 64 'random' domains.

    * last two chars are semi-sorted, so in a very large directory you get
      2-chars worth of linearity before having to do an unlocalized seek
      (e.g. ~1-676 files in each seek group) when doing sorted lookups.
      There is a small demonstration of this after the list.

    * m[6] and h[31] serve to hash the lookup.  A direct lookup will probably
      hit the filename immediately.

    * Otherwise we iterate the B-Tree through a 2^24 space to determine if
      the file exists or not.  This space is typically sparse, so usually
      the iteration is just one or two B-Tree elements.

      However, in very large directories one will start to see overlap,
      requiring the iteration to scan several B-Tree elements.  The m[6]
      and the top 8 'h' bits serve to break the search space down to reduce
      the number of records that have to be iterated (the low 23 'h' bits
      overlap the iteration space).

    * It should be noted that looking up a file that exists is faster than
      looking up a file that does not exist when the directory contains a
      large number (i.e. millions) of files.
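
    To see the semi-sorted behavior in action, here is a toy harness to be
    compiled together with the dirhash() sketch above.  The filenames are
    arbitrary examples:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    uint64_t dirhash(const char *name, size_t len);     /* sketch above */

    int
    main(void)
    {
            /* names in ls-sorted order; "Alpha" and "alpha" fold together */
            static const char *names[] = {
                    "Alpha.txt", "alpha.txt", "beta.txt", "gamma.txt"
            };
            size_t i;

            for (i = 0; i < sizeof(names) / sizeof(names[0]); ++i) {
                    printf("%016llx  %s\n",
                        (unsigned long long)dirhash(names[i],
                        strlen(names[i])), names[i]);
            }
            return (0);
    }

    The keys for the sorted names come out mostly ascending in their top
    bits, which is exactly the locality that keeps sorted lookups from
    degenerating into random seeks.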

    In any case, so far it seems to work fairly well, but I still have a
    lot of testing to do, including actually creating a directory with
    millions of files in it and doing test lookups.

    There are several issues, but most of the trade-offs seem to be ok.
    The one issue I am looking at closely is whether I can do a better job
    sorting files that have both a common prefix and a common postfix.  In
    the above scheme any long shared prefix or postfix covers the a[5],
    b[5], c[5], y[5], and z[5] bits and devolves the key back to an
    effectively random hash code (e.g. a directory full of names like
    snap00001.dat through snap99999.dat, where only the middle characters
    differ).

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




