HEADS UP - HAMMER work
Matthew Dillon
dillon at apollo.backplane.com
Thu Nov 13 10:02:25 PST 2008
:Do you want to share it for the technically interested, if just to witness how file system development is working? :)
:
:cheers
: simon
Sigh. OK, fine. Here it is. This is subject to change. The whole
purpose of doing a semi-sorted directory hash encoding is that many
standard system functions, like ls -la, sort the contents of the
directory. For very large directories (e.g. hundreds of thousands of
files that blow out system caches) if there is no locality of reference
for sorted lookups and each lookup will essentially have to do a random
seek. In HAMMER these are still direct seeks since we have the B-Tree.
(In UFS the operation devolves down to O(N^3) because the entire directory
must be scanned for each lookup if the caches are blown out).
With a semi-sorted hash key the random seeks HAMMER has to do can be
reduced by up to a factor of 100.
64-bit directory hash encoding (for smaller filenames out of bounds
indices just store a 0).
aaaaa name[0] & 0x1F
bbbbb name[1] & 0x1F
ccccc name[2] & 0x1F
mmmmmm crc32(name + 3, len - 5) and some xor magic -> 6 bits
yyyyy name[len-2] & 0x1F
zzzzz name[len-1] & 0x1F
h[31] crc32(name, len) (entire filename)
0aaaaabbbbbccccc mmmmmmyyyyyzzzzz hhhhhhhhhhhhhhhh hhhhhhhhhhhhhhh0
lsb bit is 0 to provide an iteration space. The iteration limit is set
to 2^24 (for dealing with collisions). That means the absolute guarantee
is 16 million files per directory but, realisitically, one could probably
put in excess of 2 billion files in a directory, or more, before
hitting the iteration limit.
* semi-sorted hash key, folded ascii domains (aka just mask the low 5
bits). lower case, upper case, digits, and punctuation are folded.
For first three characters and last two characters.
* middle chars are broken down into 64 'random' domains.
* last two chars are semi-sorted so in a very large directory you have
2-chars worth of linearity before having to do an unlocalized seek.
(e.g. ~1-676 files in each seek group), when doing sorted lookups.
* m[6] and h[31] serve to hash the lookup. A direct lookup will probably
hit the filename immediately.
* Otherwise we iterate the B-Tree through a 2^24 space to determine if
the file exists or not. This space is typically sparse, so usually
the iteration is just one or two B-Tree elements.
However, in very large directories one will start to see overlap,
requiring the iteration to scan several B-Tree elements. The m[6]
and the top 8 'h' bits serve to break the search space down to reduce
the number of records that have to be iterated (the low 23 'h' bits
overlap the iteration space).
* It should be noted that looking up a file that exists is faster then
looking up a file that does not exist in the case where the directory
contains a large number (i.e. millions) of files.
In anycase, so far it seems to work fairly well but I still have a lot
of testing to do, including actuallying creating a directory with millions
of files in it and doing test lookups.
There are several issues but most of the trade-offs seem to be ok. The
one issue I am looking at closely is perhaps to try to do a better job
sorting files that have both a prefix and a postfix. In the above scheme
any large prefix or postfix covers the a[5], b[5], c[5], y[5], and z[5]
bits and devolves us back to an effectively random hash code.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the Kernel
mailing list