HEADS UP - HAMMER work

Matthew Dillon dillon at apollo.backplane.com
Thu Nov 13 11:08:26 PST 2008


:I think while optimizing this is a worthwhile goal, I wonder who is doing
:sorted lookups in the first place?  Shouldn't programs do
:loop{readdir(); stat();} sort;  instead of loop{readdir();}; sort;
:loop{stat();}?  I'm thinking of NFS shares using the readdirplus NFS RPC -
:they would feel really unhappy if people wouldn't stat in the right order?

    You are making an assumption that the program's operation is integrated
    into the directory scan, which generally isn't true.  Any script, for
    example, will sort file listings because the shell sorts file listings
    from wildcards automatically.

    Even a program like ls is going to sort the filenames before it stats
    them.

    The 'fts' API (man fts) includes an integrated sorting mechanism, so
    most utilities using fts will generally sort the filenames before acting
    upon them.  Utilities also do this because output is more readable when
    sorted.
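
    As a rough illustration (not taken from any particular utility, with
    error handling abbreviated, and noting that the exact comparator
    prototype differs slightly between the BSDs and glibc), the usual
    pattern looks something like this.  fts_read() hands back the stat
    data along with each sorted name, so the stat()s end up being issued
    in sorted order:

/*
 * Sketch only: walk the current directory with fts(3), sorted by name.
 */
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int
namecmp(const FTSENT * const *a, const FTSENT * const *b)
{
        return (strcmp((*a)->fts_name, (*b)->fts_name));
}

int
main(void)
{
        char *paths[] = { ".", NULL };
        FTS *fts;
        FTSENT *ent;

        if ((fts = fts_open(paths, FTS_PHYSICAL, namecmp)) == NULL) {
                perror("fts_open");
                exit(1);
        }
        while ((ent = fts_read(fts)) != NULL) {
                if (ent->fts_info == FTS_F) {
                        /* stat data already gathered, in sorted order */
                        printf("%s %jd\n", ent->fts_path,
                            (intmax_t)ent->fts_statp->st_size);
                }
        }
        fts_close(fts);
        return (0);
}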

    The main issue here is that if accesses tend to be in sorted order we
    want to reduce the number of seeks required to re-lookup the files
    for the case where the system caches have been blown.  If accesses
    are in scan order then seeks are reduced regardless.  It's nice for
    both orderings to perform approximately the same.

    Archivers like tar (used to) sort file names.  For some reason our
    current tar doesn't seem to, though.  cpdup doesn't (I just use readdir()
    directly).  Any wildcarding will.  Any ls output will by default.
    Numerous programs may lay down or process files in sorted order.  A
    mail program, for example.

:>     lsb bit is 0 to provide an iteration space.  The iteration limit is set
:>     to 2^24 (for dealing with collisions).  That means the absolute guarantee
:>     is 16 million files per directory but, realistically, one could probably
:>     put in excess of 2 billion files in a directory, or more, before
:>     hitting the iteration limit.
:
:What do you mean with "iteration limit"?  I somehow feel uncomfortable
:arbitrarily limiting the number of files in a directory based on
:statistical multiplexing.  Because in this case we're not dealing with
:slow lookups, but sort of unreasonable EEXISTs in the case of hash
:collisions.  Maybe I'm not understanding correctly, though.
:
:cheers
:  simon

    Any hash-driven directory key can have collisions.  Hash collisions
    clearly cannot prevent the file from being created, so when a collision
    occurs an alternative set of hash values must be used to locate a
    non-colliding key and store the file entry there.

    The iteration space is the set of alternative hash values.

    When determining whether a file exists or not, the iteration space must
    be scanned until the file is located or the iteration space has been
    exhausted.  This is not a for() loop per se, it is actually a B-Tree
    iteration, and since the B-Tree tends to be sparse the actual number
    of elements that need to be scanned is typically small (often zero).
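
    To make the mechanism concrete, here is a conceptual sketch in C.  It
    is not HAMMER's actual code or key layout; hash_name(), key_in_use()
    and key_matches_name() are hypothetical stand-ins for the name hash
    and the B-Tree, and the linear probe stands in for the sparse B-Tree
    range scan described above:

/*
 * Conceptual sketch only, not HAMMER's key layout.  A 64-bit directory
 * key whose upper bits come from a name hash and whose low ITER_BITS
 * bits form the iteration space used to resolve collisions.
 */
#include <stdint.h>

#define ITER_BITS       24
#define ITER_LIMIT      (1U << ITER_BITS)       /* 2^24 guaranteed slots */

/* hypothetical helpers standing in for the name hash and the B-Tree */
extern uint64_t hash_name(const char *name);
extern int      key_in_use(uint64_t key);
extern int      key_matches_name(uint64_t key, const char *name);

static uint64_t
base_key(const char *name)
{
        /* keep only the bits of the hash outside the iteration space */
        return (hash_name(name) & ~(uint64_t)(ITER_LIMIT - 1));
}

/* Creation: probe the iteration space for a non-colliding key. */
int
dirent_alloc_key(const char *name, uint64_t *keyp)
{
        uint64_t key = base_key(name);
        uint32_t iter;

        for (iter = 0; iter < ITER_LIMIT; ++iter) {
                if (!key_in_use(key + iter)) {
                        *keyp = key + iter;
                        return (0);
                }
        }
        return (-1);            /* iteration space exhausted */
}

/*
 * Lookup: scan the same range until the name matches or the range is
 * exhausted.  The real scan is a B-Tree range iteration over a sparse
 * key space, so only the few keys actually present are visited.
 */
int
dirent_lookup(const char *name, uint64_t *keyp)
{
        uint64_t key = base_key(name);
        uint32_t iter;

        for (iter = 0; iter < ITER_LIMIT; ++iter) {
                if (!key_in_use(key + iter))
                        continue;       /* absent keys are skipped */
                if (key_matches_name(key + iter, name)) {
                        *keyp = key + iter;
                        return (0);
                }
        }
        return (-1);            /* no such file */
}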

    Because the hashed elements in the areas of the key that are not part
    of the iteration space tend to be randomized, the actual number of
    files one can store in a directory is far greater than the size of
    the iteration space.  However, the iteration space *DOES* give us the
    absolute worst-case minimum number of files we guarantee can be stored
    in the directory.

    (It should be noted that UFS can't store more than a few hundred thousand
    files in a single directory before system caches are blown and access
    times become O(N^3).)

    This is a consideration.  I might have to change or remove some of the
    5-bit sorting keys to make the iteration space larger, for example.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




