HEADS UP - HAMMER work

Matthew Dillon dillon at apollo.backplane.com
Sat Nov 15 10:55:01 PST 2008


:It might be a good idea to make a small survey, i.e. find
:people who actually _do_ have directories with a huge
:number of files in them (and I mean more than just a few
:thousands), and ask them what the filenames typically look
:like.

    That is a very good idea.

:An obvious improvement would be to store name[d-2] and
:name[d-1] in y[] and z[], respectively, where d is the
:location of the last dot in the filename, if any, or the
:location of the terminating zero if there is no dot.
:In other words:  Ignore the extension when identifying
:y[] and z[].  Finding the last dot shouldn't be more
:computationally expensive than strlen(name), so this
:shouldn't be a problem.
:
:Best regards
:   Oliver
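
    For reference, a minimal sketch of the quoted suggestion.  The helper
    names are hypothetical, y and z are treated as single bytes, and the
    zero fill for stems shorter than two characters is an assumption that
    the suggestion itself does not cover.

```c
#include <string.h>

/* Index of the last dot in name, or of the terminating zero if no dot. */
static size_t stem_end(const char *name)
{
    const char *dot = strrchr(name, '.');

    return dot ? (size_t)(dot - name) : strlen(name);
}

/*
 * Pick the two characters just before the extension.
 * Assumption: positions that fall before the start of the name yield 0.
 */
static void stem_tail(const char *name, unsigned char *y, unsigned char *z)
{
    size_t d = stem_end(name);

    *y = (d >= 2) ? (unsigned char)name[d - 2] : 0;
    *z = (d >= 1) ? (unsigned char)name[d - 1] : 0;
}
```

    With "file0042.txt" this picks '4' and '2', so the extension no longer
    dominates the two bytes.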

    Another thing I was thinking about was dividing the filename
    into four zones, and CRCing each zone.

    The zones could be based on dashes and dots, and secondarily on
    alpha-numeric transitions.  If there are fewer than four zones
    we would simply cut the pieces we do have down the middle, or into
    quarters.  If there are more than four zones we would combine two
    or more zones together to fit.
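
    A rough sketch of the zone detection step only.  find_zones() and its
    use_transitions flag are hypothetical, and reading "secondarily" as
    "retry with alpha-numeric transitions if dashes and dots do not give
    enough zones" is an assumption.  Cutting or combining zones to reach
    exactly four is left to the caller.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Record the start offset of each zone in starts[] (at most max entries)
 * and return the total number of zones found, which may exceed max.
 * Dashes and dots always separate zones; when use_transitions is nonzero,
 * a switch between digit and non-digit characters also starts a new zone.
 */
static size_t find_zones(const char *name, int use_transitions,
                         size_t *starts, size_t max)
{
    size_t count = 0;
    int at_start = 1;    /* the next non-separator character opens a zone */

    for (size_t i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];

        if (c == '-' || c == '.') {
            at_start = 1;
            continue;
        }
        if (!at_start && use_transitions) {
            /* at_start == 0 implies name[i - 1] exists and is in a zone */
            unsigned char p = (unsigned char)name[i - 1];

            if ((isdigit(c) != 0) != (isdigit(p) != 0))
                at_start = 1;
        }
        if (at_start) {
            if (count < max)
                starts[count] = i;
            count++;
            at_start = 0;
        }
    }
    return count;
}
```

    For "img0042-final.jpg" the separator-only pass finds three zones, and
    the pass with transitions finds four (img, 0042, final, jpg).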

    Here is an off-the-cuff structure:  Four zones, each zone CRC'd,
    laid out using 16-bit CRCs for each zone ('d' is 15 bits so we
    can set the LSB to zero to guarantee the iteration space).

    aaaaaaaabbbbbbbb ccccccccdddddddd aaaaaaaabbbbbbbb ccccccccddddddd0
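
    One way the layout above could be packed, as a sketch.  pack_zone_key()
    is a hypothetical name, the four 16-bit CRCs are assumed to be computed
    elsewhere, and putting the high byte of each CRC in the upper 32 bits
    and the low byte in the lower 32 bits is an assumption (the diagram
    does not say which half goes where).

```c
#include <stdint.h>

/*
 * Interleave four 16-bit zone CRCs into a 64-bit key:
 *   upper 32 bits: one byte each of a, b, c, d (high bytes, by assumption)
 *   lower 32 bits: the remaining byte of a, b, c, and 7 bits of d
 * The least significant bit of the key is always zero.
 */
static uint64_t pack_zone_key(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    uint64_t key = 0;

    key |= (uint64_t)(a >> 8) << 56;
    key |= (uint64_t)(b >> 8) << 48;
    key |= (uint64_t)(c >> 8) << 40;
    key |= (uint64_t)(d >> 8) << 32;
    key |= (uint64_t)(a & 0xff) << 24;
    key |= (uint64_t)(b & 0xff) << 16;
    key |= (uint64_t)(c & 0xff) << 8;
    key |= (uint64_t)(d & 0xfe);    /* drop d's low bit: 15 bits, LSB = 0 */

    return key;
}
```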

    The problem with the zone idea is that it might not work too well
    if the filenames have varying lengths... though now that I think about
    it, if the filename is otherwise unstructured (no dots, dashes, etc),
    we could restrict zone A to the first 2-3 chars and zone D to the last
    2-3 chars, and use zones B and C to split everything left in the middle.
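
    A sketch of that fallback split for unstructured names.  The struct and
    function names are hypothetical, the edge length of 3 is one choice
    from the "2-3 chars" range, and shrinking the edges for names shorter
    than six characters is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define EDGE_LEN 3    /* assumption: upper end of the "2-3 chars" range */

struct zones {
    size_t start[4];  /* offset of zones A, B, C, D within the name */
    size_t len[4];    /* length of each zone; may be 0 for short names */
};

/*
 * Zone A takes the first EDGE_LEN chars, zone D the last EDGE_LEN chars,
 * and zones B and C split whatever is left in the middle.
 */
static void split_unstructured(const char *name, struct zones *z)
{
    size_t n = strlen(name);
    size_t edge = EDGE_LEN;
    size_t mid, b_len;

    if (n < 2 * edge)
        edge = n / 2;           /* short name: shrink A and D to fit */
    mid = n - 2 * edge;
    b_len = mid / 2;            /* C gets the odd char, if any */

    z->start[0] = 0;            z->len[0] = edge;
    z->start[1] = edge;         z->len[1] = b_len;
    z->start[2] = edge + b_len; z->len[2] = mid - b_len;
    z->start[3] = n - edge;     z->len[3] = edge;
}
```

    Because A and D are anchored to the two ends, names of different
    lengths that share a prefix or suffix still agree on those zones.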

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




