HEADS UP - HAMMER work
Dennis Melentyev
dennis.melentyev at gmail.com
Sat Nov 15 11:25:32 PST 2008
Hi Matt,
2008/11/15 Matthew Dillon <dillon at apollo.backplane.com>:
>
> :It might be a good idea to make a small survey, i.e. find
> :people who actually _do_ have directories with a huge
> :number of files in them (and I mean more than just a few
> :thousands), and ask them what the filenames typically look
> :like.
>
> That is a very good idea.
>
> :An obvious improvement would be to store name[d-2] and
> :name[d-1] in y[] and z[], respectively, where d is the
> :location of the last dot in the filename, if any, or the
> :location of the terminating zero if there is no dot.
> :In other words: Ignore the extension when identifying
> :y[] and z[]. Finding the last dot shouldn't be more
> :computationally expensive than strlen(name), so this
> :shouldn't be a problem.
> :
> :Best regards
> : Oliver
>
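To make Oliver's suggestion concrete, here is a minimal sketch in C of
picking name[d-2] and name[d-1] relative to the last dot. The function
name name_tail_chars() is hypothetical and is not taken from the HAMMER
sources. Returning 0 for positions that do not exist (very short names,
or names that start with a dot) is my own assumption, since the proposal
does not say what should happen there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not HAMMER code: return the two characters that
 * precede the last dot in the name, or the two characters that precede
 * the terminating NUL when the name has no dot.  Positions that do not
 * exist are reported as 0 (an assumption of this sketch).
 */
static void
name_tail_chars(const char *name, unsigned char *y, unsigned char *z)
{
	const char *dot;
	size_t d;

	assert(name != NULL && y != NULL && z != NULL);

	/* d is the index of the last dot, or of the terminator. */
	dot = strrchr(name, '.');
	d = (dot != NULL) ? (size_t)(dot - name) : strlen(name);

	*y = (d >= 2) ? (unsigned char)name[d - 2] : 0;
	*z = (d >= 1) ? (unsigned char)name[d - 1] : 0;
}
```

With this sketch "file0042.txt" yields '4' and '2', so the extension no
longer decides the two bytes. The single strrchr() pass costs about the
same as strlen(), as Oliver noted.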
> Another thing I was thinking about was dividing the filename
> into four zones, and CRCing each zone.
>
> The zones could be based on dashes and dots, and secondarily on
> alpha-numeric transitions. If there are fewer than four zones
> we would simply cut the pieces we do have down the middle, or into
> quarters. If there are more than four zones we would combine two
> or more zones together to fit.
>
> Here is an off-the-cuff structure: Four zones, each zone CRC'd,
> laid out using 16 bit CRC's for each zone ('d' is 15 bits so we
> can set the LSB bit to zero to guarantee the iteration space).
>
> aaaaaaaabbbbbbbb ccccccccdddddddd aaaaaaaabbbbbbbb ccccccccddddddd0
>
> The problem with the zone idea is that it might not work too well
> if the filenames have varying lengths... though now that I think about
> it if the filename is otherwise unstructured (no dots, dashes, etc),
> we could restrict zone A to the first 2-3 chars and zone D to the last
> 2-3 chars, and use zones B and C to split everything left in the middle.
>
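As I read the bit layout quoted above, the four 16-bit zone CRCs are
interleaved into one 64-bit key: the high bytes of a, b, c and d in the
upper 32 bits, the low bytes in the lower 32 bits, and the lowest bit of
d forced to zero. A rough sketch of just the packing step follows. The
name zone_key_pack() is hypothetical, the choice of which CRC byte goes
into which half is my assumption, and the zone splitting and the CRC
function themselves are left out because the thread does not pin them
down.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical packing of four per-zone 16-bit CRCs into a 64-bit key:
 *
 *   aaaaaaaabbbbbbbb ccccccccdddddddd aaaaaaaabbbbbbbb ccccccccddddddd0
 *
 * High bytes go into the upper 32 bits, low bytes into the lower 32.
 * Only the top 15 bits of d are kept so that bit 0 of the key is 0.
 */
static uint64_t
zone_key_pack(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
	uint64_t hi, lo;

	hi = ((uint64_t)(a >> 8) << 24) | ((uint64_t)(b >> 8) << 16) |
	     ((uint64_t)(c >> 8) << 8) | (uint64_t)(d >> 8);
	lo = ((uint64_t)(a & 0xff) << 24) | ((uint64_t)(b & 0xff) << 16) |
	     ((uint64_t)(c & 0xff) << 8) | (uint64_t)(d & 0xfe);

	/* The mask on d above must leave the low bit of the key clear. */
	assert((lo & 1) == 0);
	return (hi << 32) | lo;
}
```

Under these assumptions, names that differ only in the last zone still
share the a, b and c bytes of the upper word, so they should sort close
together in the key space.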
Please consider making this tunable in some way. No doubt you have a huge
amount of experience, but I'm not sure you can anticipate every possible
situation, so this could be left to the administrator, who really knows
what he needs in each particular case.
--
Dennis Melentyev