Initial filesystem design synopsis - indexing

Matthew Dillon dillon at apollo.backplane.com
Mon Feb 26 09:24:53 PST 2007


    I have fleshed out a bit more of the spec.  One of the areas I left
    hanging was how to index the records in a segment.

    I have decided to use a B+Tree which is allocated on the fly out of
    a segment's data space, rather then try to store the index in 
    ephermal 'dead' space between the segment's data space and the
    segment's record array.  That is, the space within the segment used for
    the index would be deterministically reserved right off the bat.

    The advantage of using a B+Tree is that the number of nodes is a known
    quantity, easily calculated and reserved as records are allocated.
    B+Trees are very condusive to efficient I/O access patterns on-disk.

    A B+Tree is basically a BTree but where record references exist in the
    nodes in addition to the leafs.   B/B+Trees use a radix-N structure.
    Each node contains up to N record references (rather then just two as
    you would have in a binary tree).  The structure is very condusive to
    disk I/O in that accesses and updates to the index require only a
    limited number of disk I/O's to accomplish pretty much regardless of 
    size.

    My intention is to use a radix of either 8 or 16, depending on how the
    data structures work out.  Relative large radixes reduce the number of
    discrete I/Os required to access an index (the tradeoff is against
    the almost linear search within the radix required to locate the
    actual record or recursion point).  Large numbers of indexed records
    are basically I/O bound entities.  I had very good luck using this
    sort of structure in the Backplane Database so I am fairly confident
    that it will scale even unto directories with millions of files in
    them.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>





More information about the Kernel mailing list