Initial filesystem design synopsis - indexing
Matthew Dillon
dillon at apollo.backplane.com
Mon Feb 26 09:24:53 PST 2007
I have fleshed out a bit more of the spec. One of the areas I left
hanging was how to index the records in a segment.
I have decided to use a B+Tree which is allocated on the fly out of
a segment's data space, rather then try to store the index in
ephermal 'dead' space between the segment's data space and the
segment's record array. That is, the space within the segment used for
the index would be deterministically reserved right off the bat.
The advantage of using a B+Tree is that the number of nodes is a known
quantity, easily calculated and reserved as records are allocated.
B+Trees are very condusive to efficient I/O access patterns on-disk.
A B+Tree is basically a BTree but where record references exist in the
nodes in addition to the leafs. B/B+Trees use a radix-N structure.
Each node contains up to N record references (rather then just two as
you would have in a binary tree). The structure is very condusive to
disk I/O in that accesses and updates to the index require only a
limited number of disk I/O's to accomplish pretty much regardless of
size.
My intention is to use a radix of either 8 or 16, depending on how the
data structures work out. Relative large radixes reduce the number of
discrete I/Os required to access an index (the tradeoff is against
the almost linear search within the radix required to locate the
actual record or recursion point). Large numbers of indexed records
are basically I/O bound entities. I had very good luck using this
sort of structure in the Backplane Database so I am fairly confident
that it will scale even unto directories with millions of files in
them.
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the Kernel
mailing list