buf01.patch - BUF/BIO patch #1 (alpha quality)
Matthew Dillon
dillon at apollo.backplane.com
Sun Feb 5 16:42:15 PST 2006
Here is the first iteration. There are virtually guaranteed to be bugs,
but the more help I have in finding them, the faster I'll be able to work.
WARNING! Any bugs have the potential to corrupt the target
filesystem. Only test this patch on setups where you can afford to lose
your filesystems.
fetch http://apollo.backplane.com/DFlyMisc/buf01.patch
I have done some basic testing of UFS, NFS, and CD's. I have done no
testing of any of the other filesystems and I have not yet done any serious
testing to check for e.g. filesystem corruption.
I need help life-testing UFS, NFS, and CD's, and any help at all testing
other filesystems, in particular msdosfs, ntfs, hpfs, mfs, and the vn
device, preferably with some debugging help too. This is a fairly massive
patch and it's only stage #1!!
This patch changes all vnode and device strategy calls to take a BIO
instead of a BUF, and changes biodone() to take a BIO. The logical, disk,
and physical block numbers and offsets are no longer in the BUF structure
but instead each layer gets its own BIO structure. Various other
I/O specific fields, such as b_iodone, have also been moved to the BIO
structure.
As an I/O progresses, each translation layer 'pushes' a new BIO. For
the moment, all the BIOs are simply embedded as an array in the BUF.
Once an I/O finishes, the chain of BIOs is left intact and serves to
'cache' block translations (similar to how b_blkno cached block
translations before the patch).
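
A minimal sketch of the "array of BIOs embedded in the BUF" idea, for
readers who want something concrete. The structure layouts here are
heavily simplified and are not copied from the patch: NBIO, BLKNO_NONE,
bio_buf, toy_initbuf() and toy_push_bio() are hypothetical names made up
for illustration. Only b_bio_array and bio_blkno are names that appear in
the description above.

```c
#include <assert.h>
#include <stddef.h>

#define NBIO 4            /* assumed depth, not taken from the patch */
#define BLKNO_NONE (-1L)  /* assumed marker for "no translation cached" */

struct buf;

struct bio {
    struct buf *bio_buf;   /* hypothetical back-pointer to the owning buffer */
    long        bio_blkno; /* block number as seen by this layer */
};

struct buf {
    struct bio b_bio_array[NBIO]; /* one slot per translation layer */
};

/* Initialize a toy buffer: slot 0 holds the logical block, the rest are empty. */
static void
toy_initbuf(struct buf *bp, long lblkno)
{
    size_t i;

    for (i = 0; i < NBIO; i++) {
        bp->b_bio_array[i].bio_buf = bp;
        bp->b_bio_array[i].bio_blkno = BLKNO_NONE;
    }
    bp->b_bio_array[0].bio_blkno = lblkno;
}

/*
 * "Push" a BIO: return the slot for the next layer down, or NULL when the
 * embedded array is exhausted. The slot is not cleared, so a translation
 * left there by an earlier I/O is still visible to the caller.
 */
static struct bio *
toy_push_bio(struct bio *bio)
{
    struct buf *bp = bio->bio_buf;
    size_t idx = (size_t)(bio - bp->b_bio_array);

    assert(idx < NBIO);
    return (idx + 1 < NBIO) ? &bp->b_bio_array[idx + 1] : NULL;
}
```

Because the lower slot survives after the I/O completes, a second push
from the same BIO finds whatever block number the first pass stored there.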
Filesystems which cache block number translations can make certain
assumptions when they are holding the primary buffer for a data block.
For example, UFS might do a 'bp = getblk(...)' where bp is a returned
struct buf. Under these conditions bp->b_bio1 (aka bp->b_bio_array[0])
contains the logical block and offset information for the buffer, and
bp->b_bio2 contains the disk or physical layer translation. This 'cached
translation' is integral to how the buffer cache works and determines
whether I/O operations have to execute a BMAP call to translate the
logical block numbers into disk or physical block numbers.
Once a vnode or device strategy call is made, however, EVERYTHING is
BIO centric. The block number that the vnode or device uses is always
bio->bio_blkno, the block number in the BIO that was passed to the
vnode or device's strategy routine, and NEVER anything relative to the
struct buf that the BIO references.
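
To make the rule concrete, here is a toy strategy routine that addresses
its I/O purely by the BIO it was handed. Everything except bio_blkno and
b_bio_array is a hypothetical name (bio_buf, toy_last_block,
toy_strategy), and a real strategy routine would of course queue I/O
rather than record a number.

```c
#include <assert.h>

struct buf;

struct bio {
    struct buf *bio_buf;   /* hypothetical back-pointer to the buffer */
    long        bio_blkno; /* block number for THIS layer */
};

struct buf {
    struct bio b_bio_array[2]; /* [0] logical layer, [1] disk layer */
};

static long toy_last_block = -1; /* block the toy "device" was asked for */

static void
toy_strategy(struct bio *bio)
{
    /*
     * Right: use the block number in the BIO that was passed in.
     * Wrong would be reaching back through bio->bio_buf to read
     * b_bio_array[0].bio_blkno, which belongs to a different layer.
     */
    toy_last_block = bio->bio_blkno;
}
```

Handing toy_strategy() the second BIO of a buffer makes it see the disk
block, even though the buffer's first BIO still holds the logical block.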
Another change I have made is to begin to divorce the bp->b_vp from the
vnode and device strategy routines. Before this change vnode based
strategy routines always assumed that bp->b_vp referenced the vnode
that strategy routine was executed on, and device strategy routines
always assumed that bp->b_dev referenced the device the device strategy
routine was executed on. This is no longer true. The vnode or device
strategy routine must use the vnode or device pointer passed to it for
that reference. This allowed me to entirely remove bp->b_dev and to
divorce the original buffer from the actual vnode the I/O winds up
occurring on, which in turn will allow us to run I/O through a stack
of vnodes and devices without having to hack the struct buf to pieces.
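
A rough sketch of why passing the device explicitly helps with stacking.
The types and names below (struct toy_dev, toy_dev_strategy, the base
field) are invented for illustration and are not the patch's interfaces.
For brevity the lower layer gets a local copy of the BIO rather than a
BIO pushed from the buffer's embedded array.

```c
#include <assert.h>
#include <stddef.h>

struct bio {
    long bio_blkno;
};

struct toy_dev {
    long            base;       /* block offset this layer adds, 0 for a raw disk */
    struct toy_dev *lower;      /* next device down the stack, NULL at the bottom */
    long            last_blkno; /* last block this device actually serviced */
};

/*
 * The routine learns which device it is running on from its 'dev'
 * argument, never from a field in the buffer, so the same buffer can
 * travel through any number of stacked devices.
 */
static void
toy_dev_strategy(struct toy_dev *dev, struct bio *bio)
{
    if (dev->lower != NULL) {
        struct bio lower_bio = *bio; /* translated request for the next layer */

        lower_bio.bio_blkno += dev->base;
        toy_dev_strategy(dev->lower, &lower_bio);
        return;
    }
    dev->last_blkno = bio->bio_blkno;
}
```

With a "partition" device stacked on a "disk" device, a request for block
10 on a partition whose base is 2048 reaches the disk as block 2058, and
the caller's BIO is left unmodified.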
This patch represents stage 1 work. The BIO contains a bio_offset and
a bio_blkno. In stage 2, bio_blkno will be removed entirely and there
will only be bio_offset. Stage 2 is going to be considerably more
dangerous than stage 1, so I really want to get stage 1 completely solid
and committed before I begin working on stage 2.
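
As a small illustration of why a byte offset alone can carry the same
information as a block number, assuming a single fixed block size. The
512-byte TOY_BSIZE and both helper names are assumptions made for this
sketch; the text above does not say which units or block size the patch
uses at any given layer.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BSIZE 512 /* assumed block size, for illustration only */

/* Convert a block number to a byte offset. */
static int64_t
toy_blkno_to_offset(int64_t blkno)
{
    return blkno * TOY_BSIZE;
}

/* Convert a block-aligned byte offset back to a block number. */
static int64_t
toy_offset_to_blkno(int64_t offset)
{
    assert(offset % TOY_BSIZE == 0);
    return offset / TOY_BSIZE;
}
```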
-Matt