buf01.patch - BUF/BIO patch #1 (alpha quality)

Matthew Dillon dillon at apollo.backplane.com
Sun Feb 5 16:42:15 PST 2006


    Here is the first iteration.  There are virtually guaranteed to be bugs,
    but the more help I have in finding them, the faster I'll be able to work.

    WARNING!  Any bugs have the potential to corrupt the target
    filesystem.  Only test this patch on setups where you can afford to lose
    your filesystems.

	fetch http://apollo.backplane.com/DFlyMisc/buf01.patch

    I have done some basic testing of UFS, NFS, and CD's.  I have done no
    testing of any of the other filesystems and I have not yet done any serious
    testing to check for e.g. filesystem corruption.

    I need help life-testing UFS, NFS, CD's, and any help at all testing
    other filesystems, preferably with some debugging help too.  In
    particular: msdosfs, ntfs, hpfs, mfs, and the vn device.  This is a
    fairly massive patch and it's only stage #1 !!

    This patch changes all vnode and device strategy calls to take a BIO
    instead of a BUF, and changes biodone() to take a BIO.  The logical, disk,
    and physical block numbers and offsets are no longer in the BUF structure
    but instead each layer gets its own BIO structure.  Various other 
    I/O specific fields, such as b_iodone, have also been moved to the BIO
    structure.

    As an I/O progresses, each translation layer 'pushes' a new BIO.  For
    the moment, all the BIO's are simply embedded as an array in the BUF.
    Once an I/O finishes, the chain of BIOs is left intact and serves to
    'cache' block translations (similar to how b_blkno cached block
    translations before the patch).
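
    The 'push' described above can be modeled roughly as follows.  This
    is a simplified userland sketch, not code from the patch: only
    b_bio_array, bio_blkno and bio_offset are names from this message.
    The array size, the bio_buf back-pointer, the "nothing cached" marker
    and both helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBIO_SKETCH 4                     /* assumed array size */
#define BLKNO_NONE_SKETCH ((int64_t)-1)   /* assumed "nothing cached" marker */

struct buf;

struct bio {
    struct buf *bio_buf;     /* assumed back-pointer to the owning BUF */
    int64_t     bio_blkno;   /* block number as seen by this layer */
    int64_t     bio_offset;  /* byte offset as seen by this layer */
};

struct buf {
    struct bio b_bio_array[NBIO_SKETCH];  /* one BIO per translation layer */
};

/* Hypothetical helper: mark every layer as having no cached translation. */
static void buf_init_sketch(struct buf *bp)
{
    for (size_t i = 0; i < NBIO_SKETCH; i++) {
        bp->b_bio_array[i].bio_buf = bp;
        bp->b_bio_array[i].bio_blkno = BLKNO_NONE_SKETCH;
        bp->b_bio_array[i].bio_offset = 0;
    }
}

/* Hypothetical helper: 'push' a layer by handing out the next array slot. */
static struct bio *push_bio_sketch(struct bio *bio)
{
    struct buf *bp = bio->bio_buf;
    size_t idx = (size_t)(bio - bp->b_bio_array);

    assert(idx + 1 < NBIO_SKETCH);        /* ran out of embedded BIOs */
    return &bp->b_bio_array[idx + 1];
}
```

    Because the slots live inside the BUF, whatever a layer stores in its
    BIO is still there after the I/O completes, which is what lets the
    chain double as a translation cache.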

    Filesystems which cache block number translations can make certain 
    assumptions when they are holding the primary buffer for a data block.
    For example, UFS might do a 'bp = getblk(...)' where bp is a returned
    struct buf.  Under these conditions bp->b_bio1 (aka bp->b_bio_array[0])
    contains the logical block and offset information for the buffer, and
    bp->b_bio2 contains the disk or physical layer translation.  This
    'cached translation' is integral to how the buffer cache works and
    determines whether I/O operations have to execute a BMAP call to
    translate the logical block numbers into disk or physical block numbers.
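
    As an illustration of the cached translation, the check might look
    something like the sketch below.  The b_bio1/b_bio2 names come from
    this message; modeling them as macros over b_bio_array, the marker
    value, the made-up block mapping and the helper names are all
    assumptions, and the real BMAP call takes different arguments.

```c
#include <assert.h>
#include <stdint.h>

#define BLKNO_NONE_SKETCH ((int64_t)-1)   /* assumed "nothing cached" marker */

struct bio { int64_t bio_blkno; };
struct buf { struct bio b_bio_array[2]; };

/* Assumed aliases, following "bp->b_bio1 (aka bp->b_bio_array[0])". */
#define b_bio1 b_bio_array[0]
#define b_bio2 b_bio_array[1]

static int bmap_calls_sketch;             /* counts translations performed */

/* Stand-in for a BMAP operation; the mapping itself is invented. */
static int64_t bmap_sketch(int64_t lblkno)
{
    bmap_calls_sketch++;
    return 1000 + lblkno * 8;
}

/* Return the disk block, translating only if nothing is cached yet. */
static int64_t disk_blkno_sketch(struct buf *bp)
{
    assert(bp->b_bio1.bio_blkno >= 0);    /* logical block must be set */
    if (bp->b_bio2.bio_blkno == BLKNO_NONE_SKETCH)
        bp->b_bio2.bio_blkno = bmap_sketch(bp->b_bio1.bio_blkno);
    return bp->b_bio2.bio_blkno;
}
```

    A second call on the same buffer returns the stored value without
    going through the translation again.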

    Once a vnode or device strategy call is made, however, EVERYTHING is
    BIO centric.  The block number that that vnode or device uses is always
    bio->bio_blkno, the block number in the BIO that was passed to the
    vnode or device's strategy routine, and NEVER anything relative to the
    struct buf that the BIO references.
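
    In code terms the rule might be pictured like this.  The structures
    are a cut-down model and dev_sketch, dev_strategy_sketch and the
    bio_buf back-pointer are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct buf;
struct bio { struct buf *bio_buf; int64_t bio_blkno; };
struct buf { struct bio b_bio_array[2]; };

/* Hypothetical device that just records what it was asked to access. */
struct dev_sketch { int64_t last_blkno; };

static void dev_strategy_sketch(struct dev_sketch *dev, struct bio *bio)
{
    assert(bio->bio_buf != NULL);

    /* Right: the block number of the BIO this layer was handed. */
    dev->last_blkno = bio->bio_blkno;

    /*
     * Wrong: bio->bio_buf->b_bio_array[0].bio_blkno.  That value
     * belongs to a different layer of the same buffer.
     */
}
```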

    Another change I have made is to begin to divorce the bp->b_vp from the
    vnode and device strategy routines.  Before this change vnode based
    strategy routines always assumed that bp->b_vp referenced the vnode 
    that strategy routine was executed on, and device strategy routines
    always assumed that bp->b_dev referenced the device the device strategy
    routine was executed on.  This is no longer true.  The vnode or device
    strategy routine must use the vnode or device pointer passed to it for
    that reference.  This allowed me to entirely remove bp->b_dev and to
    divorce the original buffer from the actual vnode the I/O winds up 
    occurring on, which in turn will allow us to run I/O through a stack
    of vnodes and devices without having to hack the struct buf to pieces.
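
    A toy version of such a stack is sketched below, assuming a
    slice-like layer that adds a base block number and forwards to the
    device underneath.  Every name here other than bio_blkno and
    b_bio_array is invented for the example.  Each layer uses only the
    device pointer it was called with and pushes a fresh BIO for the
    layer below, so the caller's BIO is never rewritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBIO_SKETCH 4                     /* assumed array size */

struct buf;
struct bio { struct buf *bio_buf; int64_t bio_blkno; };
struct buf { struct bio b_bio_array[NBIO_SKETCH]; };

/* Hypothetical stackable device. */
struct dev_sketch {
    int64_t            base;        /* block offset into the lower device */
    struct dev_sketch *lower;       /* NULL for the bottom of the stack */
    int64_t            last_blkno;  /* last block this layer was asked for */
};

static void dev_strategy_sketch(struct dev_sketch *dev, struct bio *bio)
{
    /* Use the device we were passed, not anything stored in the buf. */
    dev->last_blkno = bio->bio_blkno;

    if (dev->lower != NULL) {
        struct buf *bp = bio->bio_buf;
        size_t idx = (size_t)(bio - bp->b_bio_array);
        struct bio *nbio;

        assert(idx + 1 < NBIO_SKETCH);
        nbio = &bp->b_bio_array[idx + 1];   /* push a BIO for the next layer */
        nbio->bio_buf = bp;
        nbio->bio_blkno = bio->bio_blkno + dev->base;
        dev_strategy_sketch(dev->lower, nbio);
    }
}
```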

    This patch represents stage 1 work.  The BIO contains a bio_offset and
    a bio_blkno.  In stage 2, bio_blkno will be removed entirely and there
    will only be bio_offset.  Stage 2 is going to be considerably more
    dangerous than stage 1, so I really want to get stage 1 completely solid
    and committed before I begin working on stage 2.

						-Matt





