Idle question about multi-core processors

Matthew Dillon dillon at apollo.backplane.com
Fri Dec 1 16:24:00 PST 2006


:I tend to agree with this sentiment.  BGL aside though, I think that
:the VFS is still largely serial, and filesystems will remain an issue.
: That probably is not a week long project.  From my understanding, it
:would probably be easier to do the ZFS port than to untangle the mess
:that is FFS.
:
:Chris

    I do have a plan for FFS.  I plan to touch it as little as possible :-)

    The plan is actually to allow higher layers in the kernel to read and
    write the buffer cache directly and avoid having to dive into the VFS
    (at least for data already cached).  Part of that plan has already been
    implemented... the buffer cache is now indexed based on byte offset
    ranges (offset,size) instead of block numbers (lblkno,size).
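
    As a rough sketch of what that indexing buys us (stand-in types and
    names here, not the real kernel structures), a lookup only needs a
    vnode and a byte offset:

	#include <stddef.h>
	#include <sys/types.h>

	struct xbuf {				/* stand-in for struct buf */
		struct xbuf	*b_next;	/* per-vnode list; the real cache uses a tree */
		off_t		b_loffset;	/* byte offset of the buffer within the file */
		size_t		b_bcount;	/* size of the buffer in bytes */
		char		*b_data;	/* mapped data (see the VMIO sketch below) */
	};

	struct xvnode {				/* stand-in for struct vnode */
		struct xbuf	*v_bufs;	/* cached buffers for this file */
	};

	/*
	 * Find the cached buffer whose byte range covers 'offset', if any.
	 * No knowledge of the file block size is required.
	 */
	static struct xbuf *
	xbuf_lookup(struct xvnode *vp, off_t offset)
	{
		struct xbuf *bp;

		for (bp = vp->v_bufs; bp != NULL; bp = bp->b_next) {
			if (offset >= bp->b_loffset &&
			    offset < bp->b_loffset + (off_t)bp->b_bcount)
				return (bp);
		}
		return (NULL);
	}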

    This means that we can theoretically have vn_read() access the buffer
    cache *directly*, based on the file offset, without any knowledge of the
    file block size or the inner workings of the VFS.  I fully intend to do
    that eventually.

    To complete the equation we need a means to tell the VFS to instantiate
    a buffer that does not exist, since the VFS is the only entity that
    knows what the file block size for that buffer is going to be:

	bp = VOP_GETBUFFER(vp, offset)
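
    Reusing the stand-in types from the sketch above, the filesystem side of
    that call would boil down to something like the following (xffs_blksize()
    and xbuf_get() are illustrative stand-ins, and the block size logic is
    deliberately simplified):

	/*
	 * Only the filesystem knows the block size, so it rounds the byte
	 * offset down to a block boundary and instantiates a buffer covering
	 * that (loffset,size) range.  Stand-in routines, not the real API.
	 */
	size_t xffs_blksize(struct xvnode *vp, off_t offset);	/* full block or frag */
	struct xbuf *xbuf_get(struct xvnode *vp, off_t loffset, size_t size);

	static struct xbuf *
	xffs_getbuffer(struct xvnode *vp, off_t offset)
	{
		size_t	bsize = xffs_blksize(vp, offset);
		off_t	loffset = offset - (offset % (off_t)bsize);

		return (xbuf_get(vp, loffset, bsize));
	}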

    Higher layers such as vn_read() and vn_write() would make this call
    when they cannot find an existing buffer which serves their needs,
    and would then read or write the buffer via bread() or bwrite() (etc)
    as needed to make it valid.
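
    In sketch form, again with stand-in names (xvop_getbuffer() and xbread()
    standing in for the proposed VOP_GETBUFFER and the existing bread()-style
    validation), the read path reduces to:

	#include <string.h>

	struct xbuf *xvop_getbuffer(struct xvnode *vp, off_t offset);	/* proposed VOP */
	void xbread(struct xbuf *bp);		/* make the buffer's contents valid */

	/*
	 * Return a valid buffer covering 'offset', instantiating it via the
	 * filesystem if the cache does not already have one.
	 */
	static struct xbuf *
	xgetbuf(struct xvnode *vp, off_t offset)
	{
		struct xbuf *bp;

		bp = xbuf_lookup(vp, offset);		/* direct cache hit? */
		if (bp == NULL) {
			bp = xvop_getbuffer(vp, offset); /* fs picks the block size */
			xbread(bp);
		}
		return (bp);
	}

	/*
	 * Copy 'len' bytes starting at file offset 'offset' straight out of
	 * the buffer cache, one buffer at a time.
	 */
	static void
	xvn_read(struct xvnode *vp, off_t offset, char *dst, size_t len)
	{
		while (len > 0) {
			struct xbuf *bp = xgetbuf(vp, offset);
			size_t boff = (size_t)(offset - bp->b_loffset);
			size_t n = bp->b_bcount - boff;

			if (n > len)
				n = len;
			memcpy(dst, bp->b_data + boff, n);
			dst += n;
			offset += (off_t)n;
			len -= n;
		}
	}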

    This would also allow us to get rid of VOP_READ, VOP_WRITE, VOP_GETPAGES,
    and VOP_PUTPAGES.  Instead we would just iterate a loop, calling
    VOP_GETBUFFER when the buffer does not already exist, and then perform
    the appropriate operations on the returned buffer and be done with it.
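
    The write side is the same loop with the copy direction reversed and a
    write-back queued at the end; in the sketch below xbdwrite() stands in
    for the usual delayed buffer write, and xgetbuf() is from the read
    sketch above:

	void xbdwrite(struct xbuf *bp);		/* stand-in for a delayed write-back */

	/*
	 * Copy 'len' bytes into the buffer cache starting at file offset
	 * 'offset' and queue each touched buffer for write-back.  A real
	 * implementation would skip the validating read when an entire
	 * buffer is being overwritten.
	 */
	static void
	xvn_write(struct xvnode *vp, off_t offset, const char *src, size_t len)
	{
		while (len > 0) {
			struct xbuf *bp = xgetbuf(vp, offset);
			size_t boff = (size_t)(offset - bp->b_loffset);
			size_t n = bp->b_bcount - boff;

			if (n > len)
				n = len;
			memcpy(bp->b_data + boff, src, n);
			xbdwrite(bp);
			src += n;
			offset += (off_t)n;
			len -= n;
		}
	}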

    And, as a bonus, the buffer cache now *requires* that filesystem buffers
    be VMIO'd, meaning that they are backed by the VM page cache.  This
    means that we can rely on the buffer's b_pages[] array of VM pages and
    in many cases not even have to map the buffer into kernel virtual memory
    (which is the single biggest time cost for current buffer cache
    operations).
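
    Sketched out, with xvm_page and xpage_copyout() standing in for the VM
    page type and a per-page copy primitive, a copy loop can walk b_pages[]
    directly and never needs a kernel virtual mapping of the buffer:

	#define XPAGE_SIZE	4096		/* assumed page size for the sketch */

	struct xvm_page;			/* opaque stand-in for a VM page */
	void xpage_copyout(struct xvm_page *pg, size_t pgoff, void *dst, size_t n);

	struct xvmbuf {				/* stand-in for a VMIO'd buffer */
		off_t		b_loffset;
		size_t		b_bcount;
		struct xvm_page	*b_pages[16];	/* VM pages backing the buffer */
	};

	/*
	 * Copy 'len' bytes out of the buffer starting at buffer-relative
	 * offset 'boff', page by page, without mapping the buffer into KVA.
	 */
	static void
	xvmbuf_copyout(struct xvmbuf *bp, size_t boff, char *dst, size_t len)
	{
		while (len > 0) {
			size_t pg = boff / XPAGE_SIZE;
			size_t pgoff = boff % XPAGE_SIZE;
			size_t n = XPAGE_SIZE - pgoff;

			if (n > len)
				n = len;
			xpage_copyout(bp->b_pages[pg], pgoff, dst, n);
			dst += n;
			boff += n;
			len -= n;
		}
	}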

    I don't plan on doing any more work on this until next year, because a lot
    of it is going to have to be tightly integrated into the clustering work.
    Only moderate work on FFS itself will be needed.  Since FFS operates
    on buffers internally anyway, it is more a matter of removing all the
    UIO support in favor of the direct buffer cache API, and abstracting
    file extensions/ftruncate out into the kernel proper.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




