SoC Project Proposal- Anticipatory Disk I/O scheduler

Nirmal Thacker thacker.nirmal at gmail.com
Sat Apr 5 11:08:34 PDT 2008


Hello,


On Fri, Apr 4, 2008 at 8:58 PM, Matthew Dillon
<dillon at apollo.backplane.com> wrote:
>
>  :Hello,
>
>
> :
>  :I had submitted a proposal for the Google SoC last week on the
>  :DragonFlyBSD Anticipatory Disk I/O scheduler. Since the deadline has
>  :been extended I could do with some improvements, if it requires any.
>  :If some of you could review the proposal and let me know if it falls
>  :short or is fine it'd be great.
>  :
>  :Since I was running out of time I could not give a lot of design
>  :issues from the actual code structure. If this is required could you
>  :point me to a segment code tree where I must do further research.
>  :
>  :The proposal is here: www.cc.gatech.edu/~thacker/DFlyBSD_proposal_thacker.pdf
>  :
>  :Thanks!
>  :
>  :Nirmal Thacker
>
>     Very interesting.  It's true that the user process winds up being
>     somewhat synchronous when doing reads from the disk.  Individual
>     filesystems do attempt to do some read-ahead but have never been
>     able to do all that good a job of it.  The issue is complicated by
>     the fact that only the filesystem code really knows what blocksize to
>     use for buffer cache operations.
>
>     Having a thread heuristically prefetch data has interesting
>     implications.  It *IS* possible to do even without knowing the block
>     size the filesystem normally chooses.  It can be done because all
>     filesystem related I/O via the buffer cache is backed by a VM object,
>     thus making it possible to construct I/O's that directly map the
>     backing pages without actually having to go through the buffer cache.
>
>     The big giant lock we still have in DragonFly interferes with MP
>     issues but it would probably be beneficial to run such a thread
>     on several cpus and dispatch the read-ahead signal to a 'different'
>     cpu than the one that triggered the operation, and perhaps work on
>     getting rid of the need for the big giant lock in the low level I/O
>     system at the same time.
This is interesting - I was not aware of the MP issues for the I/O
subsystem in DragonFlyBSD. The only documentation I had was the
DragonFlyBSD paper.

So the idea is to remove the FS from the critical path of
communication and use read-ahead heuristics to fill up the buffer,
taking care of the MP issues. Nice. I did a similar project some time
ago in which we eliminated read-ahead completely (in the Linux VFS)
and let the programmer cache the blocks he thought important. Of
course, in that case the programmer had full knowledge of the disk
layout, block size, etc. That made it possible to create user-level
filesystems which interacted asynchronously with the underlying disk.
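
To make "read-ahead heuristics" a bit more concrete, here is a minimal
user-space sketch in Python of one common heuristic: detect sequential
access and grow a prefetch window. It is a toy model only, not
DragonFly or Linux code. The class name SequentialDetector, the
doubling policy and the max_window cap are all assumptions made for
illustration; it works in page indices so that it does not need to
know the filesystem block size.

```python
class SequentialDetector:
    """Toy model: suggest pages to prefetch for one open file.

    Hypothetical illustration, not kernel code. Units are page
    indices within the file's backing object.
    """

    def __init__(self, max_window=8):
        self.next_expected = None     # page we expect if access is sequential
        self.window = 0               # current read-ahead window, in pages
        self.max_window = max_window  # assumed upper bound on the window

    def on_read(self, page, count=1):
        """Record a read of `count` pages starting at `page`.

        Returns the list of page indices worth prefetching.
        """
        if page == self.next_expected:
            # Sequential hit: start at 1 page, then double up to the cap.
            self.window = min(max(1, self.window * 2), self.max_window)
        else:
            # Random access (or first read): stop reading ahead.
            self.window = 0
        self.next_expected = page + count
        start = self.next_expected
        return list(range(start, start + self.window))
```

A real implementation would also have to skip pages that are already
resident or already being read; the sketch leaves that out.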

>
>     I think it would be worthwhile.  Then instead of the filesystem
>     explicitly doing the read-ahead (which is somewhat expensive and right
>     smack in the middle of the critical path), it could instead pass
>     heuristical hints to a read-ahead subsystem and let the subsystem
>     deal with the read-aheads.

Will add this to the proposal, and probably a little more on the lock issue.
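
As a starting point for the proposal, the hint-passing idea, combined
with the earlier suggestion of dispatching read-ahead work to a
different cpu than the one that triggered it, could be modelled
roughly as below. This is again a user-space Python sketch under
stated assumptions: ReadAheadHint, ReadAheadSubsystem, the per-cpu
queues and the "next cpu" policy are hypothetical names and choices,
not existing DragonFly interfaces.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ReadAheadHint:
    obj_id: int      # stands in for the backing VM object (assumption)
    offset: int      # byte offset where read-ahead should start
    length: int      # number of bytes the filesystem suggests prefetching
    origin_cpu: int  # cpu that issued the triggering read


class ReadAheadSubsystem:
    """Toy model: one hint queue per cpu, drained by a per-cpu thread."""

    def __init__(self, ncpus):
        if ncpus < 1:
            raise ValueError("need at least one cpu")
        self.queues = [deque() for _ in range(ncpus)]

    def pick_cpu(self, origin_cpu):
        # Assumed policy: hand the work to the next cpu. With more than
        # one cpu this always differs from the origin; with a single
        # cpu it necessarily falls back to the origin itself.
        return (origin_cpu + 1) % len(self.queues)

    def submit(self, hint):
        """Called from the filesystem read path: queue the hint and return."""
        target = self.pick_cpu(hint.origin_cpu)
        self.queues[target].append(hint)
        return target

    def drain(self, cpu):
        """What the read-ahead thread on `cpu` would do: take its hints.

        Returns (obj_id, offset, length) tuples; real code would issue
        the I/O against the backing pages here.
        """
        issued = []
        while self.queues[cpu]:
            h = self.queues[cpu].popleft()
            issued.append((h.obj_id, h.offset, h.length))
        return issued
```

The point of the split is that submit() is the only part on the
filesystem's critical path, and it does nothing more than append to a
queue.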

>
>                                         -Matt
>                                         Matthew Dillon
>                                         <dillon at backplane.com>
>
>

I'd like more reviews/suggestions, as the submission deadline is the 7th.

Thanks
Nirmal




