About GSoC

Venkatesh Srinivas me at endeavour.zapto.org
Sat Mar 19 03:18:45 PDT 2011


On March 19th Alex Hornung wrote:
> Both dsched and dm provide some nice abstractions that will
> allow you to focus more on your project itself than messing with a lot
> of our kernel internals.

DSched's interface for writing disk schedulers is very
straightforward; you provide 4 methods for any disk scheduler: *[1]

1) prepare(): prepare() is invoked any time a disk scheduler is set for
a disk. Depending on how complex your scheduler is, it can range from
doing nothing (see noop_prepare() in kern/kern_dsched.c) to doing
anything complex you want (see fq_prepare() in
kern/dsched/fq/fq_diskops.c), such as starting kernel threads and
allocating whatever state you need.

2) teardown(): Called when a disk scheduler is being shut down or
switched away from; it reverses the work of prepare(). It can
deallocate things, kill threads, and so on.

3) cancel_all(): cancel_all() should remove all queued I/O requests to
a given disk instead of issuing them.

4) queue(): This is the most interesting routine!

queue() accepts a single I/O request from a thread to a certain disk and
does whatever work is needed to either issue that request immediately
or defer it to some point in the future.

int queue(struct dsched_disk_ctx *disk, struct dsched_thread_io *tdio,
struct bio *bio) {}

'disk' is a pointer to a per-disk structure; 'dsched_disk_ctx' just
contains a reference to the disk it is controlling along with a few
other fields (a refcount, some list linkages, ...). 'tdio' is a
reference to the thread performing the disk I/O request. It provides a
place to queue BIOs and a few other references. And 'bio' is the I/O
request -- it holds a lot of information about the request -- whether
it is a read, write, or any one of the other commands in sys/buf.h
(see BUF_CMD_*), what the request data is, etc.

A queue() routine can completely pass the buck -- if it returns a
nonzero value, the underlying device is handed the BIO directly. :)
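
To make the shape of the four methods concrete, here is a minimal
user-space sketch. Everything in it is an assumption made for
illustration: the struct bodies are empty stand-ins for the real kernel
structures, and sketch_policy, passthru_* and sketch_dispatch() are
hypothetical names, not the actual DSched registration API. It only
shows a policy whose queue() passes the buck, and a caller that hands
the BIO to the device when queue() returns nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct bio { int cmd; int issued_raw; };
struct dsched_disk_ctx { int prepared; int queued; };
struct dsched_thread_io { int unused; };

/* Hypothetical method table holding the four entry points. */
struct sketch_policy {
	const char *name;
	int  (*prepare)(struct dsched_disk_ctx *disk);
	void (*teardown)(struct dsched_disk_ctx *disk);
	void (*cancel_all)(struct dsched_disk_ctx *disk);
	int  (*queue)(struct dsched_disk_ctx *disk,
	    struct dsched_thread_io *tdio, struct bio *bio);
};

/* Called when the scheduler is set for a disk; nothing to allocate here. */
static int passthru_prepare(struct dsched_disk_ctx *disk)
{
	disk->prepared = 1;
	return 0;
}

/* Reverses whatever prepare() did. */
static void passthru_teardown(struct dsched_disk_ctx *disk)
{
	disk->prepared = 0;
}

/* Drops every queued request instead of issuing it. */
static void passthru_cancel_all(struct dsched_disk_ctx *disk)
{
	disk->queued = 0;
}

/* Passes the buck: a nonzero return asks the caller to issue the BIO. */
static int passthru_queue(struct dsched_disk_ctx *disk,
    struct dsched_thread_io *tdio, struct bio *bio)
{
	(void)disk; (void)tdio; (void)bio;
	return 1;
}

struct sketch_policy passthru_policy = {
	.name = "passthru",
	.prepare = passthru_prepare,
	.teardown = passthru_teardown,
	.cancel_all = passthru_cancel_all,
	.queue = passthru_queue,
};

/* Stand-in for the framework side: honours the nonzero return of queue(). */
void sketch_dispatch(const struct sketch_policy *p,
    struct dsched_disk_ctx *disk, struct dsched_thread_io *tdio,
    struct bio *bio)
{
	assert(p != NULL && p->queue != NULL);
	if (p->queue(disk, tdio, bio) != 0)
		bio->issued_raw = 1;	/* device gets the BIO directly */
}
```

A real policy would do its queueing work inside queue() and return zero
once it has taken responsibility for the BIO.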

DSched sits in the middle of a fairly deep stack -- below it sits a
routine called bioqdisksort(), which also shuffles I/O requests a bit;
bioqdisksort() lives in kern/subr_disk.c, if you'd like to take a
look. In theory, stacking a disk scheduler on top of code that sorts
reads ahead of writes (mostly) is a bad idea. In practice, I've found
performance changes (a mix of good and bad) when disabling it and
turning on Fair Queuing... no idea why yet, though.

Anyway writing a dsched module really is straightforward! The
difficult part is actually all inside queue() -- deciding when to
queue an I/O request versus issue it directly and how to queue the
requests. We currently have two disk schedulers in the kernel -- No-Op
does no queueing and can be found inside kern/kern_dsched.c (look for
the functions and structures starting with noop_), and Fair Queuing,
which implements something of a fair queuing/fair share approach. In
FQ, threads will directly issue I/O till they hit a fair share limit;
when they do, they queue the I/O requests for a worker thread
(fq_dispatcher) to issue at a later time.
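
The issue-until-limit-then-defer idea can be sketched in user space as
below. This is not the FQ code: the fixed per-thread budget, the
fixed-size pending array, and all fq_sketch_* names are assumptions made
up for illustration, and the real scheduler computes its fair share
differently and hands deferred requests to a kernel thread.

```c
#include <assert.h>
#include <stddef.h>

#define FQ_SKETCH_QUEUE_MAX 8

/* Hypothetical stand-in for an I/O request. */
struct fq_sketch_bio { int id; };

/* Hypothetical per-disk state: counts requests handed to the device. */
struct fq_sketch_disk { int issued_total; };

/* Hypothetical per-thread state. */
struct fq_sketch_tdio {
	int issued;	/* requests issued directly in this interval */
	int budget;	/* assumed fair-share limit for this interval */
	struct fq_sketch_bio *pending[FQ_SKETCH_QUEUE_MAX];
	int npending;	/* number of deferred requests */
};

/* Pretend to hand one request to the device. */
static void fq_sketch_issue(struct fq_sketch_disk *disk,
    struct fq_sketch_bio *bio)
{
	(void)bio;
	disk->issued_total++;
}

/*
 * Issue directly while the thread is under its budget, otherwise defer.
 * Returns 0 when the request was issued or queued, nonzero when the
 * pending queue is full and the caller should issue the request itself.
 */
int fq_sketch_queue(struct fq_sketch_disk *disk,
    struct fq_sketch_tdio *tdio, struct fq_sketch_bio *bio)
{
	assert(disk != NULL && tdio != NULL && bio != NULL);
	if (tdio->issued < tdio->budget) {
		tdio->issued++;
		fq_sketch_issue(disk, bio);
		return 0;
	}
	if (tdio->npending == FQ_SKETCH_QUEUE_MAX)
		return 1;
	tdio->pending[tdio->npending++] = bio;
	return 0;
}

/*
 * Dispatcher step: issue the deferred requests in FIFO order and start
 * a new interval. Returns the number of requests it issued.
 */
int fq_sketch_dispatch(struct fq_sketch_disk *disk,
    struct fq_sketch_tdio *tdio)
{
	int n = tdio->npending;

	for (int i = 0; i < n; i++)
		fq_sketch_issue(disk, tdio->pending[i]);
	tdio->npending = 0;
	tdio->issued = 0;
	return n;
}
```

With a budget of 2, a thread's first two requests go straight to the
device and the third waits until fq_sketch_dispatch() runs.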

There are some solid references to what other OSes are doing here or
what research groups have tinkered with, if that is interesting:

(BSD specific):
* http://www.happyemi.org/hybrid/guide.html and
http://web.archive.org/web/20060821124302/wikitest.freebsd.org/Hybrid
** These two pages describe an old attempt at adding a disk scheduler
to FreeBSD 4.x+; their entry points look rather similar to DSched,
except their queue()-analogue doesn't issue I/O; disks instead call
get_first() to get requests. They implemented a simple scheduler
called 'Hybrid', which is potentially interesting.

(Linux)
* http://retis.sssup.it/~fabio/linux/bfq/description.php and
http://algo.ing.unimo.it/people/paolo/disk_sched/
** This is a pretty interesting and fairly new disk scheduler for
Linux; it is another fair queuing scheduler, but it uses a weighted FQ
variant and uses sector counts rather than byte counts for its budget.
It has some snazzy features, like low-latency guarantees and minimum
bandwidth guarantees.

There is probably work to be done in understanding how well our current
FQ scheduler works, how it can be improved, and how the dsched
interfaces themselves could be improved to play nicer with the
surrounding system. There is also plenty of scope for writing new,
better disk schedulers.

Hope this made some sense!

> The good thing about projects in these areas is that you can actually
> do the development on a vkernel, if you so like. It's very convenient
> to do so as you can simply gdb into the kernel instead of getting a
> core dump, and the reboot time is also cut :)

I'm really going to second this -- in under 5 minutes using vkernels
on leaf.dragonflybsd.org, I can have a fresh copy of the dragonfly
sources, apply a patch, build and run a kernel, and boot into it,
without ever touching a second machine.

If you want to work in some VM (qemu or VBox or whatever), that would
be fine; running on real hardware would work just fine too.

Good luck!
-- vs

*[1]: There are actually a few more entry points, but no scheduler
currently takes advantage of them.




