I/O Scheduler Framework

Alex Hornung ahornung at gmail.com
Sat Feb 28 02:48:52 PST 2009


This is my first mail to the mailing list, but I've been active on IRC
for the last 4 weeks or so. My IRC nick is alexh (was zceee19). As
some of you may know, I've been working on the I/O (Disk) Scheduler
Framework for the past few weeks. At the moment I've reached a point
where it is usable and seems to be stable. I'm currently in the process
of documenting it properly.

In its current implementation, there can only be one policy per disk
(e.g. fair share for ad0, elevator for ad1, ...). Although this resembles
similar scheduler frameworks in other systems (e.g. NetBSD), it is
not optimal: policies can be changed on the fly with a userland
utility, but it is not possible to stack several policies on one disk.

An example use case is one Matt suggested the other day in the IRC
channel: having, for example, a bandwidth limiting scheduler for some
processes on top of a fair share or similar policy.

One solution for this particular problem would be to have a fixed
maximum of two schedulers per disk and divide the policies into two
categories (first order, second order), although that may not be enough
for all (useful) use cases. If there is real interest in implementing at
least some level of stackable policies, I would suggest sticking to the
following scheme:
- Dividing policies into first order (fair share, anticipatory, ...) and
second order (bandwidth limiter, ...)
- Each disk has exactly one first order and either zero or one second
order policies.
- New BIOs are fed in at the top scheduler (so if there is a second
order policy they are fed into the second order policy) and then those
not queued by the top scheduler fall down to the first order scheduler.
An example here would be the second order policy (a bandwidth limiter)
only queueing BIOs from processes that should be limited and letting all
other BIOs fall down into the first order policy.
- The first order scheduler does its job as usual, but apart from having
only its own queued BIOs, it also receives (or maybe polls for?) BIOs
from the second order policy.
- If it is of interest to track the service time, the second order
policy does get feedback on when a BIO terminates, just as a first order
policy does now.
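
To make the flow in the list above a bit more concrete, here is a minimal
userland sketch in C. It is not the framework's actual code: struct bio is
a stripped-down stand-in for the kernel BIO, and every name in it
(disk_sched, limiter, sched_submit, sched_poll_second, ...) is hypothetical.
It assumes the "polling" variant, where the first order side pulls a
limited number of BIOs per round from the second order policy, and it uses
a plain FIFO in place of a real first order policy such as fair share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel BIO; only what the sketch needs. */
struct bio {
    int pid;            /* issuing process */
    struct bio *next;   /* queue linkage */
};

/* Simple FIFO used by both policies. */
struct bioq {
    struct bio *head, *tail;
    size_t len;
};

static void bioq_push(struct bioq *q, struct bio *b)
{
    assert(b != NULL);
    b->next = NULL;
    if (q->tail != NULL)
        q->tail->next = b;
    else
        q->head = b;
    q->tail = b;
    q->len++;
}

static struct bio *bioq_pop(struct bioq *q)
{
    struct bio *b = q->head;
    if (b == NULL)
        return NULL;
    q->head = b->next;
    if (q->head == NULL)
        q->tail = NULL;
    b->next = NULL;
    q->len--;
    return b;
}

/* Second order policy: a limiter that only claims BIOs of one pid. */
struct limiter {
    int limited_pid;
    int budget;         /* BIOs released to the first order per poll */
    struct bioq held;
    int completed;      /* completion feedback counter */
};

/* Per-disk state: exactly one first order, zero or one second order. */
struct disk_sched {
    struct bioq first;          /* FIFO stand-in for the first order policy */
    struct limiter *second;     /* NULL if no second order policy is set */
};

/* New BIOs enter at the top; unclaimed ones fall down to the first order. */
static void sched_submit(struct disk_sched *d, struct bio *b)
{
    if (d->second != NULL && b->pid == d->second->limited_pid) {
        bioq_push(&d->second->held, b);
        return;
    }
    bioq_push(&d->first, b);
}

/* The first order side polls the second order policy for released BIOs. */
static void sched_poll_second(struct disk_sched *d)
{
    struct limiter *l = d->second;
    if (l == NULL)
        return;
    for (int i = 0; i < l->budget; i++) {
        struct bio *b = bioq_pop(&l->held);
        if (b == NULL)
            break;
        bioq_push(&d->first, b);
    }
}

/* Next BIO to hand to the disk, or NULL if nothing is ready. */
static struct bio *sched_dispatch(struct disk_sched *d)
{
    return bioq_pop(&d->first);
}

/* Completion path: the second order policy also gets its feedback. */
static void sched_done(struct disk_sched *d, struct bio *b)
{
    if (d->second != NULL && b->pid == d->second->limited_pid)
        d->second->completed++;
}
```

With a limiter for pid 7 and a budget of 1, submitting BIOs from pids 7, 3
and 7 leaves two BIOs held by the limiter and one in the first order queue;
each call to sched_poll_second() then moves at most one held BIO down.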

As previously stated this might be a limited approach, but nonetheless
it adds features without much overhead or complication.
What I would like to hear is suggestions on what to do: do we want
stackable (hierarchical) policies? If so, is the proposed approach good
enough? If not, what other possibilities are there? What use cases are
not covered by the suggested approach but are solved in other

Sorry for the long mail; I'm looking forward to your


