SPL vs. Critical Section vs. Mutexing for device synchronisation

Fri Jun 3 10:15:43 PDT 2005

Hi Matt, hi all,
as I said before, we should not start to fall into an ad hoc change mode,
but think carefully about what we want and need first. I want to
describe the advantages and problems of the mechanisms used here first
and what I want to have afterwards.

(A) Deferred interrupt processing via critical sections
The primary advantage of critical sections is the simplicity. They
are very easy to use and as long as there is no actual interrupt
they are very cheap too. As a result they work best when used over
short periods of time. Interrupt processing can be implemented either
via a FIFO model or via interrupt masks.

The down-side of critical sections is their coarse granularity, which
makes them unsuitable for anything that takes more than a short period
of time. Similar issues to the mutex concept below apply. It also means
we have to be on the same CPU as the interrupt.
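
As a rough illustration of the interrupt-mask variant, here is a
single-threaded user-space simulation. All names (cpu_state, crit_enter,
crit_exit, crit_interrupt) are made up for this sketch and are not meant
to match any real kernel interface. Entering and leaving the section is
a plain counter update; work only happens on exit if an interrupt was
actually deferred.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state, for illustration only. */
struct cpu_state {
    int      crit_count;   /* nesting depth of critical sections */
    uint32_t pending;      /* one bit per interrupt deferred while critical */
    int      handled[32];  /* how often each simulated handler has run */
};

/* Stand-in for the real interrupt handler. */
static void crit_run_handler(struct cpu_state *cpu, int irq)
{
    cpu->handled[irq]++;
}

/* Entering is a plain increment: no bus-locked instruction is needed
 * because the state is only touched by the owning CPU. */
static void crit_enter(struct cpu_state *cpu)
{
    cpu->crit_count++;
}

/* Leaving the outermost section runs everything that was deferred. */
static void crit_exit(struct cpu_state *cpu)
{
    assert(cpu->crit_count > 0);
    if (--cpu->crit_count > 0)
        return;
    while (cpu->pending != 0) {
        int irq = 0;
        while (((cpu->pending >> irq) & 1u) == 0)
            irq++;                              /* lowest pending bit */
        cpu->pending &= ~(UINT32_C(1) << irq);
        crit_run_handler(cpu, irq);
    }
}

/* Simulated interrupt entry: defer if inside a critical section. */
static void crit_interrupt(struct cpu_state *cpu, int irq)
{
    assert(irq >= 0 && irq < 32);
    if (cpu->crit_count > 0)
        cpu->pending |= UINT32_C(1) << irq;
    else
        crit_run_handler(cpu, irq);
}
```

With a mask, an interrupt that fires twice inside the section is only
replayed once; a FIFO model would keep every occurrence. A real
implementation also has to deal with interrupts arriving while the
pending set is drained, which this simulation ignores.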

(B) Per-device mutex
This is the model chosen by FreeBSD and Linux. Ignoring dead-locks,
this is actually very simple to use too. Whenever the device-specific
code is entered, the mutex is acquired, and it is released when the
code is left.

The down-side of this is two-fold. First of all it does require *two*
bus-locked instructions, which is quite expensive, especially under SMP.
This holds true independent of whether the mutex is contended or not.
The second big problem is that it can dramatically increase the interrupt
latency (just like a long-term critical section). The results have been
measured for the Linux and FreeBSD implementations and are the reason
for the preemption mess they have.
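
To make the "two bus-locked instructions" point concrete, here is a
minimal user-space sketch using C11 atomics. The names (dev_mutex,
softc, dev_start) are hypothetical and this is not how FreeBSD or Linux
implement their mutexes. Even on the uncontended path, entering costs
one atomic compare-and-swap and leaving costs one atomic exchange; on
x86 these are typically compiled to lock cmpxchg and xchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin mutex, for illustration only. */
struct dev_mutex {
    atomic_int locked;     /* 0 = free, 1 = held */
};

/* Hypothetical per-device state protected by the mutex. */
struct softc {
    struct dev_mutex mtx;
    int tx_count;
};

static void dev_mutex_init(struct dev_mutex *m)
{
    atomic_init(&m->locked, 0);
}

/* First atomic read-modify-write: paid even without contention. */
static bool dev_mutex_try_enter(struct dev_mutex *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->locked, &expected, 1);
}

static void dev_mutex_enter(struct dev_mutex *m)
{
    while (!dev_mutex_try_enter(m))
        ;   /* spin until the holder releases the mutex */
}

/* Second atomic read-modify-write: release the mutex. */
static void dev_mutex_exit(struct dev_mutex *m)
{
    int old = atomic_exchange(&m->locked, 0);
    assert(old == 1);
    (void)old;
}

/* Every entry into device-specific code is bracketed by the mutex. */
static void dev_start(struct softc *sc)
{
    dev_mutex_enter(&sc->mtx);
    sc->tx_count++;
    dev_mutex_exit(&sc->mtx);
}
```

If an interrupt handler needs the same mutex while dev_start holds it,
the handler has to wait, which is where the added interrupt latency
comes from.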

(C) Deferred interrupt processing via SPL masks
This is the mutual exclusion mechanism traditionally used by the BSDs.
It allows certain device classes to be serialised at once, e.g. to
protect the network stack from interfering with the network drivers.
Currently in use are splvm (anything but timer), splbio (for block devices)
and splnet (for network drivers). The nice part of this approach is that
it has similar performance to critical sections on UP, but is finer
grained.

The down-side is the high complexity of managing the masks. It is also
more coarse-grained than it often has to be.
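
The class-mask idea can be sketched as a small user-space simulation.
The names below (spl_state, spl_raise, spl_restore, spl_interrupt and
the IRQ_CLASS_* bits) are invented for this example; the real splnet(),
splbio() and splx() differ in detail. Raising the level for one class
defers only interrupts of that class, while other classes still run
immediately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt classes, one bit each. */
#define IRQ_CLASS_BIO (UINT32_C(1) << 0)   /* block devices */
#define IRQ_CLASS_NET (UINT32_C(1) << 1)   /* network drivers */

struct spl_state {
    uint32_t blocked;    /* classes currently masked */
    uint32_t pending;    /* classes that fired while masked */
    int      bio_runs;   /* simulated handler invocations */
    int      net_runs;
};

/* Stand-in for running the handlers of the given classes. */
static void spl_dispatch(struct spl_state *s, uint32_t classes)
{
    if (classes & IRQ_CLASS_BIO)
        s->bio_runs++;
    if (classes & IRQ_CLASS_NET)
        s->net_runs++;
}

/* Mask additional classes; returns the old mask for spl_restore(). */
static uint32_t spl_raise(struct spl_state *s, uint32_t classes)
{
    uint32_t old = s->blocked;
    s->blocked |= classes;
    return old;
}

/* Restore an earlier mask and replay whatever is no longer blocked. */
static void spl_restore(struct spl_state *s, uint32_t old)
{
    uint32_t runnable;

    s->blocked = old;
    runnable = s->pending & ~s->blocked;
    s->pending &= ~runnable;
    spl_dispatch(s, runnable);
}

/* Simulated interrupt entry for exactly one class. */
static void spl_interrupt(struct spl_state *s, uint32_t cls)
{
    assert(cls == IRQ_CLASS_BIO || cls == IRQ_CLASS_NET);
    if (s->blocked & cls)
        s->pending |= cls;      /* defer until the mask is lowered */
    else
        spl_dispatch(s, cls);
}
```

Note that masking IRQ_CLASS_NET here holds back every network driver,
not just the one the caller cares about, which is the coarseness
complained about above.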

---

Conclusion: I'd like to have two basic mechanisms in the tree:
(a) critical sections for *short* sections
(b) *per-device* interrupt deferral

I don't want to keep the current mess of the SPL masks, but I still think
that a device driver should only affect devices sharing the same IRQ. We
should still be careful about what we do of course, but the impact on the
interrupt latency is much smaller. The interface between devices and the
higher subsystems can be done via critical sections for now, but with the
up-coming removal of Giant a slightly different approach is needed.

A common example is the queuing of packets for drivers. IMO the best
thing to use is a non-blocking list for the outer queue. This keeps
the important property of dead-lock avoidance and can be implemented
on all important architectures as long as we have type-stable memory
for the records. The latter is not that difficult to achieve.
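
One possible shape for such a queue, as a sketch only: producers push
records with a compare-and-swap loop (a Treiber-style stack), and the
driver detaches the whole list with a single atomic exchange and
reverses it to get arrival order. This is a swapped-in example
technique, not necessarily the exact structure meant above, and the
names (pkt, pkt_queue, pktq_push, pktq_take_all) are hypothetical.
Taking the whole list at once sidesteps the ABA problem of a
single-element pop; the records are still assumed to come from
type-stable memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical packet record; assumed to live in type-stable memory. */
struct pkt {
    struct pkt *next;
    int         id;
};

struct pkt_queue {
    _Atomic(struct pkt *) head;   /* most recently pushed record */
};

static void pktq_init(struct pkt_queue *q)
{
    atomic_init(&q->head, (struct pkt *)NULL);
}

/* Any number of producers may call this concurrently; nobody blocks. */
static void pktq_push(struct pkt_queue *q, struct pkt *p)
{
    struct pkt *old;

    assert(p != NULL);
    old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        p->next = old;   /* on CAS failure, old holds the current head */
    } while (!atomic_compare_exchange_weak_explicit(&q->head, &old, p,
                 memory_order_release, memory_order_relaxed));
}

/* The consumer detaches everything at once and returns it oldest first. */
static struct pkt *pktq_take_all(struct pkt_queue *q)
{
    struct pkt *list = atomic_exchange_explicit(&q->head,
                           (struct pkt *)NULL, memory_order_acquire);
    struct pkt *fifo = NULL;

    while (list != NULL) {        /* reverse the LIFO chain */
        struct pkt *next = list->next;
        list->next = fifo;
        fifo = list;
        list = next;
    }
    return fifo;
}
```

Neither side ever holds a lock, so an interrupt handler can push or
drain without any risk of dead-locking against the code it interrupted.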

I'd avoid the use of per-device mutexes if possible, given the problems
mentioned above. They have occurred in all big implementations so far
and the workarounds are something you, Matt, have already expressed a
strong dislike for (in-kernel preemption).

Most parts of the system should be entirely lock-free and also avoid
critical sections if possible, since it makes the code *easier* and
most likely also faster.

Joerg