MPSAFE progress and testing in master
Matthew Dillon
dillon at apollo.backplane.com
Mon Jan 4 16:25:42 PST 2010
:On 28.12.2009 08:43, Matthew Dillon wrote:
:> The latest push adds fine-grained locking to the namecache and path
:> lookups and adds a sysctl vfs.cache_mpsafe to turn off the MP lock
:Wasn't fine-grained locking one of the issues that led to the
:FreeBSD5/DragonFly split?
:
:I'm just curious about how your effort differs from what the FreeBSD
:team did for FreeBSD5.
:
:Thanks,
:Patrick
There are a couple of major differences. The biggest ones:
* We don't pass held mutexes down the call stack. Doing so seriously
pollutes the separation between APIs.
* Spinlocks in DragonFly are only allowed to be held at the leaves
of the call chain. While there are a few exceptions, spinlocks
are basically only held around very small bits of code.
* We use our lwkt_token abstraction for subsystem locks. lwkt_tokens
are locks which are automatically released when a thread blocks and
automatically reacquired when a thread unblocks, making them immune
to things like lock order reversals. And there's no API pollution
either.
This also allows tokens to be held for potentially very long periods
of time by high level procedures regardless of the complexity of the
call chain. We primarily use tokens when traversing long system lists.
Tokens effectively devolve the code implementation using them back
to an equivalent non-SMP model, where the code in question only has
to worry about things getting ripped out from under it if it blocks.
That is, tokens are kinda like mini-MP locks but at a finer grain.
* We don't preemptively migrate threads between cpus when they are
running in kernelland. This is basically what pinning does under FreeBSD,
except it is automatic when the thread is in kernel mode.
This allows the free use of the per-cpu data without having to
implement any additional locking to access those data structures.
* We use dedicated per-cpu threads as well as a per-cpu data separation
model. This is most apparent in the network stack and routing table.
FreeBSD tends to use a more fine-grained locked data model.
For example, the packets related to a particular TCP connection get
routed to a particular cpu, so no locking of the INPCBs is needed.
This is one area where we struck it lucky since newer network
interfaces are using toeplitz hashes more and more to implement
separate RR rings. When using such hashes you basically want to
localize the data streams to make best use of per-cpu caches.
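As a rough illustration of the per-cpu routing idea above, here is a minimal C sketch, not DragonFly's actual code. It assumes a fixed cpu count, a toy per-cpu connection table, and a simple stand-in mixing function rather than a real Toeplitz hash. Every name in it (conn_tuple, percpu_pcbs, tuple_hash, tuple_to_cpu, percpu_input) is hypothetical. The point is only that once a connection always hashes to the same cpu, that cpu's table can be updated with no lock.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS     4     /* hypothetical cpu count for this sketch */
#define PCB_SLOTS 64    /* hypothetical per-cpu table size */

/* Connection 4-tuple; field names are illustrative only. */
struct conn_tuple {
    uint32_t src_addr;
    uint32_t dst_addr;
    uint16_t src_port;
    uint16_t dst_port;
};

/*
 * Per-cpu connection state.  Only the owning cpu's protocol thread is
 * assumed to touch its table, so the structure carries no lock.
 */
struct percpu_pcbs {
    struct conn_tuple conn[PCB_SLOTS];
    uint64_t packets[PCB_SLOTS];
    unsigned nconn;
};

static struct percpu_pcbs pcb_tables[NCPUS];

/* Stand-in mixing function (NOT a Toeplitz hash). */
static uint32_t
tuple_hash(const struct conn_tuple *t)
{
    uint32_t h = t->src_addr;

    h = h * 2654435761u ^ t->dst_addr;
    h = h * 2654435761u ^ (((uint32_t)t->src_port << 16) | t->dst_port);
    h ^= h >> 16;
    return h;
}

/* The same tuple always maps to the same cpu. */
static unsigned
tuple_to_cpu(const struct conn_tuple *t)
{
    return tuple_hash(t) % NCPUS;
}

static int
tuple_equal(const struct conn_tuple *a, const struct conn_tuple *b)
{
    return a->src_addr == b->src_addr && a->dst_addr == b->dst_addr &&
        a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/*
 * Account one packet for a connection.  Must be called for the cpu that
 * owns the connection; returns the packet count seen so far.
 */
static uint64_t
percpu_input(unsigned cpu, const struct conn_tuple *t)
{
    struct percpu_pcbs *tbl;
    unsigned i;

    assert(cpu == tuple_to_cpu(t));     /* packet was routed correctly */
    tbl = &pcb_tables[cpu];
    for (i = 0; i < tbl->nconn; i++) {
        if (tuple_equal(&tbl->conn[i], t))
            return ++tbl->packets[i];
    }
    assert(tbl->nconn < PCB_SLOTS);     /* toy table, no eviction */
    tbl->conn[i] = *t;
    tbl->packets[i] = 1;
    tbl->nconn++;
    return 1;
}
```

A caller would compute tuple_to_cpu() once per packet and hand the packet to that cpu's thread, which then calls percpu_input(). In hardware the hash would be the one the network interface computes, so the receive ring and the protocol thread agree on the cpu.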
Basically, when taken all together, the combination of the per-cpu data
model and tokens we use gets rid of at least half of what would otherwise
have to be fine-grained-locked in FreeBSD.
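To make the token semantics described earlier more concrete, here is a toy, single-threaded C model. It is a sketch under stated assumptions and does not use or reproduce the real lwkt_token API. The sim_* names and structures are hypothetical. It models only the one property discussed: a thread that blocks gives up ownership of its tokens, and it cannot resume until it can own all of them again.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 4      /* hypothetical limit for this sketch */

struct sim_thread;

struct sim_token {
    struct sim_thread *owner;           /* NULL when free */
};

struct sim_thread {
    struct sim_token *held[MAX_HELD];   /* tokens logically held */
    int nheld;
    int blocked;
};

/* Try to take a token; returns 1 on success, 0 if it is owned. */
static int
sim_gettoken(struct sim_thread *td, struct sim_token *tok)
{
    if (tok->owner != NULL)
        return 0;
    assert(td->nheld < MAX_HELD);
    tok->owner = td;
    td->held[td->nheld++] = tok;
    return 1;
}

/* Release the most recently acquired token. */
static void
sim_reltoken(struct sim_thread *td)
{
    assert(td->nheld > 0 && !td->blocked);
    td->held[--td->nheld]->owner = NULL;
}

/*
 * The thread blocks: ownership of every held token is dropped, but the
 * thread remembers which tokens it needs back.
 */
static void
sim_block(struct sim_thread *td)
{
    int i;

    for (i = 0; i < td->nheld; i++)
        td->held[i]->owner = NULL;
    td->blocked = 1;
}

/*
 * The thread tries to resume: it may run only once it can own all of its
 * tokens again.  Returns 1 if it resumed, 0 if it must keep waiting.
 */
static int
sim_unblock(struct sim_thread *td)
{
    int i;

    for (i = 0; i < td->nheld; i++) {
        if (td->held[i]->owner != NULL)
            return 0;
    }
    for (i = 0; i < td->nheld; i++)
        td->held[i]->owner = td;
    td->blocked = 0;
    return 1;
}
```

In this model the code between sim_block() and a successful sim_unblock() is exactly where another thread may have taken the token and changed the protected data, so that is the only place the holder has to revalidate anything. Because nothing is held while blocked, two threads taking the same tokens in opposite orders cannot deadlock each other.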
One interesting thing to note is that the namecache locking I just
did in DFly is actually fine-grained, while FreeBSD is using a coarse
lock. This isn't necessarily higher-performing or anything like that;
filename lookups are one of those things that don't really impact
overall performance under real-world loads that much. It just turned
out to be easy to do in DragonFly due to the way our namecache topology
works.
Only the VM system and disk drivers are left to make MPSAFE. The
filesystems aren't MPSAFE yet but HAMMER is the only thing we really
care about there (from a performance perspective) and HAMMER is
MPSAFE for cached read and stat operations already. The AHCI driver
is MPSAFE as well but the device strategy dispatch code isn't yet so
it is still called with the MP lock held.
If Aggelos ever gets a good chunk of time he'll commit the NETMP
work (the remaining lockup work for the network). Right now the
protocol threads are MPSAFE but the userland sockbuf model is not.
So it is basically down to the VM system now in terms of what will
produce the biggest improvements in performance. My intention is
to do a medium-grain locking implementation for the VM system:
hard lockmgr-style locks for the VM maps, soft token locks for the
VM objects, and probably soft token locks for PMAP manipulation.
-Matt
Matthew Dillon
<dillon at backplane.com>