MPSAFE progress and testing in master

Mon Jan 4 16:25:42 PST 2010

:On 28.12.2009 08:43, Matthew Dillon wrote:
:>     The latest push adds fine-grained locking to the namecache and path
:>     lookups and adds a sysctl vfs.cache_mpsafe to turn off the MP lock
:Wasn't fine-grained locking one of the issues that led to the
:FreeBSD5/DragonFly split?
:
:I'm just curious about how your effort differs from what the FreeBSD
:team did for FreeBSD5.
:
:Thanks,
:Patrick

    There are a couple of major differences.  The biggest ones:

    * We don't pass held mutexes down the call stack.  Doing so seriously
      pollutes the separation between APIs.

    * Spinlocks in DragonFly are only allowed to be held at the leaves
      of the call chain.  While there are a few exceptions, spinlocks
      are basically only held around very small bits of code.

    * We use our lwkt_token abstraction for subsystem locks.  lwkt_tokens
      are locks which are automatically released when a thread blocks and
      automatically reacquired when a thread unblocks, making them immune
      to things like lock order reversals.  And there's no API pollution
      either.

      This also allows tokens to be held for potentially very long periods
      of time by high level procedures regardless of the complexity of the
      call chain.  We primarily use tokens when traversing long system lists.

      Tokens effectively devolve the code using them back to an
      equivalent non-SMP model, where the code in question only has to
      worry about things getting ripped out from under it if it blocks.
      That is, tokens are kinda like mini-MP locks, but at a finer grain.

    * We don't preemptively migrate threads between cpus when they are
      running in kernelland.  This is basically what FreeBSD calls
      pinning, except that it is automatic whenever the thread is in
      kernel mode.

      This allows free use of per-cpu data without having to implement
      any additional locking to access those data structures.

    * We use dedicated per-cpu threads as well as a per-cpu data separation
      model.  This is most apparent in the network stack and routing table.
      FreeBSD tends to use a more fine-grained locked data model.

      For example, the packets related to a particular TCP connection get
      routed to a particular cpu, so no locking of the INPCBs is needed.

      This is one area where we struck it lucky, since newer network
      interfaces are increasingly using Toeplitz hashes to implement
      separate RR rings.  When using such hashes you basically want to
      localize the data streams to make best use of per-cpu caches.

    Taken together, the per-cpu data model and our use of tokens get rid
    of at least half of what would otherwise have to be fine-grained
    locked in FreeBSD.

    One interesting thing to note is that the namecache locking I just
    did in DFly is actually fine-grained, while FreeBSD is using a coarse
    lock.  This isn't necessarily higher-performing or anything like that;
    filename lookups are one of those things that don't really impact
    overall performance much under real-world loads.  It just turned out
    to be easy to do in DragonFly due to the way our namecache topology
    works.

    Only the VM system and disk drivers are left to make MPSAFE.  The
    filesystems aren't MPSAFE yet, but HAMMER is the only thing we really
    care about there (from a performance perspective) and HAMMER is
    already MPSAFE for cached read and stat operations.  The AHCI driver
    is MPSAFE as well, but the device strategy dispatch code isn't yet,
    so it is still called with the MP lock held.

    If Aggelos ever gets a good chunk of time he'll commit the NETMP
    work (the remaining locking work for the network).  Right now the
    protocol threads are MPSAFE but the userland sockbuf model is not.

    So it is basically down to the VM system now in terms of what will
    produce the biggest improvements in performance.  My intention is
    to do a medium-grain locking implementation for the VM system:
    hard lockmgr-style locks for the VM maps, soft token locks for the
    VM objects, and probably soft token locks for PMAP manipulation.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>