git: kernel - Major MPSAFE Infrastructure
Matthew Dillon
dillon at crater.dragonflybsd.org
Fri Aug 27 17:24:03 PDT 2010
commit 77912481ac5f5d886b07c9f7038b03eba09b2bca
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Thu Aug 26 21:18:06 2010 -0700
kernel - Major MPSAFE Infrastructure
* vm_page_lookup() now requires the vm_token to be held on call instead of
the MP lock. Fix the few places where the routine was being called
without the vm_token.
Various situations where a vm_page_lookup() is performed followed by
vm_page_wire(), without busying the page, and other similar situations,
require the vm_token to be held across the whole block of code.
* bio_done callbacks are now MPSAFE but some drivers (ata, ccd, vinum,
aio, nfs) are not MPSAFE yet so get the mplock for those. They will
be converted to a generic driver-wide token later.
* Remove critical sections that used to protect VM system related
interrupts and replace them with the vm_token.
* Spinlocks now bump thread->td_critcount in addition to
mycpu->gd_spinlock*. Note the ordering is important. Then remove
gd_spinlock* checks elsewhere that are covered by td_critcount and
replace with assertions.
Also use td_critcount in the kern_mutex.c code instead of gd_spinlock*.
This fixes situations where the last crit_exit() would call splx()
without checking for spinlocks. Adding the additional checks would
have made the crit_*() inlines too complex so instead we just fold
it into td_critcount.
* lwkt_yield() no longer guarantees that lwkt_switch() will be called
so call lwkt_switch() instead in places where a switch is required,
for example to unwind a preemption. Otherwise the kernel could end
up live-locking trying to yield because the new switch code does not
necessarily schedule a different kernel thread.
* Add the sysctl user_pri_sched (default 0). Setting this will make
the LWKT scheduler more aggressively schedule user threads when
runnable kernel threads are unable to gain token/mplock resources.
For debugging only.
* Change the bufspin spinlock to bufqspin and bufcspin, and generally
rework vfs_bio.c to lock numerous fields with bufcspin. Also use
bufcspin to interlock waitrunningbufspace() and friends.
Remove several mplocks in vfs_bio.c that are no longer needed.
Protect the page manipulation code in vfs_bio.c with vm_token instead
of the mplock.
* Fix a deadlock with the FINDBLK_TEST/BUF_LOCK sequence which can occur
due to the fact that the buffer may change its (vp,loffset) during
the BUF_LOCK call. Even though the code checks for this after
the lock succeeds there is still the problem of the locking operation
itself potentially creating a deadlock between two threads by locking
an unexpected buffer when the caller is already holding other buffers
locked.
The fix adds an interlock refcounter, b_refs, to the buffer. getnewbuf()
will avoid reusing buffers that hold such a reference.
* The syncer_token was not protecting all accesses to the syncer list.
Fix that.
* Make HAMMER MPSAFE. All major entry points now use a per-mount token,
hmp->fs_token. Backend callbacks (bioops, bio_done) use hmp->io_token.
The cache case for the read and getattr paths requires no tokens at
all (as before).
The bitfield flags had to be separated into two groups to deal with
SMP cache coherency races.
Certain flags in the hammer_record structure had to be separated for
the same reason.
Certain interactions between the frontend and the backend must use
the hmp->io_token.
It is important to note that for any given buffer there are two
locking entities: (1) The hammer structure and (2) The buffer cache
buffer. These interactions are very fragile.
Do not allow the kernel to flush a dirty buffer if we are unable
to obtain a norefs-interlock on the buffer, which fixes numerous
frontend/backend MP races on the io structure.
Add a write interlock in one of the recover_flush_buffer cases.
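The b_refs interlock described above can be illustrated with a small
userland sketch. This is not the vfs_bio.c code: struct sketch_buf and the
sketch_*() functions are hypothetical stand-ins, the "lock" is a plain flag
rather than BUF_LOCK, and the point at which the reference is dropped is an
assumption. It only shows the idea that a thread takes a reference before
attempting the lock, and that the reuse path skips any buffer with a
non-zero reference count so the buffer cannot change its (vp,loffset)
underneath the locker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct buf (not the kernel layout). */
struct sketch_buf {
    int  vp_id;    /* stands in for the vnode pointer */
    long loffset;  /* logical offset within the vnode */
    int  b_refs;   /* interlock refcount taken before attempting the lock */
    int  locked;   /* stands in for the BUF_LOCK state */
};

/* Look a buffer up by identity and take an interlock reference on it. */
static struct sketch_buf *
sketch_findblk_ref(struct sketch_buf *pool, size_t n, int vp_id, long loffset)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].vp_id == vp_id && pool[i].loffset == loffset) {
            pool[i].b_refs++;
            return &pool[i];
        }
    }
    return NULL;
}

/*
 * Lock the buffer, drop the interlock reference (assumed ordering), and
 * re-check the identity. Returns 1 on success, 0 if the identity changed,
 * in which case the buffer is unlocked again.
 */
static int
sketch_lock_and_check(struct sketch_buf *bp, int vp_id, long loffset)
{
    bp->locked = 1;   /* in the kernel this step may block */
    bp->b_refs--;
    if (bp->vp_id != vp_id || bp->loffset != loffset) {
        bp->locked = 0;
        return 0;
    }
    return 1;
}

/* Reuse path: never recycle a buffer that is locked or referenced. */
static struct sketch_buf *
sketch_getnewbuf(struct sketch_buf *pool, size_t n, int vp_id, long loffset)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].locked == 0 && pool[i].b_refs == 0) {
            pool[i].vp_id = vp_id;      /* buffer takes on a new identity */
            pool[i].loffset = loffset;
            return &pool[i];
        }
    }
    return NULL;
}
```

In this sketch a buffer returned by sketch_findblk_ref() keeps its identity
until sketch_lock_and_check() has run, because sketch_getnewbuf() passes
over it while b_refs is non-zero.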
Summary of changes:
sys/dev/agp/agp.c | 4 +
sys/dev/agp/agp_i810.c | 5 +-
sys/dev/disk/ata/ata-raid.c | 4 +
sys/dev/disk/ccd/ccd.c | 8 +-
sys/dev/raid/vinum/vinumhdr.h | 1 +
sys/dev/raid/vinum/vinuminterrupt.c | 6 +
sys/kern/kern_exec.c | 9 +-
sys/kern/kern_mutex.c | 41 ++-
sys/kern/kern_slaballoc.c | 12 +-
sys/kern/kern_spinlock.c | 2 +
sys/kern/lwkt_thread.c | 23 +-
sys/kern/uipc_syscalls.c | 4 +
sys/kern/usched_bsd4.c | 28 +-
sys/kern/vfs_aio.c | 2 +
sys/kern/vfs_bio.c | 498 +++++++++++++++++++++--------------
sys/kern/vfs_cluster.c | 6 +-
sys/kern/vfs_subr.c | 16 +-
sys/kern/vfs_sync.c | 62 +++--
sys/platform/pc32/isa/clock.c | 10 +-
sys/platform/pc64/isa/clock.c | 10 +-
sys/sys/bio.h | 2 +-
sys/sys/buf.h | 6 +-
sys/sys/spinlock2.h | 47 +---
sys/sys/vnode.h | 3 +-
sys/vfs/devfs/devfs_vnops.c | 11 +-
sys/vfs/hammer/hammer.h | 33 ++-
sys/vfs/hammer/hammer_flusher.c | 3 +
sys/vfs/hammer/hammer_io.c | 177 ++++++++++---
sys/vfs/hammer/hammer_object.c | 6 +-
sys/vfs/hammer/hammer_ondisk.c | 9 +-
sys/vfs/hammer/hammer_recover.c | 9 +
sys/vfs/hammer/hammer_volume.c | 6 +
sys/vfs/nfs/nfs_bio.c | 10 +
sys/vm/swap_pager.c | 4 +-
sys/vm/vm_page.c | 8 +-
35 files changed, 694 insertions(+), 391 deletions(-)
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/77912481ac5f5d886b07c9f7038b03eba09b2bca
--
DragonFly BSD source repository