non-zero exclusive count (Re: locking against myself in getcacheblk()?)

Sun Dec 26 20:58:48 PST 2010

On Sun, Dec 26, 2010 at 11:09:17AM -0800, Matthew Dillon wrote:
>     non-zero exclusive counts usually mean an extra lock release or a
>     missing lock acquisition for a lockmgr lock.  It can be a little
>     tricky if it is a vnode lock since a completely unrelated bit of
>     code might be causing the situation and then later on a perfectly
>     fine piece of code triggers it.

As for the non-zero exclusive count panic, it's odd: although
lockmgr() triggered the panic, kgdb shows that
lkp->lk_exclusivecount == 0 (shown below).  Usually the panic is
followed by several more panics, even while the dump routine is running,
but luckily I managed to save the crash dump and put it on my leaf account
as ~y0netan1/crash/{kern,vmcore}.21 (these were saved yesterday, so this
kernel doesn't have the patch you suggested in another message yet).

				:
#14 0xffffffff802a6cc4 in panic (
    fmt=0xffffffff80508548 "lockmgr: non-zero exclusive count")
    at /usr/obj/src.test/sys/kern/kern_shutdown.c:783
#15 0xffffffff80298497 in lockmgr (lkp=0xffffffe060e72878,
    flags=<value optimized out>) at /usr/obj/src.test/sys/kern/kern_lock.c:369
#16 0xffffffff8042e387 in vm_map_find (map=0xffffffe060e72800, object=0x0,
    offset=0, addr=0xffffffe060410ac0, length=32768, align=4096, fitit=1,
    maptype=1 '\001', prot=3 '\003', max=7 '\a', cow=0)
    at /usr/obj/src.test/sys/vm/vm_map.c:1201
#17 0xffffffff8043175a in vm_mmap (map=0xffffffe060e72800,
    addr=0xffffffe060410ac0, size=<value optimized out>, prot=3 '\003',
    maxprot=7 '\a', flags=<value optimized out>, handle=0x0, foff=0)
    at /usr/obj/src.test/sys/vm/vm_mmap.c:1362
#18 0xffffffff80431d61 in kern_mmap (vms=0xffffffe060e72800,
    uaddr=<value optimized out>, ulen=<value optimized out>,
    uprot=<value optimized out>, uflags=<value optimized out>, fd=-1, upos=0,
    res=0xffffffe060410b58) at /usr/obj/src.test/sys/vm/vm_mmap.c:397
				:
(kgdb) fr 15
#15 0xffffffff80298497 in lockmgr (lkp=0xffffffe060e72878,
    flags=<value optimized out>) at /usr/obj/src.test/sys/kern/kern_lock.c:369
369                             panic("lockmgr: non-zero exclusive count");
(kgdb) p *lkp
$1 = {lk_spinlock = {lock = 0}, lk_flags = 3146240, lk_sharecount = 1,
  lk_waitcount = 3, lk_exclusivecount = 0, lk_unused1 = 0,
  lk_wmesg = 0xffffffff80537872 "thrd_sleep", lk_timo = 0,
  lk_lockholder = 0xffffffffffffffff}
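
A small aid for reading the dump above: kgdb prints lk_flags in decimal,
which hides which flag bits are set (in kgdb itself, "p/x lkp->lk_flags"
prints the same value in hex).  The sketch below is a standalone Python
helper, not kernel code; the name set_bits is made up for illustration.
It only lists the set bits.  Mapping each mask to an LK_* name is left
to the lock header of the tree the kernel was built from, since those
definitions are not shown in this message.

```python
def set_bits(value):
    """Return the positions of the bits set in a non-negative integer."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Decimal value of lk_flags as printed by kgdb in the dump above.
lk_flags = 3146240

print(hex(lk_flags))
for bit in set_bits(lk_flags):
    # Print each set bit with its mask, for comparison against the
    # flag definitions in the kernel's lock header.
    print(f"bit {bit:2d}  mask 0x{1 << bit:08x}")
```

The masks printed this way can be compared one by one against the flag
definitions to see what state the lock was recorded in at the time of
the dump.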

The known good kernel is built from the source as of 0b38f, and the known
bad kernel is from 2a418.  Sometimes it was even possible to trigger
the panic just by opening the older kernel and the dump with kgdb, or by
running buildkernel, but it's not consistent among ``bad'' kernels.