panic: assertion: p->p_lock == 0 in kern_wait

Thu May 19 04:05:21 PDT 2011

On Mon, Apr 25, 2011 at 09:13:04PM +0900, YONETANI Tomokazu wrote:
> On Sun, Apr 24, 2011 at 11:36:27AM +0900, YONETANI Tomokazu wrote:
> > >      With regards to getting rid of the timeout in the tsleep and using a
> > >      proactive wakeup(), we have to avoid calling wakeup() for 1->0
> > >      transitions unless someone is known to be waiting on p_lock.  This
> > >      can be implemented by adding a WAITING flag to the field and using
> > >      atomic_cmpset_int() to handle the (WAITING | 1) -> (0) transition and
> > >      then calling wakeup() if WAITING was set.
> > > 
> > >      I will augment the sys/refcount.h API and add refcount_wait() and
> > >      refcount_release_wakeup() which encapsulate the appropriate atomic
> > >      ops.  I will leave it up to you if you want to then use the new API
> > >      functions for PHOLD/PRELE, which would give the tsleep case a
> > >      proactive wakeup() instead of having to wait for it to timeout.
> > 
> > So what I need to do is to change PHOLD/PRELE to use refcount_acquire/
> > refcount_release_wakeup and replace p->p_lock loop with
> > refcount_release_wakeup?  I'll give it a try.
>  
> I've been running the kernel with patch(es) attached to this message
> and so far it's running fine under load.  It reduced the number of
> non-zero p->p_lock just before calling proc_remove_zombie() even without
> holding proc_token around the first wait loop.

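For reference, the WAITING-flag scheme quoted above can be sketched in
userland C11 roughly as follows.  This is only an illustration: it uses
<stdatomic.h> in place of the kernel's atomic_cmpset_int(), and the names
REF_WAITING, REF_MASK, ref_mark_waiting() and ref_release_needs_wakeup()
are made up here, not the actual sys/refcount.h API.  It also assumes
release is never called on a count of zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; the real refcount field may differ. */
#define REF_WAITING	0x80000000u	/* someone sleeps on the count */
#define REF_MASK	0x7fffffffu	/* the count itself */

/*
 * Waiter side: set REF_WAITING if the count is still non-zero.
 * Returns true if the caller should go to sleep, false if the
 * count already reached zero and no sleep is needed.
 */
static bool
ref_mark_waiting(atomic_uint *countp)
{
	unsigned int old = atomic_load(countp);

	for (;;) {
		if ((old & REF_MASK) == 0)
			return false;
		/* On failure 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(countp, &old,
		    old | REF_WAITING))
			return true;
	}
}

/*
 * Releaser side: drop one reference.  Returns true only for the
 * (WAITING | 1) -> 0 transition, i.e. when the caller must issue
 * the wakeup.  A plain 1 -> 0 transition returns false.
 */
static bool
ref_release_needs_wakeup(atomic_uint *countp)
{
	unsigned int old = atomic_load(countp);

	for (;;) {
		bool last = ((old & REF_MASK) == 1);
		/* The last release clears the flag; others keep it. */
		unsigned int new = last ? 0 : old - 1;

		if (atomic_compare_exchange_weak(countp, &old, new))
			return last && (old & REF_WAITING) != 0;
	}
}
```

In the kernel the true return would be followed by wakeup() on the count's
address, and the waiter would tsleep() on the same address after
ref_mark_waiting() returns true.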
I added a small piece of code to PHOLD/PRELE to record the last p->p_lock
holder in p->p_pad0 (far from perfect, but better than nothing) and found
that it's always sysctl_kern_proc() that calls PHOLD() at a bad time.
My guess is that this happens because it walks through zombproc and
PHOLD()s the processes there, some of which are just about to be reaped.
So I added the following code to skip such processes; the relevant part of
kern_wait() waits for processes whose p->p_nthreads > 0, so I thought it
should be fine, no?
I think I need to wait a few more days before pbulk can spot other
possible bad callers of PHOLD().

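The kind of last-holder instrumentation described above might look
something like the userland mock below.  It is not the code that was
actually run: struct mock_proc, p_last_holder (standing in for p_pad0),
MOCK_PHOLD/MOCK_PRELE and mock_sysctl_scan() are all made up for
illustration, and the counter is not atomic here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock of the two fields involved; the real struct proc has many more. */
struct mock_proc {
	int		 p_lock;	/* hold count */
	const char	*p_last_holder;	/* hypothetical stand-in for p_pad0 */
};

/* Take a hold and remember which function took it most recently. */
#define MOCK_PHOLD(p) do {			\
	(p)->p_lock++;				\
	(p)->p_last_holder = __func__;		\
} while (0)

/* Drop the hold; the last-holder record is left in place on purpose. */
#define MOCK_PRELE(p) do {			\
	(p)->p_lock--;				\
} while (0)

/* Stand-in for a caller such as sysctl_kern_proc(). */
static void
mock_sysctl_scan(struct mock_proc *p)
{
	MOCK_PHOLD(p);
	/* ... copy out process information here ... */
	MOCK_PRELE(p);
}
```

When the assertion on p_lock fires, the recorded name shows the most
recent holder, which is how a single offending caller can be identified.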

Best Regards,
YONETANI Tomokazu.

diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index 6d760e2..942ce6b 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -945,6 +945,11 @@ sysctl_kern_proc(SYSCTL_HANDLER_ARGS)
 
 			if (!PRISON_CHECK(cr1, p->p_ucred))
 				continue;
+
+			/* don't touch processes about to be reaped */
+			if (p->p_nthreads == 0)
+				continue;
+
 			PHOLD(p);
 			error = sysctl_out_proc(p, req, flags);
 			PRELE(p);