panic: assertion: p->p_lock == 0 in kern_wait

YONETANI Tomokazu y0n3t4n1 at gmail.com
Thu Apr 21 07:27:06 PDT 2011


On Thu, Feb 24, 2011 at 09:29:16AM +0900, YONETANI Tomokazu wrote:
> On Mon, Feb 21, 2011 at 01:41:26PM +0900, YONETANI Tomokazu wrote:
> > Apparently a few hours of pbulk test can trigger this panic (this is
> > on x86_64 and the kernel is built from source as of 5347900e6).
> > As opposed to what KKASSERT claims, p->p_lock doesn't hold the non-zero
> > value in the frame 5:
> > 
> 			:
> >   (kgdb) fr 5
> >   #5  0xffffffff802989a1 in kern_wait (pid=<value optimized out>,
> >       status=0xffffffe05e997a74, options=1528637672, rusage=0x0,
> >       res=0xffffffe05e997b58) at /usr/src/sys/kern/kern_exit.c:901
> >   901                             KKASSERT(p->p_lock == 0);
> >   (kgdb) p p->p_lock
> >   $1 = 0
> 
> After poking here and the, I think this KKASSERT() can simply go away
> as proc_remove_zombie() will wait for p->p_lock to drop to zero anyway.
> 
> 			/*
> 			 * Unlink the proc from its process group so that
> 			 * the following operations won't lead to an
> 			 * inconsistent state for processes running down
> 			 * the zombie list.
> 			 */
> 			KKASSERT(p->p_lock == 0);
> 			proc_remove_zombie(p);
> 			leavepgrp(p);

The following is what I have in my tree.  What it does is to hold proc_token
while waiting for p->p_lock to drop, just as done in proc_remove_zombie().
If I don't hold the proc_token as in the first chunk, I see the

  proc_remove_zombie: waiting for %p->p_lock to drop

message a few times every hour on the console.  I guess it may also
imply that the race is between a code which holds proc_token for
a long time but not p->p_token, like all*_scan().


diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
index 1e5a110..30fc0af 100644
--- a/sys/kern/kern_exit.c
+++ b/sys/kern/kern_exit.c
@@ -867,8 +867,10 @@ loop:
 			 * put a hold on the process for short periods of
 			 * time.
 			 */
+			lwkt_gettoken(&proc_token);
 			while (p->p_lock)
 				tsleep(p, 0, "reap3", hz);
+			lwkt_reltoken(&proc_token);
 
 			/* Take care of our return values. */
 			*res = p->p_pid;
@@ -898,7 +900,6 @@ loop:
 			 * inconsistent state for processes running down
 			 * the zombie list.
 			 */
-			KKASSERT(p->p_lock == 0);
 			proc_remove_zombie(p);
 			leavepgrp(p);
 
diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
index cb067f0..048a619 100644
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -661,6 +661,7 @@ proc_remove_zombie(struct proc *p)
 {
 	lwkt_gettoken(&proc_token);
 	while (p->p_lock) {
+		kprintf("%s: waiting for %p->p_lock to drop\n", __func__, p);
 		tsleep(p, 0, "reap1", hz / 10);
 	}
 	LIST_REMOVE(p, p_list); /* off zombproc */
-- 
1.7.3.2






More information about the Bugs mailing list