panic: assertion: dd->uschedcp != lp in bsd4_resetpriority

YONETANI Tomokazu qhwt+dfly at les.ath.cx
Wed Jun 18 02:27:31 PDT 2008


On Thu, May 29, 2008 at 10:27:08AM -0700, Matthew Dillon wrote:
. ..
>     The code you found the bug appears to be bsd4_resetpriority() when
>     called on a 'random' lwp instead of the 'current' lwp on the calling
>     cpu.   That is, when the softclock does the allproc_scan().  That
>     scan only occurs on one cpu, with the MP lock held (I think), but can
>     race against manipulation of the per-cpu scheduler data (dd->*) on other
>     cpus.  Such manipulate can occur on other cpus without the MP lock held
>     and without the spinlock held.
> 
>     Because the LWP in the failing path (the one doing the allproc_scan) is
>     potentially owned by some other cpu, that KKASSERT() can race another
>     cpu and cause a panic.  The bug is actually the presence of the KKASSERT
>     and not the manipulation of the spin lock.
> 
>     Moving the spinlock thus might not completely solve the problem.
> 
>     I think all that needs to be done here is to remove the KKASSERT().  The
>     worst that happens is that a cpu gets an extra need_resched IPI (which
>     does no harm), or dd->upri winds up with the wrong value, which should
>     also do no real harm.  I believe the KKASSERT itself is wrong.  Instead,
>     augment the code comment above that section to indicate that the
>     dd_uschedcp variable can be ripped out from under the code and
>     cause a spurious need_user_resched() on the target cpu and caused
>     dd->upri to be wrong for a short period of time.

. .. assuming that I'm expected to augment the comment, does this look OK?

* Remove KKASSERT() from the code block where not all callers' CPU own
  the LWP, occasionally leading to a panic because of race between CPUs.

--- usched_bsd4.c.orig	2008-05-15 16:15:08.000000000 +0900
+++ usched_bsd4.c	2008-06-18 17:54:07.000000000 +0900
@@ -786,10 +786,16 @@
 	 * occurs if the LWP is already on a scheduler queue, which means
 	 * that idle cpu notification has already occured.  At most we
 	 * need only issue a need_user_resched() on the appropriate cpu.
+	 *
+	 * The LWP may be owned by a CPU different from the current one,
+	 * in which case dd->uschedcp may be modified without an MP lock
+	 * or a spinlock held.  The worst that happens is that the code
+	 * below causes a spurious need_user_resched() on the target CPU
+	 * and dd->pri to be wrong for a short period of time, both of
+	 * which are harmless.
 	 */
 	if (reschedcpu >= 0) {
 		dd = &bsd4_pcpu[reschedcpu];
-		KKASSERT(dd->uschedcp != lp);
 		if ((dd->upri & ~PRIMASK) > (lp->lwp_priority & ~PRIMASK)) {
 			dd->upri = lp->lwp_priority;
 #ifdef SMP

Thanks.





More information about the Bugs mailing list