panic: assertion: dd->uschedcp != lp in bsd4_resetpriority
YONETANI Tomokazu
qhwt+dfly at les.ath.cx
Wed Jun 18 02:27:31 PDT 2008
On Thu, May 29, 2008 at 10:27:08AM -0700, Matthew Dillon wrote:
. ..
> The code you found the bug appears to be bsd4_resetpriority() when
> called on a 'random' lwp instead of the 'current' lwp on the calling
> cpu. That is, when the softclock does the allproc_scan(). That
> scan only occurs on one cpu, with the MP lock held (I think), but can
> race against manipulation of the per-cpu scheduler data (dd->*) on other
> cpus. Such manipulate can occur on other cpus without the MP lock held
> and without the spinlock held.
>
> Because the LWP in the failing path (the one doing the allproc_scan) is
> potentially owned by some other cpu, that KKASSERT() can race another
> cpu and cause a panic. The bug is actually the presence of the KKASSERT
> and not the manipulation of the spin lock.
>
> Moving the spinlock thus might not completely solve the problem.
>
> I think all that needs to be done here is to remove the KKASSERT(). The
> worst that happens is that a cpu gets an extra need_resched IPI (which
> does no harm), or dd->upri winds up with the wrong value, which should
> also do no real harm. I believe the KKASSERT itself is wrong. Instead,
> augment the code comment above that section to indicate that the
> dd_uschedcp variable can be ripped out from under the code and
> cause a spurious need_user_resched() on the target cpu and caused
> dd->upri to be wrong for a short period of time.
. .. assuming that I'm expected to augment the comment, does this look OK?
* Remove KKASSERT() from the code block where not all callers' CPU own
the LWP, occasionally leading to a panic because of race between CPUs.
--- usched_bsd4.c.orig 2008-05-15 16:15:08.000000000 +0900
+++ usched_bsd4.c 2008-06-18 17:54:07.000000000 +0900
@@ -786,10 +786,16 @@
* occurs if the LWP is already on a scheduler queue, which means
* that idle cpu notification has already occured. At most we
* need only issue a need_user_resched() on the appropriate cpu.
+ *
+ * The LWP may be owned by a CPU different from the current one,
+ * in which case dd->uschedcp may be modified without an MP lock
+ * or a spinlock held. The worst that happens is that the code
+ * below causes a spurious need_user_resched() on the target CPU
+ * and dd->pri to be wrong for a short period of time, both of
+ * which are harmless.
*/
if (reschedcpu >= 0) {
dd = &bsd4_pcpu[reschedcpu];
- KKASSERT(dd->uschedcp != lp);
if ((dd->upri & ~PRIMASK) > (lp->lwp_priority & ~PRIMASK)) {
dd->upri = lp->lwp_priority;
#ifdef SMP
Thanks.
More information about the Bugs
mailing list