Dragonfly under KVM

Matthew Dillon dillon at apollo.backplane.com
Sat Jun 14 09:03:15 PDT 2008


:..
:> :#0  lwkt_acquire (td=0xc6a59e70) at /usr/src/sys/kern/lwkt_thread.c:1048
:> :#1  0xc02c66af in bsd4_select_curproc (gd=0xff800000) at 
:> :/usr/src/sys/kern/usched_bsd4.c:358
:> :#2  0xc02c6829 in bsd4_release_curproc (lp=0xea634c00) at 
:> :/usr/src/sys/kern/usched_bsd4.c:322
:> :#3  0xc04b8239 in passive_release (td=0xdfe8aba0) at 
:> :..
:Execution appears to be looping indefinitely inside LWKT code.
:
:Debugging gives the output below. Again all four cores are running at 100%.
:
:
:Program received signal SIGINT, Interrupt.
:lwkt_process_ipiq_core (sgd=<value optimized out>, ip=0xc67a7000, frame=0x0)
:     at /usr/src/sys/kern/lwkt_ipiq.c:522
:522	    while (wi - (ri = ip->ip_rindex) > 0) {
:(gdb) l
:...
:     at /usr/src/sys/sys/thread2.h:244
:#4  0xc02c71e6 in bsd4_setrunqueue (lp=0xe5f8b400)
:     at /usr/src/sys/kern/usched_bsd4.c:551
:#5  0xc02c72be in bsd4_acquire_curproc (lp=0xe5f8b400)
:     at /usr/src/sys/kern/usched_bsd4.c:271
:#6  0xc04b9603 in syscall2 (frame=0xe5a02d40)
:     at /usr/src/sys/platform/pc32/i386/trap.c:349
:#7  0xc04a3396 in Xint0x80_syscall ()
:...
:(gdb) bt
:#0  lwkt_process_ipiq_core (sgd=<value optimized out>, ip=0xc67b1000,
:     frame=0x0) at /usr/src/sys/kern/lwkt_ipiq.c:558
:#1  0xc02c94ad in lwkt_process_ipiq () at /usr/src/sys/kern/lwkt_ipiq.c:452
:#2  0xc02c9830 in lwkt_send_ipiq3 (target=0xff808000,
:     func=0xc02c8519 <lwkt_schedule>, arg1=0xc0600170, arg2=0)
:     at /usr/src/sys/kern/lwkt_ipiq.c:185
:#3  0xc02c863c in lwkt_schedule (td=0xc0600170)
:     at /usr/src/sys/sys/thread2.h:244
:#4  0xc02c71e6 in bsd4_setrunqueue (lp=0xe5f8b400)
:     at /usr/src/sys/kern/usched_bsd4.c:551
:#5  0xc02c72be in bsd4_acquire_curproc (lp=0xe5f8b400)
:     at /usr/src/sys/kern/usched_bsd4.c:271
:#6  0xc04b9603 in syscall2 (frame=0xe5a02d40)
:     at /usr/src/sys/platform/pc32/i386/trap.c:349
:#7  0xc04a3396 in Xint0x80_syscall ()
:     at /usr/src/sys/platform/pc32/i386/exception.s:876
:#8  0xe5a02d40 in ?? ()

    I think I see what may be happening here, and I am starting to wonder
    if it is also the cause of the system lockups I am getting when testing
    HAMMER under extreme loads (with hundreds of user threads which are
    sometimes cpu-bound).

    I think it may be deadlocking between lwkt_acquire() and lwkt_schedule().
    The thread trying to migrate between cpus is getting stuck and the
    acquisition loop is not processing incoming IPIs while it is waiting for
    the thread to deschedule on the other cpu.

    Please try this patch.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>
Index: lwkt_thread.c
===================================================================
RCS file: /cvs/src/sys/kern/lwkt_thread.c,v
retrieving revision 1.115
diff -u -p -r1.115 lwkt_thread.c
--- lwkt_thread.c	2 Jun 2008 16:54:21 -0000	1.115
+++ lwkt_thread.c	14 Jun 2008 15:56:28 -0000
@@ -1045,8 +1045,12 @@     if (gd != mycpu) {
 	cpu_lfence();
 	KKASSERT((td->td_flags & TDF_RUNQ) == 0);
 	crit_enter_gd(mygd);
-	while (td->td_flags & (TDF_RUNNING|TDF_PREEMPT_LOCK))
+	while (td->td_flags & (TDF_RUNNING|TDF_PREEMPT_LOCK)) {
+#ifdef SMP
+	    lwkt_process_ipiq();
+#endif
 	    cpu_lfence();
+	}
 	td->td_gd = mygd;
 	TAILQ_INSERT_TAIL(&mygd->gd_tdallq, td, td_allq);
 	td->td_flags &= ~TDF_MIGRATING;
@@ -1222,8 +1226,12 @@ {
     thread_t td = arg;
     globaldata_t gd = mycpu;
 
-    while (td->td_flags & (TDF_RUNNING|TDF_PREEMPT_LOCK))
+    while (td->td_flags & (TDF_RUNNING|TDF_PREEMPT_LOCK)) {
+#ifdef SMP
+	lwkt_process_ipiq();
+#endif
 	cpu_lfence();
+    }
     td->td_gd = gd;
     cpu_sfence();
     td->td_flags &= ~TDF_MIGRATING;
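
The shape of the fix can also be seen outside the kernel. What follows is
a minimal single-threaded C sketch, not DragonFly code: it steps two
simulated cpus in lock-step. It assumes one plausible reading of the
backtrace above, namely that the remote cpu cannot deschedule the thread
until it has queued a message into the waiting cpu's already-full incoming
FIFO. Every name in it (simulate_acquire, struct sim, FIFO_SLOTS and the
step functions) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the wait loop; none of this is kernel code. */
#define FIFO_SLOTS 4	/* assumed capacity of cpu0's incoming message fifo */

struct sim {
	int  fifo_used;		/* messages queued to cpu0, not yet processed */
	bool cpu1_sent;		/* cpu1 has managed to queue its message */
	bool td_running;	/* stands in for TDF_RUNNING on the thread */
};

/* cpu0: one pass through the wait loop, optionally draining its fifo. */
static void
cpu0_wait_iteration(struct sim *s, bool drain)
{
	if (drain)
		s->fifo_used = 0;	/* plays the role of lwkt_process_ipiq() */
}

/* cpu1: must queue one message to cpu0 before it can deschedule the thread. */
static void
cpu1_step(struct sim *s)
{
	if (!s->cpu1_sent) {
		if (s->fifo_used < FIFO_SLOTS) {
			s->fifo_used++;
			s->cpu1_sent = true;
		}
		return;			/* fifo full: retry on the next step */
	}
	s->td_running = false;		/* message queued, thread can stop */
}

/*
 * Returns the number of iterations cpu0 spent waiting, or -1 if the
 * thread was still marked running after max_steps (the simulated hang).
 */
int
simulate_acquire(bool drain_while_waiting, int max_steps)
{
	struct sim s = { FIFO_SLOTS, false, true };	/* fifo starts full */
	int step;

	assert(max_steps > 0);
	for (step = 0; step < max_steps && s.td_running; step++) {
		cpu0_wait_iteration(&s, drain_while_waiting);
		cpu1_step(&s);
	}
	return s.td_running ? -1 : step;
}
```

Calling simulate_acquire(false, 1000) models the original loop and gives up
with -1, because cpu1 never finds room in the FIFO. Calling
simulate_acquire(true, 1000) models the patched loop and completes, since
each pass through the wait loop frees the slot cpu1 is waiting for.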




