Dragonfly under KVM

Gary Allan dragonfly at gallan.co.uk
Fri Jun 6 15:34:40 PDT 2008


Gary Allan wrote:
Hello,

I've been experiencing lock-ups running DragonFly HEAD SMP under kvm.
Running "make -j8 buildworld" leaves the guest completely unresponsive,
with 100% CPU usage on all four cores (as seen from the host OS).
I've managed to attach gdb and collect some information.

The kernel is getting stuck in a while loop in lwkt_acquire. I can
reliably trigger this with "make -j8 buildworld" under an SMP
kernel (otherwise identical to GENERIC, no optimisations). The OS is
completely unresponsive and all four CPU cores are running at 100%.
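
To make the failure mode concrete, here is a minimal userspace sketch of
the same wait pattern with an iteration bound added, so that a flag which
never clears is reported instead of hanging. This is not the kernel code:
spin_until_clear and max_polls are hypothetical names, the acquire load
only stands in for cpu_lfence(), and the TDF_* values are assumptions
that should be checked against sys/sys/thread.h in your tree.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed flag values; verify against sys/sys/thread.h. */
#define TDF_RUNNING      0x0001u
#define TDF_PREEMPT_LOCK 0x0004u

/*
 * Poll *flags until none of the bits in mask are set.
 * Returns the number of polls that saw the bits still set,
 * or -1 if max_polls was reached without the bits clearing.
 */
long spin_until_clear(_Atomic uint32_t *flags, uint32_t mask, long max_polls)
{
    for (long polls = 0; polls < max_polls; polls++) {
        uint32_t seen = atomic_load_explicit(flags, memory_order_acquire);
        if ((seen & mask) == 0)
            return polls;
    }
    return -1; /* the owner never cleared the bits: the hang case */
}
```

If the thread that owns the flags never gets to run and clear them, the
unbounded form of this loop spins forever, which would match what gdb
shows at line 1048.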

I've included the debug information.

Program received signal SIGINT, Interrupt.
lwkt_acquire (td=0xc6a59e70) at /usr/src/sys/kern/lwkt_thread.c:1048
1048		while (td->td_flags & (TDF_RUNNING|TDF_PREEMPT_LOCK))
(gdb) l
1043	    mygd = mycpu;
1044	    if (gd != mycpu) {
1045		cpu_lfence();
1046		KKASSERT((td->td_flags & TDF_RUNQ) == 0);
1047		crit_enter_gd(mygd);
1048		while (td->td_flags & (TDF_RUNNING|TDF_PREEMPT_LOCK))
1049		    cpu_lfence();
1050		td->td_gd = mygd;
1051		TAILQ_INSERT_TAIL(&mygd->gd_tdallq, td, td_allq);
1052		td->td_flags &= ~TDF_MIGRATING;
(gdb) p td->td_flags
$1 = 8390177
(gdb) p td
$2 = (thread_t) 0xc6a59e70
(gdb) bt
#0  lwkt_acquire (td=0xc6a59e70) at /usr/src/sys/kern/lwkt_thread.c:1048
#1  0xc02c66af in bsd4_select_curproc (gd=0xff800000) at /usr/src/sys/kern/usched_bsd4.c:358
#2  0xc02c6829 in bsd4_release_curproc (lp=0xea634c00) at /usr/src/sys/kern/usched_bsd4.c:322
#3  0xc04b8239 in passive_release (td=0xdfe8aba0) at /usr/src/sys/platform/pc32/i386/trap.c:212
#4  0xc02c870b in lwkt_switch () at /usr/src/sys/kern/lwkt_thread.c:491
#5  0xc02c8b3b in lwkt_mp_lock_contested () at /usr/src/sys/kern/lwkt_thread.c:1374
#6  0xc04b0751 in get_mplock () at /usr/src/sys/platform/pc32/i386/mplock.s:168
#7  0xe9ef6d34 in ?? ()
#8  0xc04b94a4 in syscall2 (frame=0xe9ef6d40) at /usr/src/sys/platform/pc32/i386/trap.c:1371
#9  0xc04a3396 in Xint0x80_syscall () at /usr/src/sys/platform/pc32/i386/exception.s:876
#10 0xe9ef6d40 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) jump 1050
Continuing at 0xc02c8bbb.
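
The td_flags value printed above is decimal; 8390177 is 0x800621, i.e.
bits 0, 5, 9, 10 and 23 are set. The sketch below decodes such a value
in userspace and checks it against the loop condition from line 1048.
The numeric values used for TDF_RUNNING and TDF_PREEMPT_LOCK are
assumptions on my part and should be verified against sys/sys/thread.h;
would_spin and set_bits are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed flag values; verify against sys/sys/thread.h. */
#define TDF_RUNNING      0x0001u
#define TDF_PREEMPT_LOCK 0x0004u

/* Nonzero if the while condition at lwkt_thread.c:1048 would hold. */
int would_spin(uint32_t td_flags)
{
    return (td_flags & (TDF_RUNNING | TDF_PREEMPT_LOCK)) != 0;
}

/* Store the positions of the set bits in out[]; return how many. */
int set_bits(uint32_t value, int out[32])
{
    int n = 0;

    for (int bit = 0; bit < 32; bit++) {
        if (value & (UINT32_C(1) << bit))
            out[n++] = bit;
    }
    return n;
}
```

Under those assumed values, would_spin(8390177) is nonzero because bit 0
is set, which is consistent with the loop never exiting on its own.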

After jumping past the loop, continuing execution does not appear to
cause any problems. I can provide additional debugging info if required,
but I'm unsure how to proceed with this myself.

Regards

Gary




