vkernel stuck in umtxsl state when booting from a NFS root mount
Rumko
rumcic at gmail.com
Mon Oct 19 23:41:07 PDT 2009

Rumko wrote:
> Matthew Dillon wrote:
> 
>> 
>> :When booting a vkernel with '-n 1' I'm not sure when was the last time that
>> :it got stuck, while booting with '-n 2' fails in most if not all cases.
>> 
>>     Ok, all the NFS issues with vkernels should now be fixed.  There were
>>     several problems but the ones causing the permanent stalls were
>> 
>>     (1) NFS was running the kernel out of mbufs, and
>> 
>>     (2) The NFS root mount was NFSv2 instead of NFSv3, and there appear
>>     to be bugs with large write()s on NFSv2 mounts.
>> 
>>     I also changed the default mount type from UDP to TCP which will improve
>>     performance.
>> 
>>     I have not figured out what the problem with NFSv2 write()'s is yet.
>>     I'm still working on that one.  But the vkernel should now mount w/
>>     NFSv3 and thus work around that issue.
>> 
>> -Matt
> 
> Can't seem to be able to reproduce any more hangs.
> 
> Thank you
Spoke too soon, it seems: I was able to make the vkernel hang in the umtxsl
state again.
The problem seems to be the combination of -n X (X > 1) and no -l (that is,
without locking the threads to specific CPUs).
On one occasion it got stuck without being in the umtxsl state (it was using
the CPU but doing nothing useful). The backtraces from that hang looked like
this (part of the gdb session):
Program received signal SIGINFO, Information request.
lwkt_process_ipiq_core (sgd=<value optimized out>, ip=0x5085d000, frame=0x0)
at /usr/src/sys/kern/lwkt_ipiq.c:582
582         return(wi != ip->ip_windex);
(gdb) bt
#0  lwkt_process_ipiq_core (sgd=<value optimized out>, ip=0x5085d000,
frame=0x0) at /usr/src/sys/kern/lwkt_ipiq.c:582
#1  0x080bc31f in lwkt_process_ipiq () at /usr/src/sys/kern/lwkt_ipiq.c:476
#2  0x080bc4a1 in lwkt_send_ipiq3 (target=0x41400000, func=0x80b0a10
<systimer_intr>, arg1=0x8318acc, arg2=0) at /usr/src/sys/kern/lwkt_ipiq.c:189
#3  0x081de3fc in vktimer_intr (dummy=0x0, frame=0x0)
at /usr/src/sys/platform/vkernel/platform/systimer.c:218
#4  0x081d8871 in kqueue_intr (arg=0x0, frame=0x0)
at /usr/src/sys/platform/vkernel/platform/kqueue.c:201
#5  0x080913eb in ithread_handler (arg=0x1)
at /usr/src/sys/kern/kern_intr.c:807
#6  0x080bab2f in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:271
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) c
Continuing.
Program received signal SIGINFO, Information request.
0x081d87f1 in globaldata_find (cpu=1)
at /usr/src/sys/platform/vkernel/platform/globaldata.c:61
61              KKASSERT(cpu >= 0 && cpu < ncpus);
(gdb) bt
#0  0x081d87f1 in globaldata_find (cpu=1)
at /usr/src/sys/platform/vkernel/platform/globaldata.c:61
#1  0x080bc2f2 in lwkt_process_ipiq () at /usr/src/sys/kern/lwkt_ipiq.c:473
#2  0x080bc4a1 in lwkt_send_ipiq3 (target=0x41400000, func=0x80b0a10
<systimer_intr>, arg1=0x8318acc, arg2=0) at /usr/src/sys/kern/lwkt_ipiq.c:189
#3  0x081de3fc in vktimer_intr (dummy=0x0, frame=0x0)
at /usr/src/sys/platform/vkernel/platform/systimer.c:218
#4  0x081d8871 in kqueue_intr (arg=0x0, frame=0x0)
at /usr/src/sys/platform/vkernel/platform/kqueue.c:201
#5  0x080913eb in ithread_handler (arg=0x1)
at /usr/src/sys/kern/kern_intr.c:807
#6  0x080bab2f in lwkt_deschedule_self (td=Cannot access memory at address 0x8
) at /usr/src/sys/kern/lwkt_thread.c:271
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
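
As an aid for reading the backtraces: frames #0 to #2 look consistent with a
sender that keeps polling its own IPI queue while it waits for room in the
target CPU's queue. The sketch below is only a simplified, single-threaded
user-space model of that pattern, not the DragonFly code. The names
(struct ipiq, ipiq_send, ipiq_process, FIFO_SIZE, max_spins) are made up for
illustration, and the bounded spin is an assumption added so the example
terminates; a sender without such a bound would keep burning CPU for as long
as the target never drains its queue.

```c
#include <assert.h>
#include <stdbool.h>

#define FIFO_SIZE 16                /* hypothetical size, power of two */
#define FIFO_MASK (FIFO_SIZE - 1)

typedef void (*ipi_func_t)(void *arg);

/* Simplified per-CPU message ring; both indices only ever increase. */
struct ipiq {
    unsigned rindex;                /* next slot the owning CPU reads  */
    unsigned windex;                /* next slot a sender writes       */
    ipi_func_t func[FIFO_SIZE];
    void *arg[FIFO_SIZE];
};

/* Run every pending message on this queue; return how many were run. */
static int ipiq_process(struct ipiq *ip)
{
    int n = 0;

    while (ip->rindex != ip->windex) {
        unsigned i = ip->rindex & FIFO_MASK;

        ip->func[i](ip->arg[i]);
        ip->rindex++;
        n++;
    }
    return n;
}

/*
 * Queue a message for the target CPU.  While the target's queue is more
 * than half full, poll our own queue instead of blocking.  Gives up and
 * returns false after max_spins polls (illustrative bound only).
 */
static bool ipiq_send(struct ipiq *self, struct ipiq *target,
                      ipi_func_t func, void *arg, unsigned max_spins)
{
    unsigned spins = 0;
    unsigned i;

    while (target->windex - target->rindex > FIFO_SIZE / 2) {
        if (spins++ == max_spins)
            return false;           /* target never drained its queue */
        ipiq_process(self);
    }
    i = target->windex & FIFO_MASK;
    target->func[i] = func;
    target->arg[i] = arg;
    target->windex++;
    return true;
}

/* Example handler: counts how many messages were delivered. */
static void count_hit(void *arg)
{
    ++*(int *)arg;
}
```

In this model, if the target never calls ipiq_process() the sender can only
spin, which would match the "using the CPU, but doing nothing useful"
symptom. Whether that is what actually happens in the vkernel is not
established by the backtraces alone.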
-- 
Regards,
Rumko