nfs + msdosfs = crashes & panics
dillon at apollo.backplane.com
Tue Apr 13 15:25:06 PDT 2004
:OK, it looks like this part was a red herring. Reading the same set of
:files from a UFS-backed share caused the same lockup:
:The client has an ISA NIC, so (even though the server is far from a
:high-performance machine) I followed the suggestion in the FreeBSD
:handbook and added "-r=1024" to the client's fstab. Now, I can't seem
:to reproduce the lockup with either msdosfs or UFS backing, which is
:good, sorry for the distraction.
:As for writing, though, the bug looks genuine.
Ok, we'll set that aside. It could be a DMA issue or an MTU/large-packet
issue. Many so-called NFS lockups are due to bad cables that can't
handle large packets and NICs which cannot handle large packets or
many small side-by-side packets (or side by side packets just in general),
and so forth. It sounds like you may have a cable or NIC issue due
to getting a whole bunch of packet fragments all once. A standard NFS
packet has an 8K data payload which results in 5 or 6 physical packets
on the wire. There are many older (ISA primarily) NICs that just can't
:Writing files to the UFS-backed share does not cause the panic. But
:writing to an msdosfs-backed share does causes a "type 12 trap" which
:causes DDB to come up, and here's what I get on the console:
:kernel: type 12 trap, code=0
:Stopped at msdosfs_write+0x31: movl 0x34(%edx),%ebx
:msdosfs_write(c9274934) at msdosfs_write+0x31
:nfsrv_write(c0b48b08,c0b51a00,c481f800,c9274abc,0) at nfsrv_write+0x8e8
:nfssvc_nsfd(c9274b20,807d720,c481f800,0,c44bfa40) at nfssvc_nsfd+0x51a
:nfssvc(c9274c4c,4,0,0,c9274d20) at nfssvc+0x6dd
:syscall2(2f,2f,2f,0,0) at syscall2+0x24e
:Xint0x80_syscall() at Xint0x80_syscall+0x2a
:I can't seem to coax debug symbols out of gdb when I run it on the crash
:dump, even though I'm 100% certain my kernel config contains
:"makeoptions DEBUG=-g" (I saw -g passed along during the kernel build,
:too.) Here's what gdb does give me:
That works, but it does not install the debug-symboled binary. It
keeps it in your kernel build directory as 'kernel.debug'. If you
are using make buildkernel, then the kernel build directory is
probably /usr/obj/usr/src/sys/<KERNELNAME>. Note that the kernel.debug
must be from the exact build that was installed on the machine.
If you can upload the core and kernel.debug to leaf I'll take a look
at the crash.
:Fatal trap 12: page fault while in kernel mode
:fault virtual address = 0x34
:fault code = supervisor read, page not present
:instruction pointer = 0x8:0xc01c0009
It's obviously a null-pointer indirection of some sort.
:Not much point uploading the kernel image if it doesn't have symbols in
:it, but I do think we can rule out Jeffrey's network-related work here,
:it looks specifically like a nfs->msdosfs interaction gone bad. I'll
:try to look into the source myself later on, on the off chance there's
:some obvious mismatch between nfsrv_write() and msdosfs_write().
Find the kernel.debug.
<dillon at xxxxxxxxxxxxx>
More information about the Bugs