nfs + msdosfs = crashes & panics

Chris Pressey cpressey at catseye.mine.nu
Tue Apr 13 15:13:07 PDT 2004


On Tue, 13 Apr 2004 12:32:02 -0700 (PDT)
Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:

>     I don't know whether this is or is not related to Jeff's recent
>     work and the open bug report from YONETANI Tomokazu.  I suspect
>     that it is not related.
> 
>     I don't recall us ever supporting anything other than UFS on the
>     server side.  On the other hand, theoretically, it should be
>     possible to serve an MSDOSFS partition via NFS, and I would like
>     that support to work.

A quick Google search for 'nfs msdosfs' turns up pages implying that
people have used this combination under NetBSD without problems (well,
without problems this serious, anyway).

>     Go ahead and post a backtrace of the crash that occurs when you
>     try to write.  Also:
> 
>     * make sure you are using a UDP NFS mount, with no special
>     options.

Affirmative.

>     * make sure the export is read-write (no '-ro' option in the
>     /etc/exports line).

Affirmative.

>     * make sure that NFS exports of normal UFS filesystems work
>       normally (do not lock up or crash), which will help us narrow
>       it down to the probability that it's the MSDOS filesystem that
>       is creating an issue.

OK, it looks like this part was a red herring.  Reading the same set of
files from a UFS-backed share caused the same lockup.

The client has an ISA NIC, so (even though the server is far from a
high-performance machine) I followed the suggestion in the FreeBSD
Handbook and added "-r=1024" to the client's fstab.  Now I can't
reproduce the lockup with either msdosfs or UFS backing, which is good.
Sorry for the distraction.

As for writing, though, the bug looks genuine.

Writing files to the UFS-backed share does not cause a panic.  But
writing to an msdosfs-backed share causes a "type 12 trap" that drops
into DDB, and here's what I get on the console:


kernel: type 12 trap, code=0
Stopped at	msdosfs_write+0x31:	movl	0x34(%edx),%ebx
db> trace
msdosfs_write(c9274934) at msdosfs_write+0x31
nfsrv_write(c0b48b08,c0b51a00,c481f800,c9274abc,0) at nfsrv_write+0x8e8
nfssvc_nfsd(c9274b20,807d720,c481f800,0,c44bfa40) at nfssvc_nfsd+0x51a
nfssvc(c9274c4c,4,0,0,c9274d20) at nfssvc+0x6dd
syscall2(2f,2f,2f,0,0) at syscall2+0x24e
Xint0x80_syscall() at Xint0x80_syscall+0x2a
db> panic
panic: from debugger
Debugger("panic")
Fatal trap 3: breakpoint instruction fault while in kernel mode
[...]
Stopped at	msdosfs_write+0x31:	movl	0x34(%edx),%ebx
db> [hit enter here]
panic: from debugger
Uptime: 8m22s
dumping [...]


I can't seem to coax debug symbols out of gdb when I run it on the crash
dump, even though I'm 100% certain my kernel config contains
"makeoptions DEBUG=-g" (I saw -g passed along during the kernel build,
too.)  Here's what gdb does give me:


catbus#	gdb -k /var/crash/kernel.2 /var/crash/vmcore.2
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-unknown-dragonfly"...
(no debugging symbols found)...
1 cpu [ff800000,32768]
IdlePTD at phsyical address 0x00417000
PCB @00364b40 EIP=c017937e ESP=c9274680 EBP=c927468c
initial pcb at physical address 0x00364b40
panicstr: from debugger
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x34
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc01c0009
stack pointer	        = 0x10:0xc927486c
frame pointer	        = 0x10:0xc92748b8
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 456 (nfsd)
current thread          = pri 10 
interrupt mask		= none
panic: from debugger


Fatal trap 3: breakpoint instruction fault while in kernel mode
instruction pointer	= 0x8:0xc02aed9c
stack pointer	        = 0x10:0xc9274680
frame pointer	        = 0x10:0xc9274688
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, IOPL = 0
current process		= 456 (nfsd)
current thread          = pri 10 
interrupt mask		= none


Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x34
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc01c0009
stack pointer	        = 0x10:0xc927486c
frame pointer	        = 0x10:0xc92748b8
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 456 (nfsd)
current thread          = pri 10 
interrupt mask		= none
panic: from debugger
Uptime: 2h5m8s

dumping to dev #ad/0x20009, offset 393248
dump ata0: resetting devices .. done
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40
39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  0xc017937e in dumpsys ()
(kgdb) bt
#0  0xc017937e in dumpsys ()
#1  0xc0179173 in boot ()
#2  0xc01795a4 in poweroff_wait ()
#3  0xc0129989 in db_panic ()
#4  0xc0129927 in db_command ()
#5  0xc01299f3 in db_command_loop ()
#6  0xc012bfb1 in db_trap ()
#7  0xc02aea66 in kdb_trap ()
#8  0xc02bfe8e in trap_fatal ()
#9  0xc02bf90f in trap ()
#10 0xc01c0009 in msdosfs_write ()
#11 0xc0209488 in nfsrv_write ()
#12 0xc021da8a in nfssvc_nfsd ()
#13 0xc021d385 in nfssvc ()
#14 0xc02c01d6 in syscall2 ()
#15 0xc02afa6a in Xint0x80_syscall ()
Cannot access memory at address 0xbfbffda0.
(kgdb) quit


Not much point uploading the kernel image if it doesn't have symbols in
it, but I do think we can rule out Jeffrey's network-related work here;
it looks specifically like an nfs->msdosfs interaction gone bad.  I'll
try to look into the source myself later on, on the off chance there's
some obvious mismatch between nfsrv_write() and msdosfs_write().
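
One thing the trap output does tell us: the faulting instruction is
"movl 0x34(%edx),%ebx" and the fault virtual address is 0x34, so %edx
must have been 0.  In other words, msdosfs_write() appears to have read
a field at offset 0x34 through a NULL pointer.  A minimal C sketch of
that arithmetic follows; struct demo_node, its members, and
fault_address() are hypothetical stand-ins for illustration, not the
real msdosfs or vnode layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, NOT the real msdosfs or vnode structure. */
struct demo_node {
    char pad[0x34];   /* filler so the next member sits at offset 0x34 */
    int  field;       /* stands in for whatever msdosfs_write() reads */
};

/* Compile-time check that the stand-in member really is at 0x34. */
static_assert(offsetof(struct demo_node, field) == 0x34,
              "field must sit at offset 0x34");

/*
 * Address the CPU touches when it reads base->field, i.e. the
 * effective address of "movl 0x34(%edx),%ebx" with %edx == base.
 */
size_t fault_address(size_t base)
{
    return base + offsetof(struct demo_node, field);
}
```

Calling fault_address(0) gives 0x34, which matches the reported fault
virtual address.  Which pointer was NULL, and which structure it should
have pointed at, still has to come from the source.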

-Chris




