Woot... reproduced NATA corruption! (was Re: Decision time.... should NATA become the default for this release?)

Matthew Dillon dillon at apollo.backplane.com
Sat Jun 2 22:43:11 PDT 2007


    I dug up a laptop with an ICH4 controller which negotiates at UDMA100.

atapci0: <Intel ICH4 UDMA100 controller> port 0x1810-0x181f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ad0: 38154MB <HITACHI DK13FA-40 00MCA0A6> at ata0-master UDMA100
ata1: <ATA channel 1> on atapci0
acd0: CDRW <UJDA755 DVD/CDRW/1.00> at ata1-master UDMA33
cd0 at ata1 bus 0 target 0 lun 0

    I booted it with a NATA CD and ran this test:

    (use a read-only mount so the access time doesn't update and mess up
    the md5 check of the tar, and so the filesystem doesn't get destroyed
    by corrupted access time updates to the inodes).

    mount -o ro /dev/ad0s1a /mnt
    tar cf - /mnt | md5
    umount /mnt

    mount -o ro /dev/ad0s1a /mnt
    tar cf - /mnt | md5
    umount /mnt

    ...

    I get different MD5's!!!  Definitely something isn't right here.  I
    also got a weird panic related to memory corruption.
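
    The repeat-and-compare test above could be scripted roughly as
    follows.  This is only a sketch: hash_tree and compare_reads are
    made-up helper names, the md5sum fallback is an assumption for
    systems without the BSD md5 tool, and the optional device argument
    assumes the caller is allowed to mount and unmount it.

```shell
#!/bin/sh
# hash_tree DIR: md5 of a tar stream of DIR (hypothetical helper).
hash_tree() {
    if command -v md5 >/dev/null 2>&1; then
        tar cf - -C "$1" . | md5
    else
        # Assumed fallback for systems that only ship md5sum.
        tar cf - -C "$1" . | md5sum | cut -d' ' -f1
    fi
}

# compare_reads DIR [PASSES] [DEVICE]
# Hash DIR PASSES times and fail on the first hash that differs.
# If DEVICE is given, mount it read-only on DIR before each pass and
# unmount it afterwards, so every pass re-reads from the disk.
compare_reads() {
    dir=$1
    passes=${2:-2}
    dev=${3:-}
    first=""
    i=1
    while [ "$i" -le "$passes" ]; do
        if [ -n "$dev" ]; then
            mount -o ro "$dev" "$dir" || return 2
        fi
        sum=$(hash_tree "$dir")
        if [ -n "$dev" ]; then
            umount "$dir"
        fi
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            echo "MISMATCH on pass $i: $first vs $sum"
            return 1
        fi
        i=$((i + 1))
    done
    echo "OK: $passes passes, md5 $first"
}
```

    Something like "compare_reads /mnt 5 /dev/ad0s1a" would then repeat
    the mount, tar, md5, umount cycle five times.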

    I mounted a read-write NFS filesystem and did the same thing with
    cpdup, then ran diff -r on the target directories on the NFS server,
    and they had some serious corruption between them.

    mount nfserver:/blah /tmp/mnt2

    mount -o ro /dev/ad0s1a /mnt
    cpdup /mnt /tmp/mnt2/test1
    umount /mnt

    mount -o ro /dev/ad0s1a /mnt
    cpdup /mnt /tmp/mnt2/test2
    umount /mnt

    mount -o ro /dev/ad0s1a /mnt
    cpdup /mnt /tmp/mnt2/test3
    umount /mnt

    (and on the server, diff -r output for the second and third copies):

    diff: copy2/etc/namedb/etc/namedb: recursive directory loop
    diff: copy2/ftp: No such file or directory
    diff: copy3/ftp: No such file or directory
    Binary files copy2/kernel and copy3/kernel differ
    Binary files copy2/kernel.NATA and copy3/kernel.NATA differ
    Binary files copy2/kernel.bak and copy3/kernel.bak differ
    Binary files copy2/kernel.old and copy3/kernel.old differ
    Binary files copy2/modules/aac.ko and copy3/modules/aac.ko differ
    Binary files copy2/modules/acpi.ko and copy3/modules/acpi.ko differ
    Binary files copy2/modules/aha.ko and copy3/modules/aha.ko differ
    Binary files copy2/modules/hpfs.ko and copy3/modules/hpfs.ko differ
    Binary files copy2/modules/if_ar.ko and copy3/modules/if_ar.ko differ
    Binary files copy2/modules/ispfw.ko and copy3/modules/ispfw.ko differ
    Binary files copy2/modules/linux.ko and copy3/modules/linux.ko differ
    Binary files copy2/modules/mga.ko and copy3/modules/mga.ko differ
    Binary files copy2/modules/miibus.ko and copy3/modules/miibus.ko differ
    Binary files copy2/modules/mlx.ko and copy3/modules/mlx.ko differ
    Binary files copy2/modules/msdos.ko and copy3/modules/msdos.ko differ
    Binary files copy2/modules/procfs.ko and copy3/modules/procfs.ko differ
    Binary files copy2/modules/radeon.ko and copy3/modules/radeon.ko differ
    Binary files copy2/modules/smbfs.ko and copy3/modules/smbfs.ko differ
    Binary files copy2/modules/snd_cs4281.ko and copy3/modules/snd_cs4281.ko differ
    Binary files copy2/modules/snd_ich.ko and copy3/modules/snd_ich.ko differ
    Binary files copy2/modules/snd_pcm.ko and copy3/modules/snd_pcm.ko differ
    Binary files copy2/modules/svr4.ko and copy3/modules/svr4.ko differ
    Binary files copy2/modules/tdfx.ko and copy3/modules/tdfx.ko differ
    Binary files copy2/modules/twe.ko and copy3/modules/twe.ko differ
    Binary files copy2/modules/udf.ko and copy3/modules/udf.ko differ
    Binary files copy2/modules/uhid.ko and copy3/modules/uhid.ko differ
    Binary files copy2/modules/uvscom.ko and copy3/modules/uvscom.ko differ
    Binary files copy2/modules.old/snd_pcm.ko and copy3/modules.old/snd_pcm.ko differ
    Binary files copy2/modules.old/twa.ko and copy3/modules.old/twa.ko differ
    Binary files copy2/old.ko and copy3/old.ko differ
    Binary files copy2/root/intel_2200_wlan.zip and copy3/root/intel_2200_wlan.zip differ
    Binary files copy2/sbin/dump and copy3/sbin/dump differ
    Binary files copy2/sbin/rconfig and copy3/sbin/rconfig differ
    Binary files copy2/sbin/rdump and copy3/sbin/rdump differ
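
    With three copies on the server, comparing every pair shows whether
    any two reads agreed with each other.  A minimal sketch, assuming
    the copies sit side by side in one directory; compare_copies is a
    made-up helper name and the directory names in the usage line are
    placeholders.

```shell
#!/bin/sh
# compare_copies DIR...
# Run diff -r on every unordered pair of directories and print one
# line per pair.  Returns nonzero if any pair differs.
compare_copies() {
    status=0
    while [ "$#" -gt 1 ]; do
        a=$1
        shift
        for b in "$@"; do
            # Only the exit status matters here; the per-file
            # listing is discarded.
            if diff -r "$a" "$b" >/dev/null 2>&1; then
                echo "same: $a $b"
            else
                echo "DIFFER: $a $b"
                status=1
            fi
        done
    done
    return "$status"
}
```

    For example: compare_copies copy1 copy2 copy3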


    Methinks there are some serious issues with NATA :-)  Now that I can
    reproduce the problem, I should be able to track it down.  I'll start
    working on that tomorrow; it's getting a bit late tonight.

    What's weird is that I have no problems at all running NATA on my
    test box, which has these controllers:

atapci0: <nVidia nForce3 Pro UDMA133 controller> port 0xf900-0xf90f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 8.0 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acd0: CDRW <CD-W54E/1.1B> at ata1-master PIO4
atapci1: <nVidia nForce3 Pro SATA150 controller> port 0xf300-0xf37f,0xf400-0xf40f,0xb70-0xb73,0x970-0x977,0xbf0-0xbf3,0x9f0-0x9f7 irq 11 at device 10.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ad6: 194481MB <Maxtor 6L200M0 BACE1G20> at ata3-master SATA150

    That box has been completely clean, which narrows the possible
    location of the bug(s) considerably.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




