Woot... reproduced NATA corruption! (was Re: Decision time.... should NATA become the default for this release?)
Matthew Dillon
dillon at apollo.backplane.com
Sat Jun 2 22:43:11 PDT 2007
I dug up a laptop with an ICH4 controller which negotiates at UDMA100.
atapci0: <Intel ICH4 UDMA100 controller> port 0x1810-0x181f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ad0: 38154MB <HITACHI DK13FA-40 00MCA0A6> at ata0-master UDMA100
ata1: <ATA channel 1> on atapci0
acd0: CDRW <UJDA755 DVD/CDRW/1.00> at ata1-master UDMA33
cd0 at ata1 bus 0 target 0 lun 0
I booted it with a NATA CD and ran this test:
(use a read-only mount so the access time doesn't update and mess up
the md5 check of the tar, and so the filesystem doesn't get destroyed
by corrupted access time updates to the inodes).
mount -o ro /dev/ad0s1a /mnt
tar cf - /mnt | md5
umount /mnt
mount -o ro /dev/ad0s1a /mnt
tar cf - /mnt | md5
umount /mnt
...
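The repeated mount/tar/md5 cycle above can be scripted so that a mismatch gets flagged without eyeballing the hashes. This is a minimal sketch that assumes a POSIX sh. `hash_stdin`, `repeat_checksum` and `tar_pass` are hypothetical helper names. It uses `md5sum` where that exists and falls back to BSD `md5 -q` otherwise. The device and mount point are the ones from the test above.

```sh
#!/bin/sh
# Minimal sketch (hypothetical helper names): repeat a read-only
# tar pass several times and check that every MD5 digest matches.

# Hash stdin and print only the hex digest.
# Uses md5sum(1) if present, otherwise BSD md5(1) in quiet mode.
hash_stdin() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum | cut -d' ' -f1
    else
        md5 -q
    fi
}

# repeat_checksum COUNT CMD [ARGS...]
# Runs CMD COUNT times, prints each digest, and returns 0 only if
# every run produced the same digest as the first one.
repeat_checksum() {
    rc_count=$1
    shift
    rc_first=""
    rc_status=0
    rc_i=1
    while [ "$rc_i" -le "$rc_count" ]; do
        rc_sum=$("$@" | hash_stdin)
        echo "run $rc_i: $rc_sum"
        if [ -z "$rc_first" ]; then
            rc_first=$rc_sum
        elif [ "$rc_sum" != "$rc_first" ]; then
            echo "run $rc_i: MISMATCH (expected $rc_first)"
            rc_status=1
        fi
        rc_i=$((rc_i + 1))
    done
    return "$rc_status"
}

# One pass of the test: mount read-only so access times are not
# written back, tar the whole tree to stdout, then unmount.
# Device and mount point are the ones used above; adjust as needed.
tar_pass() {
    mount -o ro /dev/ad0s1a /mnt || return 1
    tar cf - /mnt
    umount /mnt
}

# Example (as root, on the affected machine):
#   repeat_checksum 5 tar_pass
```

Run as root, `repeat_checksum 5 tar_pass` does five read-only passes. It returns non-zero if any digest differs from the first one. The pipeline's exit status is not checked, so a failed mount appears as the MD5 of empty input (`d41d8cd98f00b204e9800998ecf8427e`) and not as an error.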
I get different MD5s!!! Definitely something isn't right here. I
also got a weird panic related to memory corruption.
I mounted a RW NFS filesystem and did the same thing with cpdup, then
diff -r'd the target directories on the NFS server, and they had
some serious corruption between them.
mount nfserver:/blah /tmp/mnt2
mount -o ro /dev/ad0s1a /mnt
cpdup /mnt /tmp/mnt2/test1
umount /mnt
mount -o ro /dev/ad0s1a /mnt
cpdup /mnt /tmp/mnt2/test2
umount /mnt
mount -o ro /dev/ad0s1a /mnt
cpdup /mnt /tmp/mnt2/test3
umount /mnt
(and on the server):
diff: copy2/etc/namedb/etc/namedb: recursive directory loop
diff: copy2/ftp: No such file or directory
diff: copy3/ftp: No such file or directory
Binary files copy2/kernel and copy3/kernel differ
Binary files copy2/kernel.NATA and copy3/kernel.NATA differ
Binary files copy2/kernel.bak and copy3/kernel.bak differ
Binary files copy2/kernel.old and copy3/kernel.old differ
Binary files copy2/modules/aac.ko and copy3/modules/aac.ko differ
Binary files copy2/modules/acpi.ko and copy3/modules/acpi.ko differ
Binary files copy2/modules/aha.ko and copy3/modules/aha.ko differ
Binary files copy2/modules/hpfs.ko and copy3/modules/hpfs.ko differ
Binary files copy2/modules/if_ar.ko and copy3/modules/if_ar.ko differ
Binary files copy2/modules/ispfw.ko and copy3/modules/ispfw.ko differ
Binary files copy2/modules/linux.ko and copy3/modules/linux.ko differ
Binary files copy2/modules/mga.ko and copy3/modules/mga.ko differ
Binary files copy2/modules/miibus.ko and copy3/modules/miibus.ko differ
Binary files copy2/modules/mlx.ko and copy3/modules/mlx.ko differ
Binary files copy2/modules/msdos.ko and copy3/modules/msdos.ko differ
Binary files copy2/modules/procfs.ko and copy3/modules/procfs.ko differ
Binary files copy2/modules/radeon.ko and copy3/modules/radeon.ko differ
Binary files copy2/modules/smbfs.ko and copy3/modules/smbfs.ko differ
Binary files copy2/modules/snd_cs4281.ko and copy3/modules/snd_cs4281.ko differ
Binary files copy2/modules/snd_ich.ko and copy3/modules/snd_ich.ko differ
Binary files copy2/modules/snd_pcm.ko and copy3/modules/snd_pcm.ko differ
Binary files copy2/modules/svr4.ko and copy3/modules/svr4.ko differ
Binary files copy2/modules/tdfx.ko and copy3/modules/tdfx.ko differ
Binary files copy2/modules/twe.ko and copy3/modules/twe.ko differ
Binary files copy2/modules/udf.ko and copy3/modules/udf.ko differ
Binary files copy2/modules/uhid.ko and copy3/modules/uhid.ko differ
Binary files copy2/modules/uvscom.ko and copy3/modules/uvscom.ko differ
Binary files copy2/modules.old/snd_pcm.ko and copy3/modules.old/snd_pcm.ko differ
Binary files copy2/modules.old/twa.ko and copy3/modules.old/twa.ko differ
Binary files copy2/old.ko and copy3/old.ko differ
Binary files copy2/root/intel_2200_wlan.zip and copy3/root/intel_2200_wlan.zip differ
Binary files copy2/sbin/dump and copy3/sbin/dump differ
Binary files copy2/sbin/rconfig and copy3/sbin/rconfig differ
Binary files copy2/sbin/rdump and copy3/sbin/rdump differ
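The server-side `diff -r` comparison above can also be wrapped so that each copy is checked against one reference copy in a single run. This is a minimal sketch that assumes a POSIX sh and a `diff` supporting `-r` and `-q`, which both GNU and BSD diff do. `compare_copies` is a hypothetical name.

```sh
#!/bin/sh
# Minimal sketch (hypothetical helper name): compare several copies of
# the same tree, e.g. the cpdup target directories, against the first
# copy and report which ones differ.

# compare_copies REF COPY [COPY...]
# Prints the differing files (diff -rq output) and returns 1 if any
# copy is not identical to REF, 0 otherwise.
compare_copies() {
    cc_ref=$1
    shift
    cc_bad=0
    for cc_copy in "$@"; do
        # -r: recurse into subdirectories; -q: only name differing files.
        # diff exits 0 when identical, 1 when different, >1 on trouble.
        if diff -rq "$cc_ref" "$cc_copy"; then
            echo "$cc_copy: identical to $cc_ref"
        else
            echo "$cc_copy: differs from $cc_ref"
            cc_bad=1
        fi
    done
    return "$cc_bad"
}

# Example (on the NFS server, in the directory holding the copies):
#   compare_copies test1 test2 test3
```

Because the corruption is not deterministic, a difference between two copies only shows that at least one of the reads was bad. It does not show which one.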
Methinks there are some serious issues with NATA :-) Now that I can
reproduce the problem, I should be able to track it down. I'll start
working on that tomorrow; it's getting a bit late tonight.
What's weird is that I have no problems at all running NATA on my
test box, which is
atapci0: <nVidia nForce3 Pro UDMA133 controller> port 0xf900-0xf90f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 8.0 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acd0: CDRW <CD-W54E/1.1B> at ata1-master PIO4
atapci1: <nVidia nForce3 Pro SATA150 controller> port 0xf300-0xf37f,0xf400-0xf40f,0xb70-0xb73,0x970-0x977,0xbf0-0xbf3,0x9f0-0x9f7 irq 11 at device 10.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ad6: 194481MB <Maxtor 6L200M0 BACE1G20> at ata3-master SATA150
I haven't had any problems at all, which narrows the possible location
of the bug(s) considerably.
-Matt
Matthew Dillon
<dillon at backplane.com>