Decision time.... should NATA become the default for this release?

Matthew Dillon dillon at apollo.backplane.com
Sat Jun 2 22:10:17 PDT 2007


:When PCI_MAP_FIXUP is specified, the related dmesg is:
:http://leaf.dragonflybsd.org/mailarchive/kernel/2007-06/msg00013.htm
:
:The errors when booting the NATA kernel with PCI_MAP_FIXUP option are:
:Mounting root from ufs:/dev/ad0s3a
:pid 2 (sh), uid 0: exited on signal 11
:Jun  2 06:15:26 init: /bin/sh on /etc/rc terminated abnormally, going to single
:user mode
:Enter full pathname of shell or RETURN for /bin/sh:
:pid 3 (sh), uid 0: exited on signal 11
:Jun  2 06:15:38 init: single user shell terminated, restarting
:Enter full pathname of shell or RETURN for /bin/sh:
:
:Best Regards,
:sephe

    And what happens with NATA without PCI_MAP_FIXUP?  Do you get errors,
    or just a lockup?

    I've made a bunch of commits.  They didn't fix Sascha's issue (which I
    think is the same as yours), so I don't think they will fix yours,
    but update anyway so we are all testing the same thing.

    You may have to do the same thing Sascha will be doing, which is to
    build a HEAD nrelease CD and boot the box with the CD.  It should build
    with a /kernel.NATA as well as a /kernel (generic).  Interrupt the
    CD boot sequence with menu option 6, and then 'boot /kernel.NATA'.  Assuming
    the CD is able to boot, you can then run tests on the hard drive
    (only do read-only mounts, or dd from the device directly) to figure
    out what is causing the corruption.

    What I suggest to start with, assuming you can boot a NATA CD, is to
    do this:

	dd if=/dev/ad0 bs=32k count=1024 | md5
	dd if=/dev/ad0 bs=32k count=1024 | md5
	dd if=/dev/ad0 bs=32k count=1024 | md5
	dd if=/dev/ad0 bs=32k count=1024 | md5
	dd if=/dev/ad0 bs=32k count=1024 | md5

    This is just to see if basic reading works: all five checksums should
    match.  Then try different block sizes and so on, moving up from there.
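
    A minimal sketch of that progression, assuming a POSIX sh.  The
    function names (digest, read_check) and the list of block sizes are
    illustrative assumptions, not anything from the NATA tree.  It reads
    the same region repeatedly at several block sizes and reports any
    checksum that disagrees with the first one seen:

```shell
#!/bin/sh
# Hypothetical helper for repeated read tests; names are illustrative.

# Use md5 where it exists (BSD), otherwise fall back to md5sum.
digest() {
    if command -v md5 >/dev/null 2>&1; then
        md5
    else
        md5sum | cut -d ' ' -f 1
    fi
}

# read_check DEVICE TOTAL_BYTES PASSES
# TOTAL_BYTES should be a multiple of the largest block size (65536).
# Prints "ok" if every pass at every block size gives the same digest,
# otherwise prints one MISMATCH line per bad pass and returns 1.
read_check() {
    dev=$1 total=$2 passes=$3
    ref=""
    status=0
    for bs in 512 4096 32768 65536; do
        count=$((total / bs))
        i=0
        while [ "$i" -lt "$passes" ]; do
            sum=$(dd if="$dev" bs="$bs" count="$count" 2>/dev/null | digest)
            if [ -z "$ref" ]; then
                # The first digest seen becomes the reference value.
                ref=$sum
            elif [ "$sum" != "$ref" ]; then
                echo "MISMATCH bs=$bs pass=$i: $sum (expected $ref)"
                status=1
            fi
            i=$((i + 1))
        done
    done
    [ "$status" -eq 0 ] && echo "ok"
    return "$status"
}
```

    For example, 'read_check /dev/ad0 33554432 5' covers the same 32MB
    region as the dd commands above.  A mismatch only shows that the reads
    are inconsistent; the first read is used as the reference and may
    itself be the bad one.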

    If you can find corruption, dd two samples that will fit into a
    memory file and get them off the box.  On a working box, hexdump and
    compare them to try to determine what kind of corruption is occurring.
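
    One possible way to do the comparison on the working box, sketched
    with standard tools.  The function name compare_dumps and the file
    names in the usage note are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical helper; summarises how two captured dumps differ.

# compare_dumps FILE_A FILE_B
# Prints "identical" and returns 0 if the files match.  Otherwise prints
# the number of differing bytes and the range of offsets they span, and
# returns 1.
compare_dumps() {
    a=$1 b=$2
    if cmp -s "$a" "$b"; then
        echo "identical"
        return 0
    fi
    # cmp -l prints one line per differing byte: the 1-based byte offset
    # followed by the two byte values in octal.
    cmp -l "$a" "$b" | awk '
        { n++; if (n == 1) first = $1; last = $1 }
        END { printf "%d differing bytes, offsets %d..%d\n", n, first, last }'
    return 1
}
```

    With two captures saved as, say, a.img and b.img, running
    'compare_dumps a.img b.img' gives a quick idea of whether the damage
    is a few scattered bytes or a contiguous run, and 'hexdump -C' on each
    file around the reported offsets shows the actual bytes.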

    At the moment I've run out of ideas.  The chip registers are being
    set properly, so that pretty much leaves command initiation and
    interrupt timing issues.

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>




