NATA update
Thomas E. Spanjaard
tgen at netphreax.net
Fri Dec 15 12:01:04 PST 2006
YONETANI Tomokazu wrote:
On Thu, Dec 14, 2006 at 09:13:17PM +0000, Thomas E. Spanjaard wrote:
YONETANI Tomokazu wrote:
On Tue, Dec 12, 2006 at 03:13:45PM +0000, Thomas E. Spanjaard wrote:
YONETANI Tomokazu wrote:
If I boot a UP kernel, it proceeds to "start_init: trying /sbin/init",
but then gets stuck there (the backtrace in DDB is from the console handler).
The backtrace looks something like this:
Debugger(c02a1a77)
scgetc(c030e8a0,2,c017ba0b,0,c0306b40)
sckbdevent(c0306b40,0,c030e8a0)
atkbd_intr(c0306b40,0,cd682d84,c015d699,c0306b40)
atkbd_isa_intr(c0306b40,0)
ithread_handler(1,0,0,0,0)
lwkt_exit()
I fear this panic is unrelated, as Victor Balada Diaz is having the same
problem on his 1.6 system. His /sbin/init is stuck in nanosleep, and
apparently is never jumped to.
No, that backtrace was not from a panic; it was from when I pressed
ctrl+alt+esc after seeing the "start_init: trying /sbin/init" message
and the system got stuck (ctrl+T didn't print anything). And `call dumpsys'
in DDB didn't start the dump, so I think /sbin/init wasn't even read
from the disk.
Then I tried `set hw.ata.ata_dma=0' in the boot loader, and
this time it made it to the login prompt
(updated: http://les.ath.cx/DragonFly/asrock-dmesg.boot)
But sometimes random commands (ls, sysctl, ...) dump core and fail,
or the ld command reports corrupted libraries when I try to build
a new kernel. On an SMP kernel it happens more frequently. On a UP kernel,
if I switch to a UDMAxx mode using the natacontrol command, core dumps
occur more frequently.
Hmm, I'm not seeing any corruptions (yet?) on my SCSI test system.
I just experienced something odd, perhaps similar to your experience
earlier. I (probably) hit a null dereference when trying to open acd0c,
as you can see at http://deviate.fi/~tgen/mountroot_1.png . It appears
si_drv1 is NULL on line 218 in sys/dev/disk/nata/atapi-cd.c, which is
strange, because in acd_attach() I really do set si_drv1 on acd0. And,
on the SCSI test system, I can open, read, write, etc. /dev/acd0c without
problems. And the code was able to find acd_open(), so the dev_ops have
been registered, so it's not like it's passing the wrong device.
Therefore I suspect something somewhere is scribbling over si_drv1, but
I don't know where.
I haven't seen the panic in acd code, after your commit.
That was a different panic, due to faulty locking. This one is a new
beast. It only happens when you want to use a{,c}d as the root device;
otherwise there's no problem. Somehow, si_drv1 of my cdev_t's is
scribbled over, and even when I recover their contents via
devclass_get_device(), still something is screwed up. See
http://deviate.fi/~tgen/vm_fault_1.png .
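
For readers following the si_drv1 discussion, here is a minimal userland sketch of the pattern described above: an open routine that finds its driver-private pointer NULL and falls back to a lookup by unit number. It is not the code in atapi-cd.c. All names here (mock_cdev, mock_softc, unit_table, mock_attach, mock_lookup, mock_open) are hypothetical stand-ins, and the unit table only plays the role that devclass_get_device() plays in the kernel. As noted above, recovering the pointer this way did not cure the real problem, so this only illustrates the check and does not fix the corruption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNITS 4

/* Hypothetical stand-in for the per-drive driver state (softc). */
struct mock_softc {
    int unit;
    int opened;
};

/* Hypothetical stand-in for a cdev_t; only the fields under discussion. */
struct mock_cdev {
    int   unit;
    void *si_drv1;   /* driver-private pointer, set at attach time */
};

/* Hypothetical unit table, playing the role of a devclass lookup. */
struct mock_softc *unit_table[MAX_UNITS];

/* Attach: register the softc by unit and store it in si_drv1. */
void mock_attach(struct mock_cdev *dev, struct mock_softc *sc, int unit)
{
    assert(unit >= 0 && unit < MAX_UNITS);
    sc->unit = unit;
    sc->opened = 0;
    unit_table[unit] = sc;
    dev->unit = unit;
    dev->si_drv1 = sc;
}

/* Look a softc up by unit number; NULL if none is registered. */
struct mock_softc *mock_lookup(int unit)
{
    if (unit < 0 || unit >= MAX_UNITS)
        return NULL;
    return unit_table[unit];
}

/*
 * Open: never dereference si_drv1 blindly. If it is NULL, try to
 * recover it by unit. Returns 0 on success, -1 if no softc is found.
 */
int mock_open(struct mock_cdev *dev)
{
    struct mock_softc *sc = dev->si_drv1;

    if (sc == NULL) {
        sc = mock_lookup(dev->unit);
        if (sc == NULL)
            return -1;
        dev->si_drv1 = sc;   /* restore the lost pointer */
    }
    sc->opened++;
    return 0;
}
```

A check like this turns a null dereference into an error return, but if something is still overwriting memory, later accesses can fault anyway, which matches what the second screenshot shows.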
Cheers,
--
Thomas E. Spanjaard
tgen at netphreax.net