NATA update

Fri Dec 15 12:01:04 PST 2006

YONETANI Tomokazu wrote:
On Thu, Dec 14, 2006 at 09:13:17PM +0000, Thomas E. Spanjaard wrote:
YONETANI Tomokazu wrote:
On Tue, Dec 12, 2006 at 03:13:45PM +0000, Thomas E. Spanjaard wrote:
YONETANI Tomokazu wrote:
If I boot a UP kernel, it proceeds to "start_init: trying /sbin/init",
but then gets stuck there (the backtrace in DDB is from the console handler).
The backtrace looks something like this:
Debugger(c02a1a77)
scgetc(c030e8a0,2,c017ba0b,0,c0306b40)
sckbdevent(c0306b40,0,c030e8a0)
atkbd_intr(c0306b40,0,cd682d84,c015d699,c0306b40)
atkbd_isa_intr(c0306b40,0)
ithread_handler(1,0,0,0,0)
lwkt_exit()
I fear this panic is unrelated, as Victor Balada Diaz is seeing the same
thing on his 1.6 system. His /sbin/init is stuck in nanosleep, and
apparently is never jumped to.
No, that backtrace was not from a panic; it was taken when I pressed
ctrl+alt+esc after seeing the "start_init: trying /sbin/init" message
and the boot got stuck (ctrl+T didn't print anything).  And `call dumpsys'
in DDB didn't start the dump, so I think /sbin/init wasn't even read
from the disk.
Then I tried setting `set hw.ata.ata_dma=0' in the boot driver, and
this time it made it to the login prompt
(updated: http://les.ath.cx/DragonFly/asrock-dmesg.boot)
But sometimes random commands (ls, sysctl, ...) dump core and fail,
or the ld command reports corruption of libraries when I try to build
a new kernel.  On an SMP kernel it happens more frequently.  On a UP kernel,
if I switch to a UDMAxx mode using the natacontrol command, core dumps
occur more frequently.
Hmm, I'm not seeing any corruptions (yet?) on my SCSI test system.

I just experienced something odd, perhaps similar to your experience 
earlier. I (probably) experience a null deref when trying to open acd0c, 
as you can see on http://deviate.fi/~tgen/mountroot_1.png . It appears 
si_drv1 is NULL on line 218 in sys/dev/disk/nata/atapi-cd.c. Which is 
strange, because in acd_attach() I really do set si_drv1 on acd0. And, 
on the SCSI test system, I can open, read, write, etc /dev/acd0c without 
problems. And the code was able to find acd_open(), so the dev_ops have 
been registered, so it's not like it's passing the wrong device. 
Therefore I suspect something somewhere is scribbling over si_drv1, but 
I don't know where.
I haven't seen the panic in the acd code since your commit.
That was a different panic, due to faulty locking. This one is a new 
beast. It only happens when you want to use a{,c}d as root device, 
otherwise there's no problem. Somehow, si_drv1 of my cdev_t's is 
scribbled over, and even when I recover their contents via 
devclass_get_device(), still something is screwed up. See 
http://deviate.fi/~tgen/vm_fault_1.png .
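
To make the recovery attempt described above concrete, here is a minimal
userland sketch, not the actual atapi-cd.c code. Every name in it
(mock_cdev, mock_softc, mock_unit_table, mock_lookup_unit, mock_get_softc)
is a hypothetical stand-in: mock_lookup_unit() only plays the role that a
devclass_get_device()-style lookup by unit number would play in the kernel,
and the real structures and signatures are assumed, not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; these are NOT the kernel structures. */
struct mock_softc { int unit; };
struct mock_cdev  { int unit; void *si_drv1; };

#define MOCK_MAX_UNITS 4

/* Stand-in for the per-driver table a devclass would maintain. */
static struct mock_softc *mock_unit_table[MOCK_MAX_UNITS];

/* Plays the role of a devclass_get_device()-style lookup by unit number. */
static struct mock_softc *mock_lookup_unit(int unit)
{
    if (unit < 0 || unit >= MOCK_MAX_UNITS)
        return NULL;
    return mock_unit_table[unit];
}

/*
 * Return the driver state for dev. If si_drv1 has been lost, fall back to
 * the unit table, repair the pointer, and set *recovered to 1 so the caller
 * can log that something scribbled over it.
 */
static struct mock_softc *mock_get_softc(struct mock_cdev *dev, int *recovered)
{
    assert(dev != NULL && recovered != NULL);
    *recovered = 0;
    if (dev->si_drv1 != NULL)
        return dev->si_drv1;
    *recovered = 1;
    dev->si_drv1 = mock_lookup_unit(dev->unit);
    return dev->si_drv1;   /* still NULL if the unit is unknown */
}
```

A fallback like this only hides the symptom. As noted above, something is
still wrong after the pointer is recovered, so the point of the recovered
flag is to show when the scribbling happened, not to fix it.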

Cheers,
--
        Thomas E. Spanjaard
        tgen at netphreax.net