NATA update

Fri Dec 15 09:02:07 PST 2006

On Thu, Dec 14, 2006 at 09:13:17PM +0000, Thomas E. Spanjaard wrote:
> YONETANI Tomokazu wrote:
> >On Tue, Dec 12, 2006 at 03:13:45PM +0000, Thomas E. Spanjaard wrote:
> >>YONETANI Tomokazu wrote:
> >>>If I boot a UP kernel, it proceeds to "start_init: trying /sbin/init",
> >>>but then stuck there(the backtrace in DDB is from console handler).
> >>Could you put the dmesg and backtrace of the UP kernel online? The panic 
> >>backtrace from last week was from my faulty locking use in the device 
> >>timeout handling...
> >The backtrace looks something like this:
> >  Debugger(c02a1a77)
> >  scgetc(c030e8a0,2,c017ba0b,0,c0306b40)
> >  sckbdevent(c0306b40,0,c030e8a0)
> >  atkbd_intr(c0306b40,0,cd682d84,c015d699,c0306b40)
> >  atkbd_isa_intr(c0306b40,0)
> >  ithread_handler(1,0,0,0,0)
> >  lwkt_exit()
> >so I have no idea which kernel thread to show you.
> >I'll try to get the dmesg and the kernel dump tomorrow as I don't have
> >access to the machine in question right now(and I need to get some sleep 
> >:).
> 
> I fear this panic is unrelated, as Victor Balada Diaz is having the same 
> on his 1.6 system. His /sbin/init is stuck in nanosleep, and apparently 
> never jumped to.

No, that backtrace was not from a panic; it is what I got when I pressed
ctrl+alt+esc after seeing the "start_init: trying /sbin/init" message,
at which point the boot was stuck (ctrl+T didn't print anything).
`call dumpsys' in DDB didn't start the dump either, so I think
/sbin/init wasn't even read from the disk.
Then I tried `set hw.ata.ata_dma=0' at the boot loader prompt, and
this time it made it to the login prompt
(updated: http://les.ath.cx/DragonFly/asrock-dmesg.boot).

But random commands (ls, sysctl, ...) sometimes dump core and fail,
and ld reports corrupted libraries when I try to build a new kernel.
This happens more frequently on an SMP kernel.  On a UP kernel, core
dumps become more frequent if I switch to a UDMAxx mode using the
natacontrol command.

> I just experienced something odd, perhaps similar to your experience 
> earlier. I (probably) experience a null deref when trying to open acd0c, 
> as you can see on http://deviate.fi/~tgen/mountroot_1.png . It appears 
> si_drv1 is NULL on line 218 in sys/dev/disk/nata/atapi-cd.c. Which is 
> strange, because in acd_attach() I really do set si_drv1 on acd0. And, 
> on the SCSI test system, I can open, read, write, etc /dev/acd0c without 
> problems. And the code was able to find acd_open(), so the dev_ops have 
> been registered, so it's not like it's passing the wrong device. 
> Therefore I suspect something somewhere is scribbling over si_drv1, but 
> I don't know where.

I haven't seen the panic in the acd code since your commit.

Cheers.