Fatal trap 19: non-maskable interrupt trap while in kernel mode

Hidetoshi Shimokawa simokawa at FreeBSD.org
Sun Nov 7 18:04:11 PST 2004


At Sun, 7 Nov 2004 12:21:59 -0800 (PST),
Matthew Dillon wrote:
> 
> 
> :During boot, interrupts are disabled, so this shouldn't be a problem.
> :And the interrupt vector is already initialized in fwohci_pci_attach()
> :before fwohci_init() is called.
> :It's a good idea to mask IT/IR interrupts before probing channels,
> :but that should have nothing to do with this problem.
> :
> :As interrupts are disabled during boot, it must not be a usual interrupt
> :but an NMI. I think it's a PCI bus problem rather than RAM.
> :
> :Try the following patch,
> :
> :Index: fwohci_pci.c
> :===================================================================
> :RCS file: /home/dcvs/src/sys/bus/firewire/fwohci_pci.c,v
> :retrieving revision 1.15
> :diff -u -r1.15 fwohci_pci.c
> :--- fwohci_pci.c	18 Jul 2004 12:37:03 -0000	1.15
> :+++ fwohci_pci.c	7 Nov 2004 19:47:52 -0000
> :@@ -238,6 +238,7 @@
> : 		PCIM_CMD_SERRESPEN | PCIM_CMD_PERRESPEN;
> : #if 1
> : 	cmd &= ~PCIM_CMD_MWRICEN; 
> :+	cmd &= ~(PCIM_CMD_SERRESPEN | PCIM_CMD_PERRESPEN);
> : #endif
> : 	pci_write_config(self, PCIR_COMMAND, cmd, 2);
> :...
> :
> :I suppose their code doesn't enable the above flags.
> :...
> :/\ Hidetoshi Shimokawa
> :\/  simokawa at xxxxxxxxxxx
> 
>     I think you've found it.  All the OpenBSD code does is enable the bus
>     master bit.  It doesn't touch any of the other bits.
> 
>     The original FreeBSD commit associated with this issue is:
> 
> >revision 1.20
> >date: 2003/03/24 03:47:36;  author: simokawa;  state: Exp;  lines: +6 -2
> >Safe PCI configuration.
> >- Clear PCIM_CMD_MWRICEN:
> >        some chips seem to have problem with write invalidate.
> >        clearing this bit fixes SBP timeout problem.
> >
> >Tested by: Michael Reifenberger <Michael.Reifenberger at xxxxxxxx>
> >
> >- Set PCIM_CMD_SERRESPEN and PCIM_CMD_PERRESPEN
> >- Moderate value for latency timer.
> 
>     He doesn't explain *WHY* he is turning on SERRESPEN and PERRESPEN. 
>     Generally, however, any device with its own on-board memory (as these
>     devices have) is subject to parity errors on the PCI bus if that
>     memory is not completely cleared on boot.  And that is what could be
>     happening here.

Hmm, PERR and SERR indicate PCI bus parity errors and other fatal errors.
I enabled them to detect broken hardware. This is the first report of such
an error I have ever received.

Are you sure it has something to do with clearing on-chip memory?
Do you know how to clear it?

>     Note that in his commit message he had to turn off write-invalidate.
>     That's a sure sign of on-chip parity checked memory not being initialized.

I thought PERR/SERR were independent of write-invalidate. Could you
explain more?
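
For reference, here is a minimal sketch of the PCI command register bits
under discussion. The bit values follow the standard PCI command register
layout and the macro names mirror those in the patch. The two helper
functions are hypothetical, written only for illustration, and are not
part of fwohci_pci.c. The sketch shows only that the bits are separate at
the register level; it says nothing about how a particular chip behaves
internally.

```c
#include <stdint.h>

/* Standard PCI command register bits (names as used in the patch). */
#define PCIM_CMD_BUSMASTEREN	0x0004	/* bus master enable */
#define PCIM_CMD_MWRICEN	0x0010	/* memory write and invalidate enable */
#define PCIM_CMD_PERRESPEN	0x0040	/* parity error response enable */
#define PCIM_CMD_SERRESPEN	0x0100	/* SERR# enable */

/* Hypothetical helper: turn off write-invalidate, leave the rest alone. */
uint16_t
clear_write_invalidate(uint16_t cmd)
{
	return (uint16_t)(cmd & ~PCIM_CMD_MWRICEN);
}

/* Hypothetical helper: turn off PERR/SERR reporting, leave the rest alone. */
uint16_t
clear_error_reporting(uint16_t cmd)
{
	return (uint16_t)(cmd & ~(PCIM_CMD_SERRESPEN | PCIM_CMD_PERRESPEN));
}
```

In a driver, the value would come from pci_read_config(self, PCIR_COMMAND, 2)
and go back via pci_write_config(self, PCIR_COMMAND, cmd, 2), as in the
patch above. Clearing one group of bits does not change the other.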

>     I will roll another ISO with the change and post when it's ready.

Thanks,

/\ Hidetoshi Shimokawa
\/  simokawa at xxxxxxxxxxx




