interrupt routing problem?

Wed Feb 9 12:29:41 PST 2005

In a message dated 2/9/2005 1:22:33 PM Eastern Standard Time, Matthew Dillon 
<dillon at xxxxxxxxxxxxxxxxxxxx> writes:

>
>:>BTW, none of this corrects the problem, as the controller stays locked
>:>up. The only thing found to always work to fix it is:
>:>
>:>ifconfig em0 down
>:>ifconfig em1 down
>:>ifconfig em0 up
>:>ifconfig em1 up
>:
>:Matt, do you have anything that I can look at to see what might be wrong
>:with the MB? I never got anyone in FreeBSD to give a hoot about it. The
>:info I have is:
>:
>:- at high speeds, the em transmit interface gets locked. Since this never
>:happens in device_polling mode, my assumption is that the interrupts
>:aren't working properly
>:- There are 2 on-board NICs and 2 NICs in a PCI-X slot. When passing
>:data through the 2 PCI-X slots, the lockup occurs within seconds. When
>:using the onboard NICs, it takes a long time, perhaps an hour, before
>:a problem occurs. The difference between the on-board NICs and
>:the PCI-X NICs is that the onboard NICs are running in 32bit/33MHz
>:mode while the PCI-X NICs are running in 64bit/66MHz mode.
>:- all NICs are em driver.
>:
>:I have to try Linux on this machine. It's a Supermicro MB and they
>:claim to have no info on problems with the hardware.
>
>    Hmm.  If the machine itself is not dying then there's a chance we 
>    can find the problem, but it isn't going to be easy.
>
>    The first thing to do is to try to narrow the problem down to a single
>    device, for example one of the PCI-X nics.  Recreate the problem and
>    then generate a crash dump of the machine by ctl-alt-esc'ing into the
>    debugger and panicking the box.  You may have to turn off buffer flushing
>    on panic (sysctl kern.sync_on_panic=0).
>
>    We currently have some issues with gdb'ing kernel cores that may make
>    it more difficult to track the problem down.  Joerg has been working on
>    them.
>

I can do that, but I'm not sure you understood properly. When I said "lock up"
I didn't mean that the system locked up, just the controller. I can run
ifconfigs to revive the port, so I can add trace code or whatever is necessary
to look at things.
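
For reference, the revive sequence can be wrapped in a small script so it is
quick to rerun between traces. This is only a sketch: the helper names `run`
and `bounce` and the DRY_RUN switch are made up for illustration, and em0/em1
are the interface names from the workaround quoted above. By default it only
prints the commands; set DRY_RUN=0 (as root) to actually run them.

```shell
#!/bin/sh
# Hypothetical helper: bounce the em interfaces to revive a wedged controller.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

bounce() {
    # Take every listed interface down first, then bring them all back up,
    # in the same order as the manual workaround.
    for ifc in "$@"; do run ifconfig "$ifc" down; done
    for ifc in "$@"; do run ifconfig "$ifc" up; done
}

bounce em0 em1
```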

Since my last post I fired up Linux, and it runs fine with no problems. The
interrupt that both ports on the PCI-X slot are using is 9.

Also, why does DragonFly put stray interrupts into the vmstat output? Is
that useful for something?