nfe(4) for nVidia GigE
Matthew Dillon
dillon at apollo.backplane.com
Sat Aug 26 21:59:59 PDT 2006
:Yeah, that's it!! Thank you very much!! I didn't figure out the real
:cause in the old version, but instead went to the sidetrack: adding a
:delay in nfe_encap() :-P
:
:Best Regards,
:sephe
It would be nice if we could determine which fix was the one that
fixed your MB. Insofar as I can tell there are three possible
causes for the watchdog timeouts.
(1) The hardware races the setting of NFE_TX_VALID in the second ring
buffer of a multi-buffer TX DMA. That is, the hardware is actively
transmitting a prior packet and the driver starts laying down the
new packet, and the hardware starts trying to transmit the new
packet before the driver can finish laying it down. This is due
to the driver improperly setting NFE_TX_VALID on the first ring
buffer in the new packet before finishing setting up all the ring
buffers.
Your delay had the effect of allowing the hardware to finish up
all the TX ring buffers and thus be quiescent when new packets
get queued, avoiding the race. Insofar as I can tell when you
KICK the hardware it runs TX ring buffers until it sees one
without NFE_TX_VALID set, then it goes quiescent until the next
KICK.
This is solved by the encap code fixes.
(2) The hardware fails to generate a TX completion interrupt. The
watchdog comes along and decides to reset the interface.
This is solved by the fixes in the watchdog code which first attempt
to drain the TX ring and then KICK it again before giving up and
resetting the interface (which doesn't solve the problem anyhow, it
appears). The KICK seemed to get TX completion interrupts working
again.
(3) The hardware interferes with other devices on the same IRQ. This
one really has me puzzled. I can't imagine how NFE can interfere
with TWA but it does! My system actually *LOST* an interrupt from
TWA and the I/O subsystem locked up on the disk. Maybe this is an
interrupt routing issue of some sort. I don't quite understand how
the system can assign IRQ 10 to an external PCI card (TWA) *AND* also
the motherboard NFE interface. They are on two different PCI busses.
It should be impossible.
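For reference, the ideas behind fixes (1) and (2) can be sketched roughly as below. This is a small user-space simulation, not the actual nfe(4) code. The struct layout, the NFE_TX_VALID bit value, the ring size, and every *_sketch name are assumptions made up for illustration. It also assumes the chip clears NFE_TX_VALID when it finishes a descriptor, and that the watchdog escalates to a reset on the second consecutive stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFE_TX_VALID 0x8000u  /* bit value is an assumption for this sketch */
#define RING_SIZE    8        /* hypothetical ring size */

struct tx_desc { uint32_t addr; uint16_t len; uint16_t flags; };

struct nfe_sim {
    struct tx_desc ring[RING_SIZE];
    int  cur;            /* next free slot for the driver */
    int  next;           /* oldest slot not yet reclaimed */
    int  queued;         /* descriptors handed to the chip, not reclaimed */
    int  kicks, resets;  /* counters standing in for register writes */
    bool kicked;         /* watchdog already tried a KICK for this stall */
};

/* Cause (1): fill every descriptor first, publish the first one last. */
static void
encap_sketch(struct nfe_sim *sc, const uint32_t *addrs,
    const uint16_t *lens, int nsegs)
{
    int first = sc->cur, i, idx;

    assert(nsegs > 0 && sc->queued + nsegs <= RING_SIZE);
    for (i = 0; i < nsegs; i++) {
        idx = (first + i) % RING_SIZE;
        sc->ring[idx].addr = addrs[i];
        sc->ring[idx].len = lens[i];
        /* Later fragments can be marked right away: the chip stops
         * at 'first', which is still not valid. */
        sc->ring[idx].flags = (i == 0) ? 0 : NFE_TX_VALID;
    }
    /* A real driver needs a write barrier / DMA sync at this point. */
    sc->ring[first].flags |= NFE_TX_VALID;
    sc->cur = (first + nsegs) % RING_SIZE;
    sc->queued += nsegs;
    sc->kicks++;  /* stands in for the KICK */
}

/* Reclaim descriptors whose VALID bit the (simulated) chip has cleared. */
static void
txeof_sketch(struct nfe_sim *sc)
{
    while (sc->queued > 0 &&
        !(sc->ring[sc->next].flags & NFE_TX_VALID)) {
        sc->next = (sc->next + 1) % RING_SIZE;
        sc->queued--;
    }
}

enum wd_action { WD_DRAINED, WD_KICKED, WD_RESET };

/* Cause (2): drain first, then KICK, and reset only as a last resort. */
static enum wd_action
watchdog_sketch(struct nfe_sim *sc)
{
    int i;

    txeof_sketch(sc);
    if (sc->queued == 0) {       /* only the interrupt was lost */
        sc->kicked = false;
        return WD_DRAINED;
    }
    if (!sc->kicked) {           /* first stall: KICK and re-arm the timer */
        sc->kicked = true;
        sc->kicks++;
        return WD_KICKED;
    }
    for (i = 0; i < RING_SIZE; i++)   /* second stall: give up and reset */
        sc->ring[i].flags = 0;
    sc->cur = sc->next = sc->queued = 0;
    sc->kicked = false;
    sc->resets++;
    return WD_RESET;
}
```

The point of encap_sketch() is only the ordering: the chip can never see a valid first descriptor followed by half-built ones. The point of watchdog_sketch() is that a lost TX completion interrupt gets cleaned up without a reset.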
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>