nfe(4) for nVidia GigE

Matthew Dillon dillon at apollo.backplane.com
Sat Aug 26 21:59:59 PDT 2006


:Yeah, that's it!!  Thank you very much!!  I didn't figure out the real
:cause in the old version, but instead went to the sidetrack: adding a
:delay in nfe_encap() :-P
:
:Best Regards,
:sephe

    It would be nice if we could determine which fix was the one that
    fixed your MB.  Insofar as I can tell there are three possible
    causes for the watchdog timeouts.

    (1) The hardware races the setting of NFE_TX_VALID in the second ring
	buffer of a multi-buffer TX DMA.  That is, the hardware is actively
	transmitting a prior packet and the driver starts laying down the
	new packet, and the hardware starts trying to transmit the new
	packet before the driver can finish laying it down.  This is due
	to the driver improperly setting NFE_TX_VALID on the first ring
	buffer in the new packet before finishing setting up all the ring
	buffers.

	Your delay had the effect of allowing the hardware to finish up
	all the TX ring buffers and thus be quiescent when new packets
	get queued, avoiding the race.  Insofar as I can tell when you
	KICK the hardware it runs TX ring buffers until it sees one
	without NFE_TX_VALID set, then it goes quiescent until the next
	KICK.

	This is solved by the encap code fixes.

    (2) The hardware fails to generate a TX completion interrupt.  The
	watchdog comes along and decides to reset the interface.

	This is solved by the fixes in the watchdog code which first attempt
	to drain the TX ring and then KICK it again before giving up and
	resetting the interface (which doesn't solve the problem anyhow, it
	appears).  The KICK seemed to get TX completion interrupts working
	again.

    (3) The hardware interferes with other devices on the same IRQ. This
	one really has me puzzled.  I can't imagine how NFE can interfere
	with TWA but it does!  My system actually *LOST* an interrupt from
	TWA and the I/O subsystem locked up on the disk.  Maybe this is an
	interrupt routing issue of some sort.  I don't quite understand how
	the system can assign IRQ 10 to an external PCI card (TWA) *AND* also
	the motherboard NFE interface.  They are on two different PCI busses.
	It should be impossible.
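
    To make (1) and (2) concrete, here is a rough user-space model of the
    idea, not the actual nfe(4) code.  Every name in it (tx_ring,
    ring_encap, ring_drain, ring_watchdog, hw_kick, the TX_VALID value) is
    made up for illustration, and the "kick once, then reset" policy is
    an assumption on my part.  It only assumes what is described above:
    the hardware runs descriptors until it hits one without the VALID bit,
    and clears the bit on the ones it has finished.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8
#define TX_VALID  0x8000u  /* stand-in for NFE_TX_VALID; real value may differ */

struct tx_desc { uint32_t addr; uint16_t len; uint16_t flags; };

struct tx_ring {
    struct tx_desc desc[RING_SIZE];
    int cur;     /* next free slot used by the driver */
    int next;    /* oldest slot not yet reclaimed by the driver */
    int hw;      /* slot the simulated hardware will look at next */
    int queued;  /* descriptors handed to hardware, not yet reclaimed */
    int kicked;  /* watchdog already re-kicked without seeing progress */
};

enum wd_action { WD_IDLE, WD_KICKED, WD_RESET };

/* Queue one packet of nsegs segments. The first descriptor gets TX_VALID
 * last, so the hardware cannot start on a half-built descriptor chain. */
int ring_encap(struct tx_ring *r, const uint32_t *addrs,
               const uint16_t *lens, int nsegs)
{
    assert(nsegs > 0 && nsegs <= RING_SIZE);
    if (r->queued + nsegs > RING_SIZE)
        return -1;                          /* ring full */
    int first = r->cur;
    for (int i = 0; i < nsegs; i++) {
        struct tx_desc *d = &r->desc[(first + i) % RING_SIZE];
        d->addr = addrs[i];
        d->len = lens[i];
        d->flags = (i == 0) ? 0 : TX_VALID; /* hold back the first one */
    }
    /* A real driver would need a write barrier / DMA sync here. */
    r->desc[first].flags |= TX_VALID;       /* publish the whole packet */
    r->cur = (first + nsegs) % RING_SIZE;
    r->queued += nsegs;
    return 0;
}

/* Simulated hardware: run descriptors until one lacks TX_VALID, then stop. */
void hw_kick(struct tx_ring *r)
{
    while (r->desc[r->hw].flags & TX_VALID) {
        r->desc[r->hw].flags &= (uint16_t)~TX_VALID;  /* mark completed */
        r->hw = (r->hw + 1) % RING_SIZE;
    }
}

/* Reclaim completed descriptors; returns how many were reclaimed. */
int ring_drain(struct tx_ring *r)
{
    int n = 0;
    while (r->queued > 0 && !(r->desc[r->next].flags & TX_VALID)) {
        r->next = (r->next + 1) % RING_SIZE;
        r->queued--;
        n++;
    }
    return n;
}

/* Watchdog: drain first, ask for a re-kick next, reset only as a last resort. */
enum wd_action ring_watchdog(struct tx_ring *r)
{
    int reclaimed = ring_drain(r);
    if (r->queued == 0) {                   /* only the interrupt was lost */
        r->kicked = 0;
        return WD_IDLE;
    }
    if (reclaimed > 0 || !r->kicked) {      /* progress, or first attempt */
        r->kicked = 1;
        return WD_KICKED;                   /* caller should kick the hardware */
    }
    return WD_RESET;                        /* kicked before, still stuck */
}
```

    In this model a caller that gets WD_KICKED back would call hw_kick()
    (the real driver would poke the TX control register instead), and only
    a second watchdog pass with no progress at all returns WD_RESET.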

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>




