dual port EM nic wedging under load
Sepherosa Ziehau
sepherosa at gmail.com
Sun Nov 26 01:11:58 PST 2006
On 11/26/06, Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> wrote:
:After staring at the extremely low interrupt rate, I think something
:must be wrong with interrupt processing in our code. The loop
:mainly handles whatever happens while we are processing the TX/RX desc
:rings. I think it is necessary when an RX overrun happens: the extra ICR
:reading/processing may restore the RX engine. I don't think the extra
:ICR reading will hurt if ICR has nothing left.
:
:...
:
:mmm, I forgot to take polling failure into account, polling does not
:even touch ICR ...
:
:The above patch is updated a little bit:
:http://leaf.dragonflybsd.org/~sephe/em_intr1.diff
:
:Best Regards,
:sephe
I don't know. I don't think this is addressing the reason why polling
failed to work after the overrun occurred. That is, we don't know why
polling failed to work.
I can think of two possibilities. First, the EM device is skipping an
entry in the receive ring (E1000_RXD_STAT_DD is not getting set), and
that is stopping all receive ring processing cold. Second, that when
a receiver overrun occurs the receive ring processes all the packets
but there is a bug in the handling of the ring index that confuses the
firmware into thinking that we did not clear all the ring buffers when
we did, for the case where the entire ring was full and the entire ring
is then cleared.
My suspicion, because the polling stops working, is the second case.
RDH and RDT (receiver head and tail descriptor pointers) are range
inclusive. On initialization the head is set to 0 and the tail is
set to num_rx_desc - 1. When we update it during receive packet
processing we set it to the index of the last processed descriptor (which
is i - 1 because the index is advanced one past the last processed descriptor).
--
I think your patch may have a problem... if we do not process *ANY*
receive frames in the loop your patch will end up adjusting RDT anyway,
to an incorrect value. It will set it to (the original)
next_rx_desc_to_check - 1. Oops!
The question here is what happens when a receiver overrun occurs? Clearly
when that case occurs ALL the receive frames will be full. Let's look at
a degenerate case:
[0 ...................... N-1]
* RDH set to 0
* RDT set to N-1
* N frames come in; RDH is set to N-1 (??)
* We process N frames
* The frame at RING[N-1] is cleaned up
* i = N
* We set RDT to i-1 == N-1. It's the same value it was set to before
we processed all N frames. The receive engine will think that the
ring is still full when it is empty.
I think what we need to do here is set RDT to N-2 (mod N of course) in
the case where we have processed ALL N frames. I'll bet the firmware
is getting confused in the overrun case because we are setting RDT to
the same value it was set at before. Very confused.
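The index arithmetic in the degenerate case above can be checked with a
small sketch. This is not driver code: both function names are hypothetical,
the first models the "RDT = i - 1" update as described above, and the second
models the proposed N-2 (mod N) adjustment, which is still only a hypothesis
about what the hardware wants.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the em driver: the value written to
 * RDT when the receive loop stops with "next" as the next descriptor to
 * check, i.e. (next - 1) mod n.
 */
static unsigned
rdt_after_processing(unsigned next, unsigned n)
{
	assert(n > 0 && next <= n);
	/* next == n is the unwrapped "i = N" from the example above */
	return (next % n + n - 1) % n;
}

/*
 * Hypothetical model of the proposed adjustment: when all n descriptors
 * were processed in one pass, back the tail off by one extra slot so
 * that RDT is not rewritten with the value it already held.
 */
static unsigned
rdt_with_full_ring_fix(unsigned start, unsigned processed, unsigned n)
{
	unsigned next, rdt;

	assert(n > 1 && start < n && processed <= n);
	next = (start + processed) % n;
	rdt = (next + n - 1) % n;
	if (processed == n)
		rdt = (rdt + n - 1) % n;
	return rdt;
}
```

With n = 256, rdt_after_processing(256, 256) gives 255, the same value RDT
was initialized to, while rdt_with_full_ring_fix(0, 256, 256) gives 254.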
After staring at the rx processing code for a long time, I think I
found the problem:
In em_process_receive_interrupts():
...
	if (em_get_buf(i, adapter, NULL, MB_DONTWAIT) == ENOBUFS) {
		adapter->dropped_pkts++;
		em_get_buf(i, adapter, mp, MB_DONTWAIT);
		if (adapter->fmp != NULL)
			m_freem(adapter->fmp);
		adapter->fmp = NULL;
		adapter->lmp = NULL;
		break;
	}
...
We enter this branch when m_getcl(MB_DONTWAIT) fails. If this
happened, then
1) we skipped the rest of the RX ring processing
2) next_rx_desc_to_check was not updated
3) RDT was not updated
The RX engine would be left sitting there, faced with an almost full RX
ring after the interrupt processing. This should lead to an RX overrun,
and I guess the hardware may behave strangely when this kind of thing
happens (e.g. stall the RX engine, which in turn stalls interrupts x_X).
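The effect of the early exit can be illustrated with a toy model of the
ring. This is only a sketch of the behavior as described in this message,
not the driver and not the contents of the patch: struct rx_model,
rx_process() and the publish_on_failure switch are all made up for
illustration, and alloc_budget stands in for m_getcl() succeeding or failing.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* illustrative size, not the driver's */

/* Toy stand-in for the adapter's RX state (hypothetical). */
struct rx_model {
	bool done[RING_SIZE];	/* models E1000_RXD_STAT_DD per descriptor */
	unsigned next_to_check;	/* models next_rx_desc_to_check */
	unsigned rdt;		/* last value written to RDT */
	unsigned alloc_budget;	/* replacement buffers still available */
};

/*
 * Walk completed descriptors and return how many were processed.
 * With publish_on_failure == false, a buffer shortage leaves
 * next_to_check and rdt untouched, as described above. With true,
 * the progress made before the shortage is still recorded.
 */
static unsigned
rx_process(struct rx_model *m, bool publish_on_failure)
{
	unsigned i = m->next_to_check;
	unsigned processed = 0;
	bool failed = false;

	assert(i < RING_SIZE);
	while (m->done[i]) {
		if (m->alloc_budget == 0) {
			failed = true;	/* models m_getcl() failing */
			break;
		}
		m->alloc_budget--;
		m->done[i] = false;	/* descriptor handed back */
		processed++;
		i = (i + 1) % RING_SIZE;
	}
	if (failed && !publish_on_failure)
		return processed;	/* index and RDT left stale */
	m->next_to_check = i;
	if (processed > 0)		/* never move RDT if nothing was done */
		m->rdt = (i + RING_SIZE - 1) % RING_SIZE;
	return processed;
}
```

Starting with six completed descriptors, next_to_check = 0, rdt = 7 and a
budget of two buffers, the first mode leaves next_to_check and rdt at 0 and
7, while the second mode advances them to 2 and 1.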
According to the output of hw.em0.debug_info=1 reported by Mike, the above
condition _was_ entered during his benchmarking ("em0: Std mbuf cluster
failed = 2"; this is adapter->mbuf_cluster_failed, which is updated when
m_getcl() fails in em_get_buf()).
Please review/test the following patch:
http://leaf.dragonflybsd.org/~sephe/em_intr2.diff
Best Regards,
sephe
--
Live Free or Die