git: kernel - Fix excessive ipiq recursion (3)
Matthew Dillon
dillon at crater.dragonflybsd.org
Fri Jul 22 18:48:14 PDT 2016
commit e47e3dba268d93966c871dfaa6cbbca79038d5bb
Author: Matthew Dillon <dillon at apollo.backplane.com>
Date: Fri Jul 22 18:22:17 2016 -0700
kernel - Fix excessive ipiq recursion (3)
* Third try. I'm not quite sure why we are still getting hard locks. These
changes (so far) appear to fix the problem, but I don't know why. It
is quite possible that the problem is still not fixed.
* Setting target->gd_npoll will prevent *all* other cpus from sending an
IPI to that target. This should have been ok because we were in a
critical section and about to send the IPI to the target ourselves, after
setting gd_npoll. The critical section does not prevent Xinvltlb, Xsniff,
Xspuriousint, or Xcpustop from running, but of these only Xinvltlb does
anything significant and it should theoretically run at a higher level
on all cpus than Xipiq (and thus complete without causing a deadlock of
any sort).
So in short, it should have been ok to allow something like an Xinvltlb
to interrupt the cpu inbetween setting target->gd_npoll and actually
sending the Xipiq to the target. But apparently it is not ok.
* Only clear mycpu->gd_npoll when we either (1) EOI and take the IPIQ
interrupt or (2) If the IPIQ is made pending via reqflags, when we clear
the flag. Previously we were clearing gd_npoll in the IPI processing
loop itself, potentially racing new incoming interrupts before they get
EOId by our cpu. This also should have been just fine, because interrupts
are enabled in the processing loop so nothing should have been able to
back-up in the LAPIC.
I can conjecture that possibly there was a race when we cleared gd_npoll
multiple times, potentially clearing it the second (or later) times,
allowing multiple incoming IPIs to be queued from multiple cpu sources but
then cli'ing and entering a e.g. Xinvltlb processing loop before our cpu
could acknowledge any of them. And then, possibly, trying to issue an IPI
with the system in this state.
I don't really see how this can cause a hard lock because I did not observe
any loop/counter error messages on the console which should have been
triggered if other cpus got stuck trying to issue IPIs. But LAPIC IPI
interactions are not well documented so... perhaps they were being issued
but blocked our local LAPIC from accepting a Xinvltlb due to having one
extra unacknowledged Xipiq pending? But then, our Xinvltlb processing loop
*does* enable interrupts for the duration, so it should have drained if
this were so.
In anycase, we no longer gratuitously clear gd_npoll in the processing
loop. We only clear it when we know there isn't one in-flight heading to
our cpu and none queued on our cpu. What will happen now is that a second
IPI can be sent to us once we've EOI'd the first one, and wind up in
reqflags, but will not be acted upon until our current processing loop
returns.
I will note that the gratuitous clearing we did before *could* have allowed
substantially all other cpus to try to Xipiq us at nearly the same time,
so perhaps the deadlock was related to that type of situation.
* When queueing an ipiq command from mycpu to a target, interrupts were
enabled between our entry into the ipiq fifo, the setting of our cpu bit
in the target gd_ipimask, the setting of target->gd_npoll, and our
issuing of the actual IPI to the target. We now disable interrupts across
these four steps.
It should have been ok for interrupts to have been left enabled across
these four steps. It might still be, but I am not taking any chances now.
Summary of changes:
sys/kern/lwkt_ipiq.c | 120 ++++++-----------------------
sys/platform/pc64/apic/apic_vector.s | 2 +
sys/platform/pc64/x86_64/genassym.c | 1 +
sys/platform/pc64/x86_64/global.s | 2 +
sys/platform/pc64/x86_64/ipl.s | 4 +
sys/platform/pc64/x86_64/mp_machdep.c | 32 +++-----
sys/platform/vkernel64/platform/machintr.c | 1 +
sys/platform/vkernel64/x86_64/exception.c | 1 +
sys/sys/thread.h | 3 -
sys/sys/thread2.h | 13 ----
10 files changed, 44 insertions(+), 135 deletions(-)
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/e47e3dba268d93966c871dfaa6cbbca79038d5bb
--
DragonFly BSD source repository
More information about the Commits
mailing list