git: kernel - SMP - "Fix AP #%d (PHY# %d) failed" issues
Matthew Dillon
dillon at apollo.backplane.com
Wed Feb 10 16:52:31 PST 2010
:Ok, that's nuts. Completely nuts. I'm surprised that you could even
:debug it and get a reliable method to work around this. I'm very
:surprised that an SMI interrupt is depending on or changing state in
:such a way that things simply hang. They're supposed to be transparent,
:other than userland "losing" some time...
:
:Yuck, yuck, yuck.
:
:Possibly we only need the delay on "old" style SMP boxes with external
:APICs? I.e., on new hardware with the stuff on-chip, we may be able to
:get away with a much smaller delay in general?
:
:-Toby.
I don't know. Typically a cpu has to be held in RESET long
enough for its internal clearing state machine to run, which can
involve microcode too. But today's cpus are so blasted fast I'd
be surprised if that took anywhere near as long as it would have
on the old Pentiums.
If no SMI interrupt is detected the code will use the 10ms spec.
That's an attempt to not break older pre-USB-keyboard machines
and BIOSes with a delay that is too short.
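
As a rough illustration of that fallback (a sketch only, not the code
from the commit): the function names, the gap-based detection, and the
short delay value are all hypothetical. The only number taken from the
text above is the 10ms spec delay. The idea is that an SMI shows up as
an unexpectedly large gap between consecutive timestamp samples, and
the spec delay is kept whenever no such gap is seen.

```c
#include <stddef.h>
#include <stdint.h>

/* The 10ms delay from the spec, in microseconds. */
#define SPEC_INIT_DELAY_US 10000u

/*
 * Hypothetical detector: return 1 if any two consecutive timestamp
 * samples are further apart than gap_threshold, which would suggest
 * that something (such as an SMI) stole the cpu in between.
 */
int
smi_gap_seen(const uint64_t *stamps, size_t n, uint64_t gap_threshold)
{
    size_t i;

    for (i = 1; i < n; ++i) {
        if (stamps[i] - stamps[i - 1] > gap_threshold)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical policy: fall back to the spec delay unless an SMI was
 * seen, in which case the caller-supplied shorter delay is used.
 */
unsigned
pick_init_delay_us(int smi_seen, unsigned short_delay_us)
{
    return smi_seen ? short_delay_us : SPEC_INIT_DELAY_US;
}
```

With samples that never exceed the threshold, pick_init_delay_us()
returns the 10ms value, so a machine without a detectable SMI keeps
the conservative delay.
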
In testing on my Shuttle, no SMI interrupt appeared to be running
if I unplugged the USB keyboard before boot, or if the keyboard
was plugged into certain ports (but not others). It is very
weird.
Yah, I had to reboot the poor Shuttle box about 50 times to test
various strategies.
My guess is that Microsoft has code to actually disable the SMI
interrupt during AP startup. There is some code floating around
FreeBSD to do that for certain Intel chipsets for MacBooks, but
it isn't universal. My Phenom-based Shuttles exhibited the behavior
and they certainly aren't MacBooks. Insofar as I know there is
no standard way to do it. Theoretically, disabling the LAPIC entirely
will kill the SMI interrupt, but that didn't seem to work when I tried
it (disabling it around the long 10ms sleep and then reenabling
it).
-Matt
Matthew Dillon
<dillon at backplane.com>