AMD cpu bug update #3 -- Official AMD reference now available
Matthew Dillon
dillon at apollo.backplane.com
Fri Mar 23 16:08:53 PDT 2012
AMD has officially posted the errata for the cpu bug I found! It is
errata #721 and can be found here:
http://support.amd.com/us/Processor_TechDocs/41322_10h_Rev_Gd.pdf
The errata includes an MSR workaround. I tested the MSR workaround and
it does appear to fix my test case, and I saw no discernible difference
in performance.
I would like to thank the folks at AMD who diligently tracked the
bug down based on the test case I provided, and I would like to
thank everyone who sent supportive emails.
--
Now that the bug is public and an MSR workaround is available,
one of my developers will be posting a simplified test case here too.
He's been chomping at the bit :-) But I asked him to wait until AMD
posted their MSR workaround.
One last thing I would like to note: Because of the instant nature
of communication these days I was taken a bit by surprise by how quickly
misinformation about the bug spread. I take responsibility for this
because I simply did not post enough information in my original missive
after AMD confirmed the bug. We want cpu vendors to feel that they
can communicate with developers to the mutual benefit of both, so I
feel quite badly about it.
It is important that people keep in mind that there *IS* an MSR
workaround for this bug and that, despite the fact that it occurs
using normal instruction sequences, this bug is quite difficult to
reproduce in real-life scenarios. It isn't just a simple sequence of
instructions... it requires a very deep recursion and particular stack
alignment. The stars have to align, basically.
We probably would have never found the bug if DragonFly hadn't had
user stack randomization turned on by default :-). And w/ respect to
GCC, we tend to use a mid-level of optimization rather than a high-level
of optimization and, frankly, I was never able to reproduce the bug with
any version of GCC other than the *particular* binary from late last
year, and only at particular starting stack offsets. We have never
observed this particular bug outside of the GCC test case and the
simplified test case that will soon be posted.
So, again:
* There is an MSR workaround (program MSR C001_1029 bit 0 to a 1)
* No discernible performance loss after programming the MSR
* All family 10h cpus are affected, Phenom and Opteron.
* (not sure about 12h).
* Bulldozer is NOT affected by the bug.
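
For anyone wiring this into their own kernel, the workaround in the list
above amounts to a read-modify-write that sets bit 0 of MSR C001_1029 and
leaves the other bits alone. Below is a minimal sketch of that logic only.
The function names, the accessor typedefs and the fake register are
hypothetical stand-ins so the snippet can be compiled and exercised in
userland; a real kernel would use its own privileged rdmsr/wrmsr routines,
and since MSRs are per-core it would presumably need to do this on every cpu.

```c
#include <stdint.h>

/* From the post: MSR C001_1029, bit 0 set to 1. */
#define MSR_ERRATA_721      0xC0011029u
#define ERRATA_721_FIX_BIT  (1ULL << 0)

/* Hypothetical accessor types; stand-ins for a kernel's rdmsr/wrmsr. */
typedef uint64_t (*msr_read_fn)(uint32_t msr);
typedef void (*msr_write_fn)(uint32_t msr, uint64_t value);

/* Return the old MSR value with the workaround bit set, other bits kept. */
uint64_t errata721_value(uint64_t old)
{
    return old | ERRATA_721_FIX_BIT;
}

/*
 * Read-modify-write of the MSR through the supplied accessors.
 * Returns 1 if a write was issued, 0 if the bit was already set.
 */
int errata721_apply(msr_read_fn rd, msr_write_fn wr)
{
    uint64_t old = rd(MSR_ERRATA_721);

    if (old & ERRATA_721_FIX_BIT)
        return 0;
    wr(MSR_ERRATA_721, errata721_value(old));
    return 1;
}

/* Fake register for illustration only; NOT real hardware access. */
uint64_t fake_msr;

uint64_t fake_rdmsr(uint32_t msr)
{
    (void)msr;          /* only one register is simulated */
    return fake_msr;
}

void fake_wrmsr(uint32_t msr, uint64_t value)
{
    (void)msr;
    fake_msr = value;
}
```

Calling errata721_apply(fake_rdmsr, fake_wrmsr) with fake_msr preset to
some value sets bit 0 and preserves the rest; a second call is a no-op.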
Thank you all,
-Matt