AMD cpu bug update #3 -- Official AMD reference now available

Matthew Dillon dillon at apollo.backplane.com
Fri Mar 23 16:08:53 PDT 2012


    AMD has officially posted the erratum for the cpu bug I found!  It is
    erratum #721 and can be found here:

    http://support.amd.com/us/Processor_TechDocs/41322_10h_Rev_Gd.pdf

    The erratum includes an MSR workaround.  I tested the workaround and
    it does appear to fix my test case, and I saw no discernible difference
    in performance.

    I would like to thank the folks at AMD who diligently tracked the
    bug down based on the test case I provided, and I would like to
    thank everyone who sent supportive emails.

    --

    With the bug now known and an MSR workaround available,
    one of my developers will be posting a simplified test case here too.
    He's been chomping at the bit :-)  But I asked him to wait until AMD
    posted their MSR workaround.

    One last thing I would like to note:  Because of the instant nature
    of communication these days I was taken a bit by surprise by how quickly
    misinformation about the bug spread.  I take responsibility for this
    because I simply did not post enough information in my original missive
    after AMD confirmed the bug.  We want cpu vendors to feel that they
    can communicate with developers to the benefit of both, so I
    feel quite bad about it.

    It is important that people keep in mind that there *IS* an MSR
    workaround for this bug and that, despite the fact that it occurs
    using normal instruction sequences, this bug is quite difficult to
    reproduce in real-life scenarios.  It isn't just a simple sequence of
    instructions... it requires a very deep recursion and particular stack
    alignment.  The stars have to align, basically.

    We probably would never have found the bug if DragonFly hadn't had
    user stack randomization turned on by default :-).  And with respect to
    GCC, we tend to use a mid level of optimization rather than a high level
    of optimization and, frankly, I was never able to reproduce the bug with
    any version of GCC other than the *particular* binary from late last
    year, and only at particular starting stack offsets.  We have never
    observed this particular bug outside of the GCC test case and the
    simplified test case that will soon be posted.

    So, again:

	* There is an MSR workaround (set bit 0 of MSR C001_1029 to 1)
	* No discernible performance loss after programming the MSR
	* All family 10h cpus are affected, Phenom and Opteron.
	* (not sure about 12h).
	* Bulldozer is NOT affected by the bug.
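
    To illustrate what "program MSR C001_1029 bit 0 to a 1" amounts to,
    here is a minimal Python sketch.  It is not taken from the AMD document
    or from DragonFly.  The function names are hypothetical, and the helper
    that touches hardware assumes a Linux system with the msr driver loaded
    (/dev/cpu/N/msr) and root privileges.  Other systems expose MSRs
    differently, and in practice the BIOS or kernel would normally apply
    this kind of workaround.

```python
import os
import struct

# Values taken from the post: MSR C001_1029, bit 0.
MSR_ERRATUM_721 = 0xC0011029
WORKAROUND_BIT = 1 << 0


def with_workaround(value: int) -> int:
    """Return the MSR value with bit 0 set and all other bits unchanged."""
    return value | WORKAROUND_BIT


def workaround_enabled(value: int) -> bool:
    """Report whether bit 0 is already set in the given MSR value."""
    return bool(value & WORKAROUND_BIT)


def apply_on_linux_cpu(cpu: int) -> int:
    """Read-modify-write the MSR on one cpu and return the resulting value.

    ASSUMPTION: Linux msr driver, where the file offset is the MSR number
    and each access transfers exactly 8 bytes. Needs root. Not run here.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDWR)
    try:
        (old,) = struct.unpack("<Q", os.pread(fd, 8, MSR_ERRATUM_721))
        new = with_workaround(old)
        if new != old:
            os.pwrite(fd, struct.pack("<Q", new), MSR_ERRATUM_721)
        return new
    finally:
        os.close(fd)
```

    The read-modify-write keeps every other bit of the register intact.
    MSRs are per-cpu, so under these assumptions the helper would have to
    be called once for each cpu number on the machine.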

    Thank you all,

						-Matt





