Comprehensive Threadripper tests - memory vs cpu freq at capped power

M. L. Wilson ipc at peercorpstrust.org
Thu Aug 23 13:17:29 PDT 2018


Hi,

The Threadripper discussion has been very interesting.

Will these NUMA adjustments for Threadripper's topology also apply to Epyc, or will more nuanced work be needed once an Epyc system becomes available?

M

On 08/20/2018 09:05 AM, Matthew Dillon wrote:
> It will depend on the frequency of the interconnect as well.  I think the
> idle power use comes out in my tests too... idle power consumption is
> around 83W with the fabric running at 2800 or 3000 MHz, and 65W when
> running at 2666 MHz or slower.  At idle the cpu frequency is at the same
> relative low baseline value for all but the lowest PPT power settings, so
> the difference can only be attributed to a combination of the infinity
> fabric and the (also mostly idle) DDR4.  I don't know why it has that step
> function between 2666 and 2800... I would have expected more of a linear
> scaling.  But it *is* nice that it doesn't step up until we get above 2666.
> 
> Of course, the infinity fabric is doing a lot more than just shuffling
> around memory requests.  It's also responsible for inter-CPU cache
> coherency management.  I expect it has to be powered up regardless of how
> idle the machine is.  The fabric has a nice simple name, but it is far from
> simple in reality.  Also, it's really unclear in that article how Anandtech
> is measuring the infinity fabric's power consumption.  There might be some
> registers, but what they are actually measuring is not necessarily what
> they say they are measuring.
> 
> Just loading cpu threads does not necessarily load the fabric; it has to be
> some sort of memory-intensive load, and some loads are going to be far worse
> than others.  For example, loads which require a lot of cache management
> transactions will load the fabric down worse than loads which only need to
> access non-conflicting memory.  Anything that is computation-heavy will
> have a much lower load on the fabric and a much higher load on the CPU, while
> anything that is memory-heavy will have a much higher load on the fabric
> and much lower load on the CPU.
> 
> I will power up my dual socket Xeon and check its idle, fortunately I have
> a second Kill-O-Watt meter.... let's see.  Ok, idle power consumption on my
> 2xXeon (total of 16 cores / 32 threads) is 98W at the wall plug.  This is
> actually considerably higher than the 2990WX's 65W (2666MHz memory or
> lower), and just a bit higher than the 2990WX's 83W w/2800 or 3000MHz
> memory.  The Xeon has 12 sticks of 2133 memory I believe.  The higher idle
> draw makes sense to some degree because the 2xXeon has 12 memory channels
> and the Threadripper only has 4.  Using the corepower module it breaks down
> as follows.
> 
> cpu_node1.temp0              42.00 degC      OK           (node1 temp)
> cpu_node0.power0             24.36 W                      (node0 Package Power)
> cpu_node0.power1             38.60 W                      (node0 DRAM Power)
> cpu_node0.power2              0.00 W                      (node0 Cores Power)
> cpu_node1.power0             18.90 W                      (node1 Package Power)
> cpu_node1.power1             16.78 W                      (node1 DRAM Power)
> cpu_node1.power2              0.00 W                      (node1 Cores Power)
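
As a rough cross-check of the breakdown quoted above, here is a minimal Python sketch that parses those sensor lines and adds up the per-node figures. The regular expression and the helper names (parse_power, total_watts) are illustrative only and not part of any existing tool, and the sketch assumes the Package and DRAM readings are separate domains that can simply be added.

```python
import re

# Sensor lines copied from the quoted output (wrapped descriptions rejoined).
SENSOR_OUTPUT = """\
cpu_node0.power0   24.36 W   (node0 Package Power)
cpu_node0.power1   38.60 W   (node0 DRAM Power)
cpu_node0.power2    0.00 W   (node0 Cores Power)
cpu_node1.power0   18.90 W   (node1 Package Power)
cpu_node1.power1   16.78 W   (node1 DRAM Power)
cpu_node1.power2    0.00 W   (node1 Cores Power)
"""

# Hypothetical pattern matching the line layout shown above.
LINE_RE = re.compile(
    r"^cpu_node(\d+)\.power\d+\s+([\d.]+)\s+W\s+\(node\d+ (.+) Power\)$"
)


def parse_power(text):
    """Return {(node, kind): watts}, e.g. {(0, 'Package'): 24.36, ...}."""
    readings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            readings[(int(m.group(1)), m.group(3))] = float(m.group(2))
    return readings


def total_watts(readings, kinds):
    """Sum the readings across all nodes for the given kinds."""
    return sum(w for (_node, kind), w in readings.items() if kind in kinds)


readings = parse_power(SENSOR_OUTPUT)
print(f"Package + DRAM: {total_watts(readings, {'Package', 'DRAM'}):.2f} W")
print(f"DRAM only:      {total_watts(readings, {'DRAM'}):.2f} W")
```

With the values above this comes to about 98.6 W for package plus DRAM, of which about 55.4 W is DRAM. The wall reading also covers PSU losses and the rest of the board, so the sensor figures are probably best treated as indicative rather than exact.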
> 
> I'm currently running a long synth test (full bulk build of dports) on the
> Threadripper with it set to 150W PPT with memory set to 2666 (220W at the
> wall from the table).  The synth test I ran with it at stock settings and
> 3000MHz memory took only 12 hours to run, which destroys the 22 hours it
> takes on the Xeon and the 18 hours it takes on the quad socket Opteron.
> 
> During the 12 hour run the 2990WX pulled 330W or so from the wall.  The
> current run still in progress is pulling 230W at the wall with the synth
> load (10W higher than the simple compile loop test in the table).  I expect
> it will take longer than 12 hours to run, the question is... how much
> longer :-).   I really like the idea of being able to run the 2990WX at
> only 230W at the wall instead of 330W.  The 2xXeon at full load pulls
> around 200W at the wall.
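
The power and run-time figures quoted above can be turned into energy per bulk build. This is a minimal sketch, assuming the wall power stays roughly constant for the whole run; the quoted numbers are approximate, and the Xeon's 200W is its full-load figure rather than a measurement taken during synth. The function names are made up for illustration.

```python
def energy_kwh(watts, hours):
    """Energy in kWh for a run drawing `watts` for `hours`."""
    return watts * hours / 1000.0


def break_even_hours(reference_kwh, watts):
    """Longest run time at `watts` that uses no more than `reference_kwh`."""
    return reference_kwh * 1000.0 / watts


stock_2990wx = energy_kwh(330, 12)  # stock settings, 3000MHz memory
dual_xeon = energy_kwh(200, 22)     # assumes ~200W for the whole run
capped_limit = break_even_hours(stock_2990wx, 230)  # 150W PPT run at 230W

print(f"2990WX stock: {stock_2990wx:.2f} kWh")
print(f"2xXeon:       {dual_xeon:.2f} kWh")
print(f"150W PPT run breaks even at {capped_limit:.1f} hours")
```

Under these assumptions the stock run uses about 3.96 kWh against about 4.4 kWh for the Xeon, and the capped run uses less energy than the stock run as long as it finishes in under roughly 17.2 hours.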
> 
> This is also solidifying the memory speed I will buy for the 2990WX 'for
> real' (when I stuff it with 128G instead of 64G stolen from other
> machines)... will probably be 2666, maybe 2400, ECC.  But definitely not
> 2800 or 3000.
> 
> http://apollo.backplane.com/DFlyMisc/synth_times.txt    (see last entries,
> results are not really scientific because dports and compilers used are a
> moving target).
> 
> (note: current run results with power capped at 150W PPT - 230W at the
> wall, are not in yet)
> 
> -Matt
> 
> On Sun, Aug 19, 2018 at 7:52 PM, Samuel Paik <sam at paiks.org> wrote:
> 
>>
>> Apparently there are some special cpu registers you can read to get power
>> used by some components, probably not highly accurate but likely indicative.
>>
>>
>> Anandtech's review
>> ( https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review/4 )
>> covered some of this, they found the 2950WX infinity fabric was using 34 W
>> at low load rising to 43 W at higher load. At low load the interconnect
>> was using more power than the cpu cores.
>>
>>>
> 

