DragonFly master now runs on the Threadripper 2990WX

Matthew Dillon dillon at backplane.com
Sat Aug 18 00:11:21 PDT 2018


Well, the DFly scheduler *is* NUMA aware, but it kinda expects symmetric
memory assignments and TR2 is asymmetric (that is, only two of the four
dies have memory controllers hanging off of them directly).  So the DFly
scheduler will probably need some minor tweaks to understand that and
prioritize the cores with more direct access to their memory for light
loads.
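
To illustrate the idea (this is only a toy sketch, not the DragonFly scheduler code), the placement policy described above can be modeled as "pick the least-loaded core, and break ties in favor of cores on a die with its own memory controller". The 4 dies x 8 cores layout matches the 2990WX, but the choice of dies 0 and 2 as the memory-attached ones, and all of the names (Core, build_topology, pick_core, place), are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Core:
    cpu_id: int
    die: int
    has_local_memory: bool  # True if this core's die has a memory controller
    load: int = 0           # number of threads currently placed on this core


def build_topology(dies=4, cores_per_die=8, memory_dies=(0, 2)):
    """Build a hypothetical TR2-like layout; memory_dies is an assumption."""
    return [
        Core(cpu_id=die * cores_per_die + i, die=die,
             has_local_memory=die in memory_dies)
        for die in range(dies)
        for i in range(cores_per_die)
    ]


def pick_core(cores):
    """Least-loaded core first; among equals, prefer memory-attached dies."""
    return min(cores, key=lambda c: (c.load, not c.has_local_memory, c.cpu_id))


def place(cores, nthreads):
    """Place nthreads one at a time and return the chosen cpu ids in order."""
    chosen = []
    for _ in range(nthreads):
        core = pick_core(cores)
        core.load += 1
        chosen.append(core.cpu_id)
    return chosen


if __name__ == "__main__":
    cores = build_topology()
    print(place(cores, 16))  # light load: only memory-attached cores are used
    print(place(cores, 1))   # next thread spills to an idle compute-only core
```

With a light load (up to 16 threads in this model) every thread lands on a core with direct memory access; only after those are busy does the policy start using the compute-only dies. A real scheduler would also have to weigh cache affinity and hyperthread siblings, which this sketch ignores.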

I'm guessing it will take Microsoft a month or two to fix their scheduler.
Maybe longer.  It's fairly cache unfriendly (which is why MS keeps
recommending that programs pin threads to particular CPUs... but that's
definitely a cop-out on Microsoft's part).

-Matt

On Fri, Aug 17, 2018 at 10:56 PM, Freddie Cash <fjwcash at gmail.com> wrote:

> On Fri, Aug 17, 2018, 9:43 PM Matthew Dillon, <dillon at backplane.com>
> wrote:
>
>> A few minor commits and DragonFly master is now able to run on the new
>> threadripper.  The cpu is a real beast, packing 32 cores and 64 threads.
>> It blows away our dual-core Xeon to the tune of being +50% faster in
>> concurrent compile tests, and it also blows away our older 4-socket Opteron
>> (which we call 'Monster') by about the same margin.  It's an impressive CPU.
>>
>> For now the new beast is going to be used to help us improve I/O
>> performance through the filesystem, further SMP work (but DFly scales
>> pretty well to 64 threads already), and perhaps some driver work to
>> support the 10GbE on the mobo.  Fortunately, the mobo I have also has two
>> 1gig NICs that we already support well.
>>
>> -Matt
>>
>
> Have you done any work on the scheduler to work with the very NUMA nature
> of the Threadripper 2 WX models?
>
> 16 of the CPU cores have direct access to the memory controller and PCIe
> lanes, while the other 16 do not, increasing the latency for any memory/bus
> accesses.
>
> Workloads that rely more on CPU resources than memory scale really well on
> the 2990WX/2970WX. But those that rely on memory or disk I/O will depend on
> the intelligence of the scheduler for their scaling.
>
> This is one of the reasons Windows runs slower than Linux on the
> 2970WX/2990WX. The Windows scheduler treats all the cores the same and just
> does a round-robin to available cores. They've added some basic weighting
> to the two types of cores, but haven't done anywhere near the tuning the
> Linux devs have.
>
> Will be interesting to watch how benchmark scores change after every patch
> Tuesday, as the Windows devs tweak the scheduler to work with the different
> cores in the TR2.
>
> Cheers,
> Freddie
>
> Typos courtesy of my phone's keyboard.