GSoC: Add SMT/HT awareness to DragonFlyBSD scheduler

Mihai Carabas mihai.carabas at gmail.com
Wed Aug 1 07:01:00 PDT 2012


Hi,

Is this correct?

>
>> e.g.: CHIP0 has cores 0,1,2,8,9,10 and so does CHIP1,
>>   but there is no core labelled 3,4,5,6,7 and 11
>> also the cpu numbering seems 'staggered' to me -
>>   e.g. chip0/core0/{cpu1,cpu13} and chip1/core0/{cpu0,cpu12}
>> rather than something more like:
>>   chip0/core0-5/cpu0-11 & chip1/core6-11/cpu12-23
>>
> There is no standard for how the CPUs are numbered. The cpu1-cpu13,
> cpu0-cpu12, etc. grouping of HT threads is OK: I tested the HT passive
> scheduling (the first part of my project) and it seems to give good
> results. The core numbers are extracted from the APIC IDs (the core_bits and
> logical_bits are OK). We dumped the APIC IDs with acpidump and they are OK.
> I haven't looked much at this issue.
>
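
To illustrate how gaps in the core numbers can fall out of the APIC IDs, here is a minimal sketch. It assumes the common layout where the lowest logical_bits select the HT thread and the next core_bits select the core; the exact meaning of core_bits and logical_bits in the kernel may differ, and the APIC IDs below are made up for illustration, not taken from the acpidump output.

```python
from typing import NamedTuple


class CpuLocation(NamedTuple):
    chip: int
    core: int
    thread: int


def decode_apic_id(apic_id: int, core_bits: int, logical_bits: int) -> CpuLocation:
    """Split an APIC ID into (chip, core, thread) fields.

    Assumed layout (hypothetical): [ chip | core_bits | logical_bits ].
    """
    thread = apic_id & ((1 << logical_bits) - 1)
    core = (apic_id >> logical_bits) & ((1 << core_bits) - 1)
    chip = apic_id >> (logical_bits + core_bits)
    return CpuLocation(chip, core, thread)


# Hypothetical APIC IDs for one six-core, two-thread package.
apic_ids = [0, 1, 2, 3, 4, 5, 16, 17, 18, 19, 20, 21]

# The core field is wider than six cores need, so the IDs are sparse.
cores = sorted({decode_apic_id(a, core_bits=4, logical_bits=1).core for a in apic_ids})
print(cores)
```

With these assumed inputs the distinct core numbers come out non-contiguous, which is the kind of labelling described in the quoted question.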

And here is the proof (sorry for the ugly printings):

xeon28# bash openssl_bench.sh 12 12
###### STARTING openssl speed rsa512 with kern.usched_bsd4.ht_enable=0 ######
no_smt11   no_smt12   no_smt13   no_smt14   no_smt15   no_smt16   no_smt17   no_smt18   no_smt19   no_smt110  no_smt111  no_smt112
----------------------------------------------------------------------------------------------------------------------------------
0.000098s  0.000160s  0.000160s  0.000160s  0.000160s  0.000160s  0.000099s  0.000161s  0.000160s  0.000098s  0.000160s  0.000098s
0.000095s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000156s  0.000095s  0.000095s  0.000095s  0.000156s
0.000095s  0.000095s  0.000095s  0.000095s  0.000095s  0.000095s  0.000155s  0.000095s  0.000156s  0.000155s  0.000095s  0.000156s
0.000155s  0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000095s  0.000095s  0.000155s  0.000155s  0.000096s  0.000155s
0.000155s  0.000095s  0.000155s  0.000097s  0.000155s  0.000155s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000095s
0.000155s  0.000095s  0.000155s  0.000155s  0.000155s  0.000155s  0.000095s  0.000096s  0.000156s  0.000095s  0.000095s  0.000096s
0.000155s  0.000095s  0.000155s  0.000096s  0.000095s  0.000095s  0.000095s  0.000154s  0.000095s  0.000095s  0.000095s  0.000156s
0.000155s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000155s
0.000155s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000155s  0.000095s  0.000096s  0.000095s  0.000156s  0.000156s
0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000155s  0.000096s  0.000095s  0.000155s  0.000095s  0.000095s  0.000096s
0.000156s  0.000095s  0.000095s  0.000154s  0.000154s  0.000152s  0.000095s  0.000095s  0.000154s  0.000156s  0.000095s  0.000095s
0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000156s  0.000155s  0.000095s  0.000095s
----------------------------------------------------------------------------------------------------------------------------------


###### STARTING openssl speed rsa512 with kern.usched_bsd4.ht_enable=1 ######
smt11      smt12      smt13      smt14      smt15      smt16      smt17      smt18      smt19      smt110     smt111     smt112
----------------------------------------------------------------------------------------------------------------------------------
0.000099s  0.000096s  0.000096s  0.000096s  0.000099s  0.000099s  0.000096s  0.000099s  0.000096s  0.000096s  0.000099s  0.000099s
0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s
0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s
0.000096s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000096s
0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000095s  0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000095s  0.000096s  0.000095s  0.000096s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000097s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000097s  0.000097s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000096s
0.000097s  0.000097s  0.000095s  0.000095s  0.000097s  0.000095s  0.000095s  0.000095s  0.000159s  0.000158s  0.000097s  0.000097s
----------------------------------------------------------------------------------------------------------------------------------
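
The difference between the two runs can also be put into numbers. The sketch below takes the first row of each run and counts samples that are clearly slower than the fastest one in that row. The 1.3x cutoff and the helper name count_slow are arbitrary choices for illustration, not part of openssl_bench.sh. A slow sample presumably means two benchmark processes shared one physical core.

```python
def count_slow(samples, factor=1.3):
    """Count samples more than `factor` times slower than the fastest one."""
    fastest = min(samples)
    return sum(1 for s in samples if s > fastest * factor)


# First row of the ht_enable=0 run (seconds per rsa512 operation).
ht_disabled_row = [0.000098, 0.000160, 0.000160, 0.000160, 0.000160, 0.000160,
                   0.000099, 0.000161, 0.000160, 0.000098, 0.000160, 0.000098]

# First row of the ht_enable=1 run.
ht_enabled_row = [0.000099, 0.000096, 0.000096, 0.000096, 0.000099, 0.000099,
                  0.000096, 0.000099, 0.000096, 0.000096, 0.000099, 0.000099]

slow_off = count_slow(ht_disabled_row)
slow_on = count_slow(ht_enabled_row)
print(f"ht_enable=0: {slow_off} of {len(ht_disabled_row)} samples slow")
print(f"ht_enable=1: {slow_on} of {len(ht_enabled_row)} samples slow")
```

For these two rows that gives 8 slow samples out of 12 with ht_enable=0 and none with ht_enable=1.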

As you can see, with HT enabled the results are more consistent than with
HT disabled, so the CPU topology is detected correctly. The cache coherence
heuristics do not use the CPU topology: they try to schedule a process on
the CPU it last ran on. We do not search through the topology to find the
best fit (the process whose previous CPU is closest). The main reason is
that we are in a locked region, and an advanced search would make that
region heavily contended when the system has a lot of runnable processes.
I also ran some tests with such a search, and the results got worse.
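
The trade-off can be sketched as follows. This is only an illustration, not the usched_bsd4 code: the function names, the distance metric and the tiny topology table (built from the chip0/core0/{cpu1,cpu13} and chip1/core0/{cpu0,cpu12} example quoted above) are assumptions. The simple variant only checks the previous CPU. The topology variant ranks every idle CPU by distance, which is more work to do while holding the scheduler lock.

```python
# cpu -> (chip, core); taken from the example grouping quoted earlier.
TOPOLOGY = {1: (0, 0), 13: (0, 0), 0: (1, 0), 12: (1, 0)}


def pick_cpu_simple(last_cpu, idle_cpus):
    """Prefer the CPU the process last ran on; otherwise take any idle CPU."""
    if last_cpu in idle_cpus:
        return last_cpu
    return min(idle_cpus) if idle_cpus else None


def distance(a, b):
    """Hypothetical metric: 0 same CPU, 1 HT sibling, 2 same chip, 3 other chip."""
    if a == b:
        return 0
    chip_a, core_a = TOPOLOGY[a]
    chip_b, core_b = TOPOLOGY[b]
    if chip_a != chip_b:
        return 3
    if core_a != core_b:
        return 2
    return 1


def pick_cpu_topology(last_cpu, idle_cpus):
    """Walk all idle CPUs and return the one closest to the previous CPU."""
    if not idle_cpus:
        return None
    return min(idle_cpus, key=lambda cpu: (distance(last_cpu, cpu), cpu))


# The process last ran on cpu1, which is now busy; cpu0 and cpu13 are idle.
print(pick_cpu_simple(1, {0, 13}))    # any idle CPU, here on the other chip
print(pick_cpu_topology(1, {0, 13}))  # the HT sibling of cpu1
```

In this sketch the topology-aware choice lands on the sibling thread, but it has to look at every idle CPU to get there.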

Mihai.





