GSoC: Add SMT/HT awareness to DragonFlyBSD scheduler
Mihai Carabas
mihai.carabas at gmail.com
Wed Aug 1 07:01:00 PDT 2012
Hi,
Is this correct?
>
>> e.g.: CHIP0 has cores 0,1,2,8,9,10 and so does CHIP1,
>> but there is no core labelled 3,4,5,6,7 and 11
>> also the cpu numbering seems 'staggered' to me -
>> e.g. chip0/core0/{cpu1,cpu13) and chip1/core0/{cpu0,cpu12}
>> rather than something more like:
>> chip0/core0-5/cpu0-11 & chip1/core6-11/cpu12-23
>>
> There is no standard for how the CPUs are numbered. The cpu1-cpu13,
> cpu0-cpu12, etc. grouping of HT threads is ok because I tested the HT passive
> scheduling (the first part of my project) and it seems to give good
> results. The core numbers are extracted from the APIC IDs (the core_bits and
> logical_bits are ok). We dumped the APIC IDs with acpidump and they are ok.
> I haven't looked much at this issue.
>
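To illustrate how core numbers can be extracted from the APIC IDs, here is a minimal sketch in C. It is not the DragonFly code: the struct `cpu_pos` and the function `decode_apicid` are hypothetical names, and the layout (thread bits lowest, then core bits, then chip bits) is the usual Intel convention, which I am assuming here. Only `logical_bits` and `core_bits` come from the discussion above.

```c
#include <stdint.h>

/* Hypothetical container for a decoded position; not a DragonFly type. */
struct cpu_pos {
    uint32_t chip_id;    /* physical package */
    uint32_t core_id;    /* core within the package */
    uint32_t thread_id;  /* HT sibling within the core */
};

/*
 * Split an APIC ID into chip/core/thread fields.
 * Assumes logical_bits + core_bits < 32 and the layout
 * [ chip | core (core_bits) | thread (logical_bits) ].
 */
static struct cpu_pos
decode_apicid(uint32_t apicid, unsigned logical_bits, unsigned core_bits)
{
    struct cpu_pos pos;

    pos.thread_id = apicid & ((1u << logical_bits) - 1);
    pos.core_id = (apicid >> logical_bits) & ((1u << core_bits) - 1);
    pos.chip_id = apicid >> (logical_bits + core_bits);
    return pos;
}
```

With `logical_bits = 1` and `core_bits = 4`, an APIC ID of 0x14 decodes to chip 0, core 10, thread 0. Because the core number is simply whatever the firmware put in that bit field, a six-core chip can end up with sparse core numbers such as 0,1,2,8,9,10.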
And here is the proof (sorry for the ugly output):
xeon28# bash openssl_bench.sh 12 12
###### STARTING openssl speed rsa512 with kern.usched_bsd4.ht_enable=0 ######
no_smt11   no_smt12   no_smt13   no_smt14   no_smt15   no_smt16   no_smt17   no_smt18   no_smt19   no_smt110  no_smt111  no_smt112
----------------------------------------------------------------------------------------------------------------------------------
0.000098s  0.000160s  0.000160s  0.000160s  0.000160s  0.000160s  0.000099s  0.000161s  0.000160s  0.000098s  0.000160s  0.000098s
0.000095s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000156s  0.000095s  0.000095s  0.000095s  0.000156s
0.000095s  0.000095s  0.000095s  0.000095s  0.000095s  0.000095s  0.000155s  0.000095s  0.000156s  0.000155s  0.000095s  0.000156s
0.000155s  0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000095s  0.000095s  0.000155s  0.000155s  0.000096s  0.000155s
0.000155s  0.000095s  0.000155s  0.000097s  0.000155s  0.000155s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000095s
0.000155s  0.000095s  0.000155s  0.000155s  0.000155s  0.000155s  0.000095s  0.000096s  0.000156s  0.000095s  0.000095s  0.000096s
0.000155s  0.000095s  0.000155s  0.000096s  0.000095s  0.000095s  0.000095s  0.000154s  0.000095s  0.000095s  0.000095s  0.000156s
0.000155s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000155s
0.000155s  0.000095s  0.000155s  0.000155s  0.000095s  0.000095s  0.000155s  0.000095s  0.000096s  0.000095s  0.000156s  0.000156s
0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000155s  0.000096s  0.000095s  0.000155s  0.000095s  0.000095s  0.000096s
0.000156s  0.000095s  0.000095s  0.000154s  0.000154s  0.000152s  0.000095s  0.000095s  0.000154s  0.000156s  0.000095s  0.000095s
0.000155s  0.000095s  0.000155s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000156s  0.000155s  0.000095s  0.000095s
----------------------------------------------------------------------------------------------------------------------------------
###### STARTING openssl speed rsa512 with kern.usched_bsd4.ht_enable=1 ######
smt11      smt12      smt13      smt14      smt15      smt16      smt17      smt18      smt19      smt110     smt111     smt112
----------------------------------------------------------------------------------------------------------------------------------
0.000099s  0.000096s  0.000096s  0.000096s  0.000099s  0.000099s  0.000096s  0.000099s  0.000096s  0.000096s  0.000099s  0.000099s
0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s
0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s
0.000096s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000095s  0.000096s  0.000095s  0.000096s
0.000095s  0.000096s  0.000095s  0.000095s  0.000095s  0.000095s  0.000096s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000095s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000095s  0.000096s  0.000095s  0.000096s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000095s
0.000096s  0.000097s  0.000095s  0.000095s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000097s  0.000097s  0.000096s  0.000095s  0.000096s  0.000095s  0.000095s
0.000095s  0.000096s  0.000095s  0.000095s  0.000096s  0.000096s  0.000096s  0.000096s  0.000095s  0.000096s  0.000095s  0.000096s
0.000097s  0.000097s  0.000095s  0.000095s  0.000097s  0.000095s  0.000095s  0.000095s  0.000159s  0.000158s  0.000097s  0.000097s
----------------------------------------------------------------------------------------------------------------------------------
As you can see, with HT enabled the results are far more consistent than
without it: nearly every run takes about 0.000095s, whereas with ht_enable=0
the runs are split between roughly 0.000095s and 0.000155s. So the CPU
topology is detected correctly. The cache coherence heuristics do not use the
CPU topology. They only try to schedule a process on the CPU it last ran on.
We do not search the topology for the best fit (the process whose old CPU is
closest). The main reason is that we are in a locked region, and a more
advanced search would make that region heavily contended when the system has
a lot of runnable processes. I also ran some tests, and the results got worse.
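As a rough illustration of the heuristic described above, here is a minimal sketch in C of a "same old CPU only" selection with no topology search. It is my simplified reading, not the usched_bsd4 code: `struct proc_entry`, `choose_proc`, and the flat array standing in for the run queue are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical run-queue entry; not a DragonFly structure. */
struct proc_entry {
    int pid;
    int last_cpu;   /* CPU the process last ran on */
};

/*
 * Pick a process for this_cpu: prefer the first queued process that
 * last ran on this_cpu, otherwise fall back to the head of the queue.
 * There is deliberately no "closest CPU in the topology" search, so
 * the time spent under the scheduler lock stays short.
 * Returns an index into queue, or -1 if the queue is empty.
 */
static int
choose_proc(const struct proc_entry *queue, size_t n, int this_cpu)
{
    size_t i;

    if (n == 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (queue[i].last_cpu == this_cpu)
            return (int)i;
    }
    return 0;   /* no exact match: take the head */
}
```

A topology-aware variant would have to compute a distance between `this_cpu` and every `last_cpu` in the queue, which is the extra work inside the locked region that I wanted to avoid.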
Mihai.