Intel vs AMD DragonFly 2.11 parallel kernel build tests
dillon at apollo.backplane.com
Thu May 12 16:03:43 PDT 2011
PhenomIIx6 1090T 3.2GHz w/turbo, not overclocked (6 cores)
Intel-i7 2600K 3.4GHz w/turbo, not overclocked (4 cores, 2x HT)
Tests done with 64-bit kernel, sources fully cached in tmpfs
(i.e. no disk or network activity worth mentioning during the tests)
AMD            Intel           Test
----           -----           ----
71 seconds     50 seconds      Buildkernel -j12 KERNCONF=X86_64_GENERIC \
183 seconds    144 seconds     Buildkernel -j1 w/same parameters
unavail        33W             Watt meter idle (note1)
unavail        92W             Watt meter full load (buildkernel -j12) (note1)
247648K        569115K         Openssl speed -elapsed -evp aes-128-cbc (note2)
108567K        110798K         Openssl speed -elapsed -evp aes-128-ecb (note2)
unavail        6322 Mbits/s    Cryptotest -a aes 102400 8192
135ns          184ns           System call overhead getuid()
note 1: The i7 box I just built has a Seasonic gold (87%+ efficiency)
400W power supply in it and my PhenomIIx6 has a generic PSU
in it that's probably more around 75-80%. The PhenomII box
eats around 50-60W idle but I don't know how much better it
would be with a good PSU in it, so grain-of-salt.
Still, 33W idle for a high-end Intel consumer box is very
note 2: aes-128-cbc on the Intel uses the AESNI instructions available
on the SandyBridge. The -ecb test does not. The Phenom II does
not have these instructions so we can see that cpu-bound core
logic loops are actually fairly close between the two cpus.
These tests were for 8192 byte buffers.
For the cpu tests I build the kernel core without modules, which is a
fully parallel build (building modules is not), so -j12 saturates all
available cpus and tests the turbo fallback.
The -j1 test effectively tests single-core performance for the same
workload, and being just one core it will presumably run at max-turbo
(both the AMD and Intel cpus in this test implement core turbo).
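For a rough sense of how each box scales from -j1 to -j12, the build times in the table can be turned into a speedup figure. This is only a sketch using the numbers above; the function names and the choice to divide by physical core count are illustrative assumptions, not part of the tests. Keep in mind that the -j1 run presumably sits at max turbo, so this is not a clock-for-clock comparison.

```python
# Build times in seconds, copied from the results table.
# "cores" is the physical core count (the i7 also has 2 threads per core).
BUILD_TIMES = {
    "AMD":   {"j1": 183, "j12": 71, "cores": 6},
    "Intel": {"j1": 144, "j12": 50, "cores": 4},
}


def parallel_speedup(j1_seconds, jn_seconds):
    """How many times faster the parallel build is than the -j1 build."""
    return j1_seconds / jn_seconds


def scaling_report(times=BUILD_TIMES):
    """Return speedup and speedup-per-physical-core for each box."""
    report = {}
    for name, t in times.items():
        speedup = parallel_speedup(t["j1"], t["j12"])
        report[name] = {
            "speedup": speedup,
            "per_core": speedup / t["cores"],
        }
    return report


if __name__ == "__main__":
    for name, r in scaling_report().items():
        print(f"{name}: {r['speedup']:.2f}x over -j1, "
              f"{r['per_core']:.2f} per physical core")
```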
AMD            Intel           Simple memory bw test (/usr/src/test/mbwtest.c)
14 GByte/s     18 GByte/s      L2
8.1 GByte/s    14 GByte/s      L3
5.2 GByte/s    11 GByte/s      Main memory
Memory is DDR3-1333. Memory timings don't appear to make much of a
difference at all, even going from 9-9-9 to 7-7-7 on the i7 box.
The PhenomIIx6 box is also running w/ECC memory. There is no ECC option
available for Intel, but I don't think the difference would be all that
great and we already knew that Intel's memory bandwidth was very
impressive on the SandyBridge chips.
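The general idea behind a simple bandwidth test like this is to copy a buffer repeatedly and divide bytes moved by elapsed time, with the buffer size deciding which cache level gets exercised. The sketch below is NOT mbwtest.c and won't reproduce the numbers above (interpreter overhead gets in the way, especially for small buffers); the function name and buffer sizes are assumptions chosen for illustration.

```python
import time


def copy_bandwidth(size_bytes, min_seconds=0.2):
    """Approximate copy rate in GByte/s for a buffer of size_bytes.

    Copies src into dst repeatedly for at least min_seconds and
    reports bytes copied per second (counting each byte once).
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    copied = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        dst[:] = src  # bulk copy done inside the interpreter, no Python loop
        copied += size_bytes
        elapsed = time.perf_counter() - start
    return copied / elapsed / 1e9


if __name__ == "__main__":
    # Sizes are guesses meant to land roughly in L2, L3 and main memory;
    # adjust them to the cache sizes of the cpu under test.
    for label, size in (("L2-ish", 64 * 1024),
                        ("L3-ish", 2 * 1024 * 1024),
                        ("main memory", 256 * 1024 * 1024)):
        print(f"{label:12s} {copy_bandwidth(size):6.1f} GByte/s")
```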
* The Intel-i7 2600K crushes the PhenomIIx6 1090T under full parallel
load (4 cores x 2 hyperthreads each vs 6 cores) by upwards of 30%.
* The Intel also beats the 1090T on the single-core load by 21%
* The Intel Sandybridge cpus have AESNI crypto instructions. The
first crypto test (aes-128-cbc) uses those instructions, the second
does not. Without the instructions the instruction loops running the
crypto logic are fairly close between AMD and Intel, and with the
instructions Intel is 2.3x faster.
Also note that this is per-cpu core, so we are talking approximately
6.3 GBits/sec x 4 (at least) for crypted disks, since DragonFly will
use multiple cores for the crypto.
* Sandybridge likely edges out AMD on power savings now, certainly the
33W idle consumption is very good. I don't have any good comparison
available there because my PSUs are different. With a crappy PSU the
AMD test box eats ~50-60W idle. But even if we give the PSU another
10% efficiency we are still talking 45W-54W. Intel is gonna beat it.
* The simple system call overhead test and the non-accelerated crypto
test show that AMD does do well in some areas, but the crushing they
take in the compiler test shows the limitations of on-die caches.
* There is no point running I/O tests. AMD actually has better support
for 6GBit/sec SATA-III than Intel on their lower-end offerings from
a price standpoint. Either way today's modern cpus have no trouble
saturating even several SATA-III ports.
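The percentages and ratios quoted in the points above can be re-derived from the raw table values. A small sketch of that arithmetic follows; the helper names are made up for illustration, and the PSU adjustment simply repeats the rough "another 10%" estimate from the text rather than a real efficiency model.

```python
# Raw values copied from the results table.
AMD_J12, INTEL_J12 = 71, 50          # seconds, buildkernel -j12
AMD_J1, INTEL_J1 = 183, 144          # seconds, buildkernel -j1
AMD_CBC, INTEL_CBC = 247648, 569115  # openssl aes-128-cbc, 8192 byte buffers
CRYPTOTEST_MBITS = 6322              # Intel cryptotest, Mbits/s per core
AMD_IDLE_WATTS = (50, 60)            # rough idle range with the generic PSU


def time_saved(slower, faster):
    """Fraction of wall-clock time saved by the faster box."""
    return (slower - faster) / slower


def summary():
    """Re-derive the figures quoted in the discussion."""
    return {
        "parallel_time_saved": time_saved(AMD_J12, INTEL_J12),
        "single_time_saved": time_saved(AMD_J1, INTEL_J1),
        "aesni_ratio": INTEL_CBC / AMD_CBC,
        # Four cores' worth of crypto throughput, in Gbits/s.
        "crypto_gbits_4cores": 4 * CRYPTOTEST_MBITS / 1000,
        # Knock 10% off the measured idle draw, as in the text.
        "amd_idle_adjusted_watts": tuple(w * 90 / 100 for w in AMD_IDLE_WATTS),
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(key, value)
```

Note that the 30% and 21% figures are reductions in build time. Expressed as throughput, the same -j12 numbers work out to the Intel box being about 1.4x as fast (71/50).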
There is just one downside to the Intel-i7. Well, two if you count the
price. The downside that really gets my goat though is the lack of ECC
memory support on their consumer cpu line. I mean, COME ON INTEL! When
you stuff 16G of ram into a consumer box having ECC is probably going to
be a good idea. Gamers might not care, and most 'consumers' might not
notice, but anyone who cares about data integrity will care.
Other than that I would happily replace all my servers w/Sandybridge
today. As it stands though I don't actually need a ton of horsepower on
the servers. Our build boxes are the only things that really need the
horsepower of a Sandybridge. The reduced power consumption is very
provocative but it's a non-starter without ECC.
And AMD has saved me a ton of money over the years with their AM2+/AM3
socket compatibility. I've gone through three major generational cycles
on cpus with the same mobos just by buying a new cpu. Intel suffers
from too much socketmania and it gets expensive when you have to replace
the mobo, the memory, AND the cpu whenever you upgrade.
So for the moment I am willing to wait for AMD to come out with
something better. It doesn't have to beat Intel, but it does have to
get within shouting distance and 30% ain't within shouting distance.
Even factoring in a current higher-end AMD cpu we still aren't going to
get more than another 7% improvement (23% is still too much). If AMD can
get within 15% in the next year or so I'll happily stick with them on
principle. But if they can't then I will grudgingly pay Intel's premium.
(And, p.s. this is why I invest in Intel and not AMD. Intel has the
monopoly and intentionally keeps AMD as a poor second cousin to keep
the anti-trust hounds at bay. Sorry AMD, I love you but I can only
support you in some ways :-( )
<dillon at backplane.com>