<div dir="ltr">That doesn't make any sense. It sounds like it is just compressing more slowly, so there is less idle time because the HDD/SSD is able to keep up due to it compressing more slowly. You don't want to turn off hyperthreading in the BIOS and cache coherency stalls will not show up in the idle% anyway.<div><br></div><div>-Matt</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Dec 11, 2016 at 1:22 PM, PeerCorps Trust Fund <span dir="ltr"><<a href="mailto:ipc@peercorpstrust.org" target="_blank">ipc@peercorpstrust.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
It turns out that it was a combination of two things - turning off hyperthreading in BIOS and using a faster disk.<br>
<br>
I found a post from the author of lbzip2 which seems to describe what might be happening in this case, but reference was made to a user using an i5 mobile CPU:<br>
<br>
########################################################################

"bzip2 author here. I strongly suspect that you see what you see because your Intel core i5 is probably only dual core PLUS hyper-threaded, not real quad-core. Meaning, you have two instances of the L2 per-core cache, not four, and each two hyperthreads share an L2 cache.

Since the bzip2 compression/decompression is very cache sensitive (see "man bzip2"), the scaling factor will be determined mostly by how many OS threads can dispose over a dedicated cache each. In your case this number is probably 2.

Since you run two threads per core, those contend for the shared L2 cache, basically each messing with the other (flushing / invalidating the shared cache for the other). This contention shows up as double CPU time, because "waiting for cache" (or "waiting for main memory") is accounted for as CPU time.

Hyperthreading is not useful but detrimental for lbzip2; so you should export LBZIP2="-n 2". You should not run more worker threads per core than: core-dedicated-cache-size divided by 8MB."

########################################################################

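In practice that just means capping lbzip2's worker count, either through the LBZIP2 environment variable mentioned above or with the -n flag directly; roughly like this (the file name is only an example):

    # Limit lbzip2 to two worker threads, per the advice above.
    # backup.tar is just an example file name.
    export LBZIP2="-n 2"
    lbzip2 -k backup.tar

    # Or as a one-off, passing the flag directly:
    lbzip2 -n 2 -k backup.tar
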
Running the compression again on the same file from an SSD with hyperthreading turned off, I was able to fully saturate all of the cores using lbzip2. None of this seemed obvious at first, but it rectified the situation. The biggest difference came from turning off hyperthreading (CPU idle ~20% vs. the previous ~90%), and then additionally running from an SSD (CPU idle 0%).

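For reference, the test itself was nothing more elaborate than compressing the file and watching the idle percentage in top, roughly like this (the path is illustrative, not the actual one):

    # Compress the test file (keeping the input) and watch CPU idle% in top.
    lbzip2 -k /data/testfile    # /data/testfile is an illustrative path
    top                         # in a second terminal; watch the idle percentage
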
Previously, the compression was run from a single HDD, not an SSD. As for the compression test using the same HDD under FreeBSD, I don't know why it was able to saturate the CPU there. Perhaps it has something to do with ZFS's aggressive caching; turning that off and re-running the test would likely answer the question. Pixz performed similarly once the above two modifications were made.

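If I get a chance to re-run the FreeBSD test, disabling ZFS data caching for the dataset holding the file should settle that question; a rough sketch (the dataset name is only a placeholder):

    # Keep only metadata in the ARC for the dataset holding the test file;
    # tank/scratch is a placeholder dataset name.
    zfs set primarycache=metadata tank/scratch
    # ... re-run the compression test ...
    zfs inherit primarycache tank/scratch    # restore the default afterwards
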
On 12/11/2016 02:00 AM, Jasse Jansson wrote:

Have you tried to disable hyperthreading in the BIOS? It's a long shot, I know, but it might help.

On 2016-12-10 22:14, PeerCorps Trust Fund wrote:

Hi,

On both systems HAMMER was used. One small correction concerning the 2c/2t machine: both compression programs did effectively utilize that CPU, which had an idle % of 0.0. It is the bigger machine, 16c/32t, where the CPU isn't effectively maxed out. I'll continue to investigate why and report back if I find anything.

On 12/10/2016 10:26 PM, Justin Sherrill wrote:

On the two DragonFly systems, was it Hammer or UFS? I would be surprised if that made a difference, but it might?

On Sat, Dec 10, 2016 at 6:19 AM, PeerCorps Trust Fund <ipc@peercorpstrust.org> wrote:

Hi,

I've observed that parallel compression tools such as pixz and lbzip2 do not make use of all of the available CPU under DragonFly. On other OSes they do.

When testing on a 50 GB file, using top I've observed that CPU idle percentages consistently hover around 90% for pixz and ~70% for lbzip2. Under FreeBSD and Linux these values are typically ~0.0% idle until compression is complete. Correspondingly, compression takes significantly longer under DragonFly, so the CPU really is being underutilized in this case, as opposed to top reporting erroneously.

This was tested on two systems, one 16c/32t and one 2c/2t, on a recent master DragonFly v4.7.0.973.g8d7da-DEVELOPMENT #2: Wed Dec 7 11:44:04 EET 2016.

Has anyone else possibly observed this?

--
Mike