Parallel compression not using all available CPU

PeerCorps Trust Fund ipc at peercorpstrust.org
Sun Dec 11 22:14:29 PST 2016


Thanks for this.

Is there ever an edge case where hyperthreading might give unpredictable results, or should it generally always be left on?
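The lbzip2 author's rule of thumb quoted below says to run no more worker threads per core than the core-dedicated cache size divided by 8 MB. It can be turned into a thread count for `lbzip2 -n`. What follows is a minimal Python sketch. It assumes you already know the number of physical cores (not hyperthreads) and the per-core dedicated cache size. The function name `lbzip2_threads` and the floor of one thread per physical core are assumptions for illustration, not part of lbzip2.

```python
# Hypothetical helper applying the rule of thumb quoted in this thread:
# worker threads per core <= core-dedicated cache size / 8 MB.
# The floor of one thread per physical core is an assumption; without it,
# typical L2 sizes (well under 8 MB) would yield zero threads.
MIB = 1024 * 1024
WORKING_SET = 8 * MIB  # per-thread working set assumed by the quoted rule


def lbzip2_threads(physical_cores: int, dedicated_cache_bytes: int) -> int:
    """Suggest a worker count for `lbzip2 -n`, counting physical cores only."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be >= 1")
    per_core = max(1, dedicated_cache_bytes // WORKING_SET)
    return physical_cores * per_core


if __name__ == "__main__":
    # Dual-core i5 from the quoted post; 256 KiB of L2 per core is an assumed size.
    n = lbzip2_threads(2, 256 * 1024)
    print(f'export LBZIP2="-n {n}"')  # prints: export LBZIP2="-n 2"
```

Under this reading, the 16c/32t machine in this thread would get 16 workers rather than 32. Hyperthread siblings are simply not counted as cores that own a cache.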

On 12/12/2016 01:37 AM, Matthew Dillon wrote:
> That doesn't make any sense.  It sounds like it is just compressing more
> slowly, so there is less idle time because the HDD/SSD is able to keep up
> with the slower compression.  You don't want to turn off hyperthreading in
> the BIOS, and cache coherency stalls will not show up in the idle% anyway.
>
> -Matt
>
> On Sun, Dec 11, 2016 at 1:22 PM, PeerCorps Trust Fund <
> ipc at peercorpstrust.org> wrote:
>
>> Hi,
>>
>> It turns out that it was a combination of two things - turning off
>> hyperthreading in BIOS and using a faster disk.
>>
>> I found a post from the author of lbzip2 that seems to describe what
>> might be happening in this case, although it was written in reply to a
>> user with a mobile Core i5 CPU:
>>
>> ########################################################################
>>
>> "lbzip2 author here. I strongly suspect that you see what you see because
>> your Intel Core i5 is probably only dual-core PLUS hyper-threaded, not real
>> quad-core. Meaning, you have two instances of the L2 per-core cache, not
>> four, and each pair of hyperthreads shares an L2 cache.
>>
>> Since the bzip2 compression/decompression is very cache sensitive (see
>> "man bzip2"), the scaling factor will be determined mostly by how many
>> OS threads can each have a dedicated cache. In your case this number is
>> probably 2.
>>
>> Since you run two threads per core, those contend for the shared L2 cache,
>> basically each messing with the other (flushing / invalidating the shared
>> cache for the other). This contention shows up as double CPU time, because
>> "waiting for cache" (or "waiting for main memory") is accounted for as CPU
>> time.
>>
>> Hyperthreading is not just useless but detrimental for lbzip2, so you
>> should export LBZIP2="-n 2". You should not run more worker threads per
>> core than the core-dedicated cache size divided by 8 MB."
>>
>> ########################################################################
>>
>> Running the compression again on the same file from an SSD with
>> hyperthreading turned off, I was able to fully saturate all of the cores
>> using lbzip2. None of this seemed obvious at first, but it fixed the
>> problem. The biggest difference came from turning off hyperthreading
>> (idle CPU = 20% vs. the previous 90%), and then from running from an SSD
>> with hyperthreading still turned off (idle CPU = 0%).
>>
>> Previously, the compression was run from a single HDD, not an SSD. As for
>> the compression test using the same HDD under FreeBSD, I don't know why it
>> was able to saturate the CPU. Perhaps it has something to do with ZFS's
>> aggressive caching; turning that off and re-running the test would likely
>> answer the question. Pixz performed similarly once the above two changes
>> were made.
>>
>>
>>
>>
>> On 12/11/2016 02:00 AM, Jasse Jansson wrote:
>>
>>> Have you tried disabling hyperthreading in the BIOS?
>>> It's a long shot, I know, but it might help.
>>>
>>> On 2016-12-10 22:14, PeerCorps Trust Fund wrote:
>>>
>>>> Hi,
>>>>
>>>> On both systems HAMMER was used. One small correction concerning the
>>>> 2c/2t machine: both compression programs did effectively utilize that
>>>> CPU, which had an idle % of 0.0. It is on the bigger 16c/32t machine that
>>>> the CPU isn't effectively maxed out. I'll continue to investigate why and
>>>> report back if I find anything.
>>>>
>>>>
>>>> On 12/10/2016 10:26 PM, Justin Sherrill wrote:
>>>>
>>>>> On the two DragonFly systems, was it Hammer or UFS?  I would be
>>>>> surprised if that made a difference, but it might?
>>>>>
>>>>> On Sat, Dec 10, 2016 at 6:19 AM, PeerCorps Trust Fund
>>>>> <ipc at peercorpstrust.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've observed that parallel compression tools such as pixz and lbzip2
>>>>>> do not make use of all of the available CPU under DragonFly. On other
>>>>>> OSes, they do.
>>>>>>
>>>>>> When testing on a 50 GB file, I've observed with top that CPU idle
>>>>>> percentages consistently hover around 90% for pixz and ~70% for lbzip2.
>>>>>> Under FreeBSD and Linux these values typically stay at ~0.0% idle until
>>>>>> compression is complete. Correspondingly, compression takes
>>>>>> significantly longer under DragonFly, so the CPU really is being
>>>>>> underutilized in this case, rather than top reporting it erroneously.
>>>>>>
>>>>>> This was tested on two systems, one 16c/32t and one 2c/2t, running a
>>>>>> recent master: DragonFly v4.7.0.973.g8d7da-DEVELOPMENT #2: Wed Dec  7
>>>>>> 11:44:04 EET 2016.
>>>>>>
>>>>>> Has anyone else possibly observed this?
>>>>>>
>>>>>> --
>>>>>> Mike
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>
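Matt's explanation quoted above says a slower compressor leaves less idle time because the HDD or SSD can keep up. That amounts to a simple bottleneck model. The pipeline runs at the slower of two rates: the disk's read rate and the combined compression rate of all cores. Below is a minimal Python sketch of that model. The function name and the throughput figures in the example are hypothetical and are not measurements from this thread. The model also ignores output writes and cache contention, which, per Matt, would not show up in idle% anyway.

```python
# Bottleneck model (an illustrative assumption, not measured behavior):
# throughput is capped by whichever is slower, the disk feeding input or
# the combined compression rate of all worker cores.
def expected_idle_fraction(cores: int, per_core_mb_s: float, disk_mb_s: float) -> float:
    """Return the fraction of CPU capacity left idle while reading input from disk."""
    if cores < 1 or per_core_mb_s <= 0 or disk_mb_s < 0:
        raise ValueError("need cores >= 1, per_core_mb_s > 0, disk_mb_s >= 0")
    cpu_capacity = cores * per_core_mb_s       # MB/s all cores could compress together
    throughput = min(cpu_capacity, disk_mb_s)  # the slower stage sets the pace
    return 1.0 - throughput / cpu_capacity


if __name__ == "__main__":
    # Hypothetical figures: 16 cores compressing 10 MB/s each.
    print(f"HDD at 120 MB/s: {expected_idle_fraction(16, 10.0, 120.0):.0%} idle")  # 25% idle
    print(f"SSD at 500 MB/s: {expected_idle_fraction(16, 10.0, 500.0):.0%} idle")  # 0% idle
```

In this model, lowering `per_core_mb_s` while capacity still exceeds the disk rate reduces the predicted idle fraction without raising throughput, which is the effect Matt describes. Only a faster disk increases actual throughput, up to the CPU's capacity.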



More information about the Users mailing list