Parallel compression not using all available CPU
Jasse Jansson
jasse at yberwaffe.com
Mon Dec 12 05:41:40 PST 2016
They recommend turning hyperthreading off if you run studio software on
your computer.
That's if you run Windows; I have no idea if HT affects a Unix
derivative the same way.
On 2016-12-12 07:14, PeerCorps Trust Fund wrote:
> Thanks for this.
>
> Is there ever a side case where hyperthreading might have
> unpredictable results or should it generally always be left on?
>
> On 12/12/2016 01:37 AM, Matthew Dillon wrote:
>> That doesn't make any sense. It sounds like it is just compressing more
>> slowly, so there is less idle time because the HDD/SSD is able to keep up
>> due to it compressing more slowly. You don't want to turn off
>> hyperthreading in the BIOS, and cache coherency stalls will not show up in
>> the idle% anyway.
>>
>> -Matt
>>
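To put Matt's point in numbers, here is a rough sketch in Python. It assumes
a simple model in which the compressor is fed from one disk and the cores
idle whenever the disk cannot deliver data fast enough. The function name and
all throughput figures are made up for illustration; none of them were
measured on the machines in this thread.

```python
def expected_idle_percent(disk_mb_s, per_thread_mb_s, threads):
    """Estimate idle CPU % for a compressor that reads from a single disk.

    disk_mb_s:        input the disk can deliver, in MB/s (hypothetical)
    per_thread_mb_s:  input one worker thread can compress, in MB/s (hypothetical)
    threads:          number of worker threads
    """
    if disk_mb_s < 0 or per_thread_mb_s <= 0 or threads <= 0:
        raise ValueError("throughputs and thread count must be positive")
    # Input rate the compressor could consume with every thread busy.
    cpu_capacity = per_thread_mb_s * threads
    # The threads can only be as busy as the disk allows.
    busy_fraction = min(1.0, disk_mb_s / cpu_capacity)
    return 100.0 * (1.0 - busy_fraction)


if __name__ == "__main__":
    # Fast compressor, slow disk: half of the CPU time is spent idle.
    print(expected_idle_percent(160, 10, 32))
    # Same disk, compressor half as fast per thread: the disk keeps up,
    # so idle time disappears even though the job takes longer.
    print(expected_idle_percent(160, 5, 32))
```

In this model a lower idle% can simply mean the compressor got slower, not
that the machine is being used better.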
>> On Sun, Dec 11, 2016 at 1:22 PM, PeerCorps Trust Fund <
>> ipc at peercorpstrust.org> wrote:
>>
>>> Hi,
>>>
>>> It turns out that it was a combination of two things - turning off
>>> hyperthreading in BIOS and using a faster disk.
>>>
>>> I found a post from the author of lbzip2 which seems to describe what
>>> might be happening in this case, but reference was made to a user
>>> using an i5 mobile CPU:
>>>
>>> ########################################################################
>>>
>>>
>>> "lbzip2 author here. I strongly suspect that you see what you see because
>>> your Intel core i5 is probably only dual core PLUS hyper-threaded, not real
>>> quad-core. Meaning, you have two instances of the L2 per-core cache, not
>>> four, and each two hyperthreads share an L2 cache.
>>>
>>> Since the bzip2 compression/decompression is very cache sensitive (see
>>> "man bzip2"), the scaling factor will be determined mostly by how many
>>> OS-threads can dispose over a dedicated cache each. In your case this
>>> number is probably 2.
>>>
>>> Since you run two threads per core, those contend for the shared L2 cache,
>>> basically each messing with the other (flushing / invalidating the shared
>>> cache for the other). This contention shows up as double CPU time, because
>>> "waiting for cache" (or "waiting for main memory") is accounted for as CPU
>>> time.
>>>
>>> Hyperthreading is not useful but detrimental for lbzip2; so you should
>>> export LBZIP2="-n 2". You should not run more worker threads per core than:
>>> core-dedicated-cache-size divided by 8MB."
>>>
>>> ########################################################################
>>>
>>>
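The rule of thumb at the end of the quoted post can be written down as a
small Python sketch. The helper names are hypothetical, and clamping to at
least one worker per core is my own assumption (taken literally, the rule
gives zero for a cache smaller than 8MB). Only the LBZIP2 variable and the
"-n" option come from the quote itself.

```python
import os

# 8MB of core-dedicated cache per worker thread, per the quoted rule of thumb.
CACHE_PER_WORKER = 8 * 1024 * 1024


def max_lbzip2_workers(physical_cores, dedicated_cache_bytes_per_core):
    """Upper bound on lbzip2 worker threads suggested by the quoted rule."""
    if physical_cores <= 0 or dedicated_cache_bytes_per_core <= 0:
        raise ValueError("core count and cache size must be positive")
    per_core = dedicated_cache_bytes_per_core // CACHE_PER_WORKER
    # Assumption: never go below one worker per physical core.
    per_core = max(1, per_core)
    return physical_cores * per_core


def lbzip2_env(workers, base_env=None):
    """Return a copy of the environment with LBZIP2 set to '-n <workers>'."""
    env = dict(os.environ if base_env is None else base_env)
    env["LBZIP2"] = f"-n {workers}"
    return env


if __name__ == "__main__":
    # Hypothetical dual-core CPU with 3MB of cache per core.
    workers = max_lbzip2_workers(2, 3 * 1024 * 1024)
    print(lbzip2_env(workers, {})["LBZIP2"])
```

For the dual-core example this yields the same setting the quoted post
recommends, LBZIP2="-n 2". The resulting environment can be handed to
whatever launches lbzip2.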
>>> Running the compression again on the same file from an SSD with
>>> hyperthreading turned off, I was able to fully saturate all of the cores
>>> using lbzip2. None of this seemed obvious at first, but it rectified the
>>> situation. The biggest difference came from turning off hyperthreading
>>> (idle CPU = 20% vs the previous 90%) and then running from an SSD with
>>> hyperthreading turned off (idle CPU = 0%).
>>>
>>> Previously, the compression was run from a single HDD, not an SSD.
>>> Concerning the compression test using the same HDD under FreeBSD, well, I
>>> don't know why it was able to saturate the CPU. Perhaps it has something to
>>> do with ZFS's aggressive caching. Turning that off and re-running the test
>>> would likely answer the question. Pixz performed similarly when the above
>>> two modifications were made.
>>>
>>>
>>>
>>>
>>> On 12/11/2016 02:00 AM, Jasse Jansson wrote:
>>>
>>>> Have you tried to disable hyperthreading in the BIOS?
>>>> It's a long shot, I know, but it might help.
>>>>
>>>> On 2016-12-10 22:14, PeerCorps Trust Fund wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On both systems HAMMER was used. One small correction concerning the
>>>>> 2c/2t machine: both compression programs did effectively utilize that
>>>>> CPU, which had an idle % of 0.0. It is the bigger machine, 16c/32t,
>>>>> where the CPU isn't effectively maxed out. I'll continue to try and
>>>>> investigate why and report back if I find anything.
>>>>>
>>>>>
>>>>> On 12/10/2016 10:26 PM, Justin Sherrill wrote:
>>>>>
>>>>>> On the two DragonFly systems, was it Hammer or UFS? I would be
>>>>>> surprised if that made a difference, but it might?
>>>>>>
>>>>>> On Sat, Dec 10, 2016 at 6:19 AM, PeerCorps Trust Fund
>>>>>> <ipc at peercorpstrust.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've observed that parallel compression tools such as pixz and
>>>>>>> lbzip2 do not make use of all of the available CPU under DragonFly.
>>>>>>> On other OSes, they do.
>>>>>>>
>>>>>>> When testing on a 50 GB file, using top I've observed that CPU idle
>>>>>>> percentages consistently hover around the 90% range for pixz and
>>>>>>> ~70% for lbzip2. These values under FreeBSD and Linux are typically
>>>>>>> ~0.0% idle until compression is complete. Correspondingly,
>>>>>>> compression takes significantly longer under DragonFly, so the CPU
>>>>>>> is really being underutilized in this case, as opposed to erroneous
>>>>>>> reporting by top.
>>>>>>>
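One way to cross-check what top reports is to compare the CPU time a
command used against its wall-clock time. The following is a minimal Python
sketch for Unix-like systems using the standard resource module. The function
name is hypothetical and the command in the usage example is a placeholder,
not the actual compression run from this thread.

```python
import resource
import subprocess
import sys
import time


def average_busy_cores(cmd):
    """Run cmd and return (wall seconds, CPU seconds, CPU seconds / wall seconds).

    The last value approximates how many cores were busy on average,
    so a well-parallelized job on N cores should approach N.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # waits for the child to finish
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time accumulated by the child we just waited for.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu, cpu / wall


if __name__ == "__main__":
    # Placeholder workload; replace with the compression command under test.
    wall, cpu, busy = average_busy_cores(
        [sys.executable, "-c", "sum(range(1000000))"]
    )
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s average busy cores={busy:.2f}")
```

If the ratio stays far below the number of cores while the job also takes
longer than on another OS, that points to real underutilization and not a
reporting problem.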
>>>>>>> This was tested on two systems, one 16c/32t and a 2c/2t system, on a
>>>>>>> recent master: DragonFly v4.7.0.973.g8d7da-DEVELOPMENT #2: Wed Dec 7
>>>>>>> 11:44:04 EET 2016.
>>>>>>>
>>>>>>> Has anyone else possibly observed this?
>>>>>>>
>>>>>>> --
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>
>