Analysis on make parallelism for buildworld

Simon 'corecode' Schubert corecode at fs.ei.tum.de
Tue Oct 20 09:15:00 PDT 2009


Hey,

The question of which make parallelism level to use comes up repeatedly.  However, the answer is usually driven by anecdotal evidence rather than empirical data.  To that end, I ran a small benchmark to add one data point.  I have no idea about confidence intervals, so somebody will have to chime in here.

Experimental setup
==================
Machine: Dell Precision T3400
CPU: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz (2826.24-MHz 686-class CPU)
Memory: avail memory = 2063409152 (2015048K bytes)
HDD: da0: <SATA Hitachi HDP72505 GM4O> Fixed Direct Access SCSI-4 device (via AHCI)
filesystem: HAMMER v2
/usr/src: v2.5.1-77-gd894b0e
/usr/obj: flags nohistory, nullfs mount
executed command: make -j $j_level buildworld buildkernel

make -j levels used: 1-10
repetitions: 5
No other tasks were performed during the tests, although Xorg, windowmaker, terminals, xmms, firefox and thunderbird were running (idle).  Standard background jobs were not disabled.
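
A minimal sketch of how the runs described above could be automated, assuming Python 3 on a Unix-like system.  It is not the script used for these measurements; the function names (build_command, time_command, run_benchmark) are hypothetical, and only the make command line is taken from the setup.

```python
import resource
import subprocess
import time


def build_command(j_level):
    """Command line from the setup above for a given -j level."""
    return ["make", "-j", str(j_level), "buildworld", "buildkernel"]


def time_command(cmd, cwd=None):
    """Run cmd once and return (real, user, sys) seconds.

    user/sys are the CPU times of the waited-for child processes,
    taken as the difference of RUSAGE_CHILDREN before and after.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=True)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (real,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def run_benchmark(j_levels, repetitions, cwd="/usr/src",
                  make_cmd=build_command):
    """Return {j_level: [(real, user, sys), ...]}, one tuple per run."""
    results = {}
    for j in j_levels:
        results[j] = [time_command(make_cmd(j), cwd=cwd)
                      for _ in range(repetitions)]
    return results
```

Calling run_benchmark(range(1, 11), 5) would correspond to the levels and repetitions listed above.  The sketch does not clean /usr/obj between runs; whether and how to do that is left open here.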

Discussion
==========
The plot shows the median build time as a line, and the error bars show the min/max build times.  The spike in the maximum at -j4 is probably due to that run coinciding with the 3am hammer cleanup.
We can see a monotonic drop in total run time from -j1 to -j5.  After that the run time plateaus.  User and sys times increase at the same time, also plateauing beyond -j5.  This shows that increased parallelism in make adds slightly to the total overhead (sys+user), but total run time is significantly reduced.  Beyond -j ncpu+1 we cannot see any improvement in run time.
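
The median and min/max error bars described above can be computed per -j level along these lines.  This is only a sketch: the helper name summarize is hypothetical, and the sample list is a placeholder, not measured data from this benchmark.

```python
from statistics import median


def summarize(times):
    """Return (median, lower_error, upper_error) for one -j level.

    lower_error and upper_error are the distances from the median
    down to the minimum and up to the maximum, i.e. the asymmetric
    error bar lengths.
    """
    mid = median(times)
    return mid, mid - min(times), max(times) - mid


# Placeholder values only, NOT timings from the benchmark.
placeholder_runs = [3.0, 1.0, 2.0, 5.0, 4.0]
mid, below, above = summarize(placeholder_runs)
print(f"median {mid}, error bar -{below}/+{above}")
```

With five repetitions the median is simply the third-fastest run, so a single outlier (such as a run overlapping a cleanup job) moves the error bar but not the plotted line.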

A -j 2 build does not offer significant benefit over -j 1, which is not intuitive and might need some further investigation.

The -j 5 build achieves a 42% reduction in build time relative to the -j 1 baseline.

Compared to the -j 4 (i.e. -j ncpu) build, the -j 5 (i.e. -j ncpu+1) build reduces run time by an additional 5.4%.  This suggests that a parallelism level of only ncpu cannot keep all CPU cores busy.
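
To put the two percentages in relation, the arithmetic can be sketched as follows.  The function names are hypothetical, and the sketch assumes both percentages refer to the same median real times; only the 42% and 5.4% figures come from the text.

```python
def reduction(baseline, measured):
    """Fractional reduction in run time relative to a baseline."""
    return (baseline - measured) / baseline


def speedup_from_reduction(r):
    """Speedup factor that corresponds to a fractional reduction r."""
    return 1.0 / (1.0 - r)


def intermediate_reduction(r_total, r_last_step):
    """Reduction of an intermediate level relative to the baseline.

    r_total:     reduction of the final level vs. the baseline
    r_last_step: reduction of the final level vs. the intermediate level
    """
    return 1.0 - (1.0 - r_total) / (1.0 - r_last_step)


# Figures quoted in the text: 42% (-j 5 vs. -j 1), 5.4% (-j 5 vs. -j 4).
print(f"-j 5 speedup over -j 1: {speedup_from_reduction(0.42):.2f}x")
print(f"-j 4 reduction vs. -j 1: {intermediate_reduction(0.42, 0.054):.1%}")
```

Under that assumption the -j 5 build is about 1.72 times as fast as -j 1 on four cores, and the -j 4 build comes out at roughly a 39% reduction.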

Conclusion
==========
I advise running builds at -j ncpu+1 on 4-CPU systems.  Until we have numbers for 2-CPU and UP systems, we cannot give conclusive advice; however, I would try -j3 for those two cases.
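
The advice above could be turned into a small helper like the one below.  This is a sketch with a hypothetical function name; the ncpu+1 rule is only backed by data for 4 CPUs, the -j3 value for 1 and 2 CPUs is the tentative guess from above, and applying ncpu+1 to any other CPU count is an untested extrapolation.

```python
import os


def suggested_jobs(ncpu):
    """Suggested make -j level for a machine with ncpu CPUs."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    if ncpu <= 2:
        # Tentative value for UP and 2-CPU systems (no measurements yet).
        return 3
    # Measured only for ncpu == 4; an extrapolation for anything else.
    return ncpu + 1


# os.cpu_count() can return None, so fall back to 1 in that case.
ncpu = os.cpu_count() or 1
print(f"make -j {suggested_jobs(ncpu)} buildworld buildkernel")
```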

cheers
 simon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: png00000.png
Type: application/octet-stream
Size: 4783 bytes
Desc: ""
URL: <http://lists.dragonflybsd.org/pipermail/users/attachments/20091020/f07c2b33/attachment-0016.obj>

