SLAB allocator now the default.

Sun Sep 28 15:53:12 PDT 2003

:>    to allocate whole pages.  The slab allocator does this for power-of-2
:>    sized requests beyond PAGE_SIZE but does NOT page-align oddly sized
:>    requests (like a 6K request) beyond PAGE_SIZE, at least until the requests
:>    get large (greater than 16K).
:> 
:>    So keeping the power-of-2-allocation-is-power-of-2-aligned characteristic
:>    is reasonable for power-of-2-sized requests.
:
:structures smaller than, say, 128 bytes should be rounded up to the next larger
:2^n size, though.
:
:-- 
:	Sander
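    The rounding rule suggested in the quoted text can be sketched as
    follows.  This is only an illustration: the 128-byte cutoff is the
    quoted "say" value, and the names next_pow2() and
    rounded_request_size() are hypothetical, not taken from any real
    allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: the quoted suggestion says "say 128 bytes". */
#define SMALL_ROUNDUP_LIMIT 128

/* Return the smallest power of two that is >= n (n must be >= 1). */
static size_t next_pow2(size_t n)
{
    size_t p = 1;

    assert(n >= 1);
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Size actually handed out for an n-byte request under the suggested
 * rule: requests below the cutoff are rounded up to a power of two,
 * requests at or above it are returned unchanged.
 */
static size_t rounded_request_size(size_t n)
{
    if (n == 0)
        n = 1;
    return (n < SMALL_ROUNDUP_LIMIT) ? next_pow2(n) : n;
}
```

    Under this rule a 24-byte request occupies 32 bytes and a 100-byte
    request occupies 128 bytes, while a 6K request is left at 6144 bytes.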

    Well, I don't think you can point to any one thing and say that it
    will magically solve all the problems.  It takes an integrated approach
    to make things operate smoothly.

    For example, there is a rather severe memory and cache efficiency
    tradeoff here that cannot be ignored.  If one is allocating 32-byte
    structures and wasting 128 bytes of memory on each one, the result is
    that 80% of your memory accesses wind up using only 20% of your available
    L2 cache, which makes your cache only 1/3 as effective as it would be
    if you had compacted the allocations to spread them evenly over the
    entire L2 cache.
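    The cache effect can be illustrated with a toy model.  It assumes
    64-byte cache lines and a 1024-set L2 (hypothetical parameters, as is
    the function name cache_footprint()), and it does not try to reproduce
    the 80%/20% and 1/3 figures above, which rest on assumptions not spelled
    out here.  It places objects at a fixed stride and measures how much of
    each touched cache line holds live data and what fraction of the cache
    sets the objects can map to at all.

```c
#include <stddef.h>

#define LINE_SIZE 64u    /* assumed cache-line size in bytes */
#define NUM_SETS  1024u  /* assumed number of L2 sets */

/*
 * Place 'count' objects of 'obj_size' bytes at addresses 0, stride,
 * 2*stride, ... and report:
 *   *line_util - live object bytes / bytes of the cache lines they touch
 *   *set_frac  - fraction of cache sets that any object maps to
 * Assumes 0 < obj_size <= stride.
 */
static void cache_footprint(size_t obj_size, size_t stride, size_t count,
                            double *line_util, double *set_frac)
{
    unsigned char set_hit[NUM_SETS] = {0};
    size_t lines = 0, sets = 0;
    size_t last_line = (size_t)-1;

    for (size_t i = 0; i < count; i++) {
        size_t start = i * stride;
        size_t first = start / LINE_SIZE;
        size_t last = (start + obj_size - 1) / LINE_SIZE;

        for (size_t line = first; line <= last; line++) {
            /* Lines are visited in non-decreasing order, so comparing
               with the previous line is enough to count each one once. */
            if (line != last_line) {
                lines++;
                last_line = line;
            }
            size_t set = line % NUM_SETS;
            if (!set_hit[set]) {
                set_hit[set] = 1;
                sets++;
            }
        }
    }
    *line_util = (double)(obj_size * count) / (double)(lines * LINE_SIZE);
    *set_frac = (double)sets / NUM_SETS;
}
```

    In this model, packing 32-byte objects back to back gives full line
    utilization and uses every set, while putting them in 128-byte slots
    leaves half of each touched line unused and, because every object starts
    at the same offset modulo 128, lets them map to only half of the sets.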

    In DragonFly we do several things, and taken together they form a far
    more effective solution:

    (1) Our slab allocator is per-cpu.

    (2) Because it is per-cpu our slab allocator can make compact
	allocations without severe cache line contention.

    (3) We forward modifications to a structure to the cpu that owns it
	(typically the cpu on which the structure was allocated), to reduce
	cache contention caused by modifications and to avoid the use of
	mutexes (mutexes virtually guarantee cache contention).

    (4) We intend to isolate subsystems in their own cpu-locked threads
	so the related data structures remain local to the cpu.
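    Points (1) through (3) can be sketched as a toy per-cpu zone allocator.
    This is not DragonFly's kmalloc code: it is a single-threaded model in
    which the cpu id is passed explicitly, and every name and size in it
    (zone_alloc(), zone_free(), NCPUS, CHUNK_SIZE, ...) is hypothetical.
    Each cpu allocates compactly from its own zone with no lock; a free
    issued on a cpu that does not own the chunk is forwarded to the owner's
    "remote" queue instead of touching the owner's freelist.  A real
    implementation would have to deliver that message between cpus, for
    example with an inter-processor interrupt or a lock-free queue.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPUS       2   /* hypothetical cpu count */
#define NCHUNKS    64   /* chunks per per-cpu zone (hypothetical) */
#define CHUNK_SIZE 32   /* fixed object size for this zone (hypothetical) */

union chunk {
    union chunk *next;                /* link while the chunk is free */
    unsigned char bytes[CHUNK_SIZE];  /* payload while allocated */
};

struct percpu_zone {
    union chunk mem[NCHUNKS];  /* compact backing store owned by one cpu */
    union chunk *freelist;     /* touched only by the owning cpu */
    union chunk *remote;       /* frees forwarded from other cpus */
    int nfree;                 /* chunks on freelist */
};

static struct percpu_zone zones[NCPUS];

static void zone_init(void)
{
    for (int c = 0; c < NCPUS; c++) {
        struct percpu_zone *z = &zones[c];

        z->freelist = NULL;
        z->remote = NULL;
        for (int i = NCHUNKS - 1; i >= 0; i--) {
            z->mem[i].next = z->freelist;
            z->freelist = &z->mem[i];
        }
        z->nfree = NCHUNKS;
    }
}

/* Return the cpu whose zone contains p, or -1 (address-range check). */
static int chunk_owner(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    for (int c = 0; c < NCPUS; c++) {
        uintptr_t lo = (uintptr_t)&zones[c].mem[0];
        uintptr_t hi = (uintptr_t)&zones[c].mem[NCHUNKS];
        if (a >= lo && a < hi)
            return c;
    }
    return -1;
}

/* Run on the owning cpu: put forwarded frees back on the local freelist. */
static void zone_drain_remote(int cpu)
{
    struct percpu_zone *z = &zones[cpu];

    while (z->remote != NULL) {
        union chunk *ch = z->remote;
        z->remote = ch->next;
        ch->next = z->freelist;
        z->freelist = ch;
        z->nfree++;
    }
}

/* Allocate on 'cpu' from its own zone only. */
static void *zone_alloc(int cpu)
{
    struct percpu_zone *z = &zones[cpu];

    if (z->freelist == NULL)
        zone_drain_remote(cpu);
    union chunk *ch = z->freelist;
    if (ch == NULL)
        return NULL;
    z->freelist = ch->next;
    z->nfree--;
    return ch;
}

/* Free from 'cpu': local chunks go straight back, others are forwarded. */
static void zone_free(int cpu, void *p)
{
    int owner = chunk_owner(p);
    union chunk *ch = p;

    if (owner < 0)
        return;  /* not from any zone; a real allocator would panic */
    if (owner == cpu) {
        ch->next = zones[cpu].freelist;
        zones[cpu].freelist = ch;
        zones[cpu].nfree++;
    } else {
        ch->next = zones[owner].remote;  /* "message" to the owner */
        zones[owner].remote = ch;
    }
}
```

    For example, a chunk allocated on cpu 0 and freed on cpu 1 sits on cpu
    0's remote queue until cpu 0 drains it, so cpu 1 never writes to cpu 0's
    freelist.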

    We don't do everything perfectly... right now, for example, the cpu
    allocating a structure is not necessarily the cpu that is going to use
    it, but it is simply not possible to cover all the bases right from the
    start.  As long as the infrastructure and programming model allow it
    to be done properly, we can eventually achieve that goal.

    So, at least in regard to DragonFly, aligning memory requests on 128-byte
    boundaries would be detrimental.

					-Matt
					Matthew Dillon 
					<dillon at xxxxxxxxxxxxx>