4mb PAGES for mbuf clusters

km b kmb810 at gmail.com
Sun Jul 15 16:37:40 PDT 2007

On 7/16/07, Matthew Dillon <dillon at apollo.backplane.com> wrote:
     This only works if the memory allocations are temporary.  Unfortunately,
     many memory allocations are persistent.  Without a way to reallocate
     the persistent store there is no way to undo the fragmentation.
True, though most mbuf allocations are short-lived rather than persistent.

     Reallocation is often integrated into high level interpreted
     languages through the means of a double indirect.  That is,
     instead of holding a pointer to something you hold a pointer to a
     pointer to something.  That way the 'something' can be reallocated
     by just changing the underlying pointer and not all the active
     references.  But doing this imposes fairly severe overheads unsuitable
     for kernel code as well as locking issues.
     In absence of that you either have to pre-reserve the memory for
     use ONLY as 4MB pages, which is easy to do but almost impossible to
     balance, or there needs to be a mechanism whereby individual subsystems
     can reallocate memory for active structures - something very difficult
     to do properly.
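
To make the double-indirect idea above concrete, here is a minimal userland sketch in C. The names (struct handle, handle_alloc, handle_deref, handle_relocate, handle_free) are hypothetical and invented for illustration; this is not objcache or any kernel interface. It ignores locking entirely, which is exactly the hard part noted above: in a kernel, relocation would have to exclude every concurrent dereference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle: clients keep a pointer to this slot, never to the data. */
struct handle {
    void  *ptr;   /* current location of the object */
    size_t size;  /* object size in bytes */
};

/* Allocate a zeroed object plus the handle that refers to it (NULL on failure). */
struct handle *handle_alloc(size_t size)
{
    struct handle *h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    h->ptr = calloc(1, size);
    if (h->ptr == NULL) {
        free(h);
        return NULL;
    }
    h->size = size;
    return h;
}

/* Every access pays one extra dereference: handle -> object. */
void *handle_deref(const struct handle *h)
{
    assert(h != NULL);
    return h->ptr;
}

/*
 * Move the object to freshly allocated storage, as a defragmenter might.
 * Only the slot inside the handle changes, so every holder of the handle
 * sees the new location. Returns 0 on success, -1 if no memory is available.
 */
int handle_relocate(struct handle *h)
{
    void *fresh = malloc(h->size);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, h->ptr, h->size);
    free(h->ptr);
    h->ptr = fresh;
    return 0;
}

/* Release both the object and its handle. */
void handle_free(struct handle *h)
{
    if (h != NULL) {
        free(h->ptr);
        free(h);
    }
}
```

A caller that only ever goes through handle_deref() keeps working after handle_relocate(), at the cost of the extra pointer chase on every access. Any raw pointer cached from an earlier handle_deref() becomes stale, which is why the scheme needs discipline (and, in a kernel, locking) to be safe.
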
In the FreeBSD UMA zone allocator there is a way to change the underlying
backing store for a particular zone and switch to a custom backing
store. I don't know whether objcache has a similar implementation or not.

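
As a rough illustration of what a per-zone switchable backing store looks like, here is a small userland model in C. It is only a sketch under my own assumptions: struct zone, zone_init, zone_set_backing, zone_grow and the two backends are hypothetical names, and this is neither the UMA API nor objcache. The "large" backend merely rounds requests up to a 2MB multiple with 2MB alignment, standing in for a store built on large pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical backend hooks: obtain and release one slab of `bytes` bytes. */
typedef void *(*backing_alloc_fn)(size_t bytes);
typedef void  (*backing_free_fn)(void *slab, size_t bytes);

/* Hypothetical zone: a fixed item size plus replaceable backing-store hooks. */
struct zone {
    size_t           item_size;
    backing_alloc_fn alloc;
    backing_free_fn  release;
};

/* Default backend: plain malloc/free. */
void *default_backing_alloc(size_t bytes) { return malloc(bytes); }
void  default_backing_free(void *slab, size_t bytes) { (void)bytes; free(slab); }

/* Custom backend: 2MB-aligned slabs whose size is a multiple of 2MB. */
#define LARGE_CHUNK ((size_t)2 * 1024 * 1024)

void *large_backing_alloc(size_t bytes)
{
    size_t rounded = (bytes + LARGE_CHUNK - 1) / LARGE_CHUNK * LARGE_CHUNK;
    return aligned_alloc(LARGE_CHUNK, rounded);   /* C11 */
}
void large_backing_free(void *slab, size_t bytes) { (void)bytes; free(slab); }

/* Create a zone that starts out on the default backend. */
void zone_init(struct zone *z, size_t item_size)
{
    z->item_size = item_size;
    z->alloc     = default_backing_alloc;
    z->release   = default_backing_free;
}

/*
 * Switch the zone to another backing store. Only slabs obtained afterwards
 * use the new backend; slabs handed out earlier must still be returned to
 * the backend that produced them (here both end in free(), so it is moot).
 */
void zone_set_backing(struct zone *z, backing_alloc_fn a, backing_free_fn f)
{
    assert(a != NULL && f != NULL);
    z->alloc   = a;
    z->release = f;
}

/* Get backing memory for `nitems` items through whichever backend is set. */
void *zone_grow(struct zone *z, size_t nitems)
{
    return z->alloc(z->item_size * nitems);
}
```

The point of the indirection is that code allocating from the zone does not change when the backend does; only the zone that benefits (say, the one holding mbuf clusters) has to be switched over.
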
There are two points here:

1. This is an x86-specific optimization for specific workloads!
2. With the port to amd64 on the way and machines with very large
memory becoming available, it would be nice to design the vm subsystem
so that the kernel can manage memory in either 4K or 2MB pages,
selected by a user-configurable option. Again, my assumption is that
switching to 2MB pages is not just a matter of changing PAGE_SIZE from
4K to 2MB, as most of the vm subsystem is optimized for the smaller
page size.
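
To put numbers on the 4K versus 2MB trade-off, here is a small arithmetic sketch in C. The helper names (pages_needed, tail_waste) and the two size macros are mine, chosen for illustration; nothing here touches the vm subsystem. It only shows how the page count and the unused tail of the last page change with the page size.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE_SIZE (UINT64_C(4) * 1024)          /* 4K page  */
#define LARGE_PAGE_SIZE (UINT64_C(2) * 1024 * 1024)   /* 2MB page */

/* Number of pages of `page_size` bytes needed to cover `bytes` (rounds up). */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return bytes / page_size + (bytes % page_size != 0);
}

/* Bytes left unused in the last page when `bytes` is backed by whole pages. */
uint64_t tail_waste(uint64_t bytes, uint64_t page_size)
{
    return pages_needed(bytes, page_size) * page_size - bytes;
}
```

For 1GB of memory, pages_needed() gives 262144 pages at 4K but only 512 at 2MB, which is where the savings in page-table and TLB entries come from. On the other hand, tail_waste(2048, LARGE_PAGE_SIZE) is 2095104 bytes against 2048 bytes for a 4K page, so code that assumes a page is a cheap unit of allocation behaves very differently once the page is 2MB.
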
Something is wrong up on cloud # 9!
