new lwbuf api and mpsafe sf_bufs

Samuel J. Greear sjg at
Wed Mar 11 11:15:58 PDT 2009

On Tue, Mar 10, 2009 at 9:35 PM, Matthew Dillon
<dillon at> wrote:
> :If implementing caching at this level would it make sense to still
> :grab a page of kva at a time and cache forever, looking through our
> :hash/tree in an attempt to reuse, then looking at the end of an
> :overlapping LRU queue to see if we have one with a refcnt of 0, and
> :acquiring more resources if all else fails up to our maximum limit?
> :With some modest pre-allocation to avoid any initial sloshing.
> :
> :With the duration that these allocations are held I am not convinced
> :that caching buys us anything unless we cache past the free.
> :
> :Sam
>    Ok, I'll just reiterate some of what we talked about on IRC so
>    people reading the thread don't get confused by our skipping around.
>    Basically the original SFBUF system caches freed mappings which may
>    be reused later.  The lwbuf API loses that ability.  What I do like
>    about the lwbuf code is that it dynamically allocates the lwbufs.
>    The sfbufs are statically allocated and thus have scaling issues
>    (e.g. the example Samuel gave on IRC is when running many parallel
>    sendfile()s).   I would like to see the SFBUF code use a more
>    dynamic model and a sysctl which sets the maximum number of *excess*
>    sfbufs instead of the maximum number of sfbufs.
>    The other part of this equation is how to optimize for MP operation.
>    Initially we should just use the same global index model that we
>    use now, though perhaps using the newly available atomic ops to
>    control the ref count and cpumask.  As a second step we can figure
>    out a better model.
>                                        -Matt
>                                        Matthew Dillon
>                                        <dillon at>

I had another thought on this (Simon-inspired), let me see if I can illustrate.

struct lwbuf_pcpu {
    struct vm_page *m;
    vm_offset_t     kva;
    int             refcnt;
};

lwbuf_pcpu structures sit in a pcpu hash/rbtree.

struct lwbuf {
    struct vm_page *m;
    cpumask_t       cpumask;
};

lwbufs are kept in an objcache and returned to the alloc caller.
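A minimal sketch of the per-cpu lookup side, assuming a simple chained
hash keyed on the vm_page pointer (names and the toy hash function are
hypothetical; the real code could just as well use a per-cpu rbtree):

```c
#include <assert.h>
#include <stddef.h>

#define NCPU      4
#define HASHSIZE  64

struct vm_page;                          /* opaque page identity */

struct lwbuf_pcpu {
    struct lwbuf_pcpu    *next;          /* hash chain */
    const struct vm_page *m;
    unsigned long         kva;
    int                   refcnt;
};

/* One small chained hash table per cpu.  Because each table is only
 * ever touched from its own cpu, no locking is required. */
static struct lwbuf_pcpu *pcpu_hash[NCPU][HASHSIZE];

static unsigned
pcpu_hashfn(const struct vm_page *m)
{
    return ((unsigned long)m >> 6) % HASHSIZE;  /* toy hash on the pointer */
}

static struct lwbuf_pcpu *
lwbuf_pcpu_lookup(int cpu, const struct vm_page *m)
{
    struct lwbuf_pcpu *lp;

    for (lp = pcpu_hash[cpu][pcpu_hashfn(m)]; lp != NULL; lp = lp->next) {
        if (lp->m == m)
            return lp;
    }
    return NULL;
}

static void
lwbuf_pcpu_insert(int cpu, struct lwbuf_pcpu *lp)
{
    unsigned h = pcpu_hashfn(lp->m);

    lp->next = pcpu_hash[cpu][h];
    pcpu_hash[cpu][h] = lp;
}
```

Keeping one table per cpu is what removes the contention the shared
sf_buf hash has today.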

lwbuf_alloc:
    create or reuse a lwbuf_pcpu, setting refcnt to 1 (or bumping an
existing lwbuf_pcpu's refcnt), and set the lwbuf cpumask to the
current cpu.

lwbuf_kva:
    check the lwbuf cpumask; if the current cpu is in it we know a held
lwbuf_pcpu exists here, so look it up and return the kva.  Otherwise
re-use the lwbuf_alloc code to create a held lwbuf_pcpu on the current
cpu and return the kva.

lwbuf_free:
    decref on the current cpu if it is in our cpumask; if other cpus
remain in the mask, propagate the decref to them via passive IPI.
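The three operations above can be sketched as a self-contained
userspace simulation (plain C; the helper names, the one-slot-per-cpu
cache, and the direct loop standing in for the passive IPI are all
simplifying assumptions -- the kernel version would use the pcpu
hash/rbtree, atomic ops, and real IPIs):

```c
#include <assert.h>
#include <string.h>

#define NCPU 4

typedef unsigned int cpumask_t;

struct vm_page;                      /* opaque page identity */

struct lwbuf_pcpu {                  /* one mapping, owned by one cpu */
    const struct vm_page *m;
    unsigned long         kva;       /* per-cpu kernel virtual address */
    int                   refcnt;
};

struct lwbuf {                       /* handle returned to the caller */
    const struct vm_page *m;
    cpumask_t             cpumask;   /* cpus holding a mapping for us */
};

/* Hypothetical per-cpu cache: a single slot per cpu keeps the sketch
 * short; the real design would look the page up in a pcpu hash/rbtree. */
static struct lwbuf_pcpu pcpu_cache[NCPU];

static unsigned long cpu_kva_base(int cpu) { return 0x1000UL * (cpu + 1); }

/* lwbuf_alloc: create or reuse a lwbuf_pcpu (refcnt 1 or bumped) and
 * record the current cpu in the handle's cpumask. */
static void
lwbuf_alloc_oncpu(struct lwbuf *lwb, const struct vm_page *m, int cpu)
{
    struct lwbuf_pcpu *lp = &pcpu_cache[cpu];

    if (lp->refcnt > 0 && lp->m == m) {
        ++lp->refcnt;                /* reuse the existing mapping */
    } else {
        lp->m = m;
        lp->kva = cpu_kva_base(cpu); /* stand-in for a real kva mapping */
        lp->refcnt = 1;
    }
    lwb->m = m;
    lwb->cpumask |= 1u << cpu;
}

/* lwbuf_kva: if the current cpu is already in the cpumask a held
 * mapping exists here; otherwise create one first. */
static unsigned long
lwbuf_kva_oncpu(struct lwbuf *lwb, int cpu)
{
    if (!(lwb->cpumask & (1u << cpu)))
        lwbuf_alloc_oncpu(lwb, lwb->m, cpu);
    return pcpu_cache[cpu].kva;
}

/* lwbuf_free: decref on every cpu in the mask.  Remote cpus would be
 * reached via passive IPI; simulated here as a direct loop. */
static void
lwbuf_free(struct lwbuf *lwb, int curcpu)
{
    for (int cpu = 0; cpu < NCPU; ++cpu) {
        if (lwb->cpumask & (1u << cpu)) {
            (void)curcpu;            /* cpu != curcpu: pretend passive IPI */
            --pcpu_cache[cpu].refcnt;
            lwb->cpumask &= ~(1u << cpu);
        }
    }
}
```

The point the simulation makes concrete: a handle that migrates across
cpus simply grows extra bits in its cpumask, and the free path tears
each per-cpu mapping down on its owning cpu, so the hot paths never
touch another cpu's state synchronously.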

The kva could be shared and mapped differently on each cpu; debugging
could get interesting in that case.

All consumers other than sendfile proper can use this API.

sf_buf continues to exist as it does now, serving only sendfile proper,
made dynamic and spinlock-protected as Dillon explained.
