(userspace) vfs and xio [CAPS?]

Dave Leimbach leimySPAM2k at mac.com
Tue May 25 12:20:13 PDT 2004

Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx> writes:

> :This brings me back to wondering about CAPS and its 128K single-message limit.
> :
> :Is it expected that the user of CAPS would manually break his/her messages into
> :<= 128K chunks and then send?  
> :
> :
> :
> :> 
> :>     But because cross-address-space accesses are so expensive, it will
> :>     probably be more efficient to break a large UIO into an XIO loop
> :>     and take the thread switching hit in the loop rather than to try to pull
> :>     out a portion of a foreign address space.  Besides, physical I/O
> :>     is already limited to 128KB/256KB chunks so the extra thread switches
> :>     will not be that big an issue.
> :> 
> :
> :Well yeah... like that for CAPS too?  Or will CAPS include this loop in its
> :implementation?
> :
> :Seems like a concern that CAPS users shouldn't have to deal with unless it's
> :terribly inefficient to implement the loop in CAPS.
> :
> :Dave
>     I agree completely.  That limit is temporary... it was the easiest way
>     to rip out the old cross-address-space junk and use XIO instead.  The
>     code needs another pass to add the transfer loop (which is really just
>     another one or two states for the message).

In Portals [an RDMA-style "put/get" message passing system] you get "start"
and "finish" events on an event queue object when a put or get starts or
finishes.  I think that would work nicely here as well.

We don't really have a "queue" handle per se as Portals does, but we do have
the CAPS id, which could be viewed that way I suppose.
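
Roughly what I have in mind, as a sketch only: a small fixed-size event queue
that records "start" and "finish" events tagged with a cid.  None of these
names (xfer_event, event_queue, evq_post, evq_get) come from Portals or CAPS;
they are invented here just to show the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types, loosely modelled on start/finish notifications. */
enum xfer_event_type { XFER_START, XFER_FINISH };

struct xfer_event {
    enum xfer_event_type type;
    int    cid;     /* CAPS id standing in for a queue handle (assumption) */
    size_t nbytes;  /* size of the transfer the event refers to */
};

#define EVQ_SIZE 16  /* power of two so the unsigned counters wrap cleanly */

struct event_queue {
    struct xfer_event ev[EVQ_SIZE];
    unsigned head;  /* count of events consumed */
    unsigned tail;  /* count of events posted */
};

/* Append an event; returns 0 on success, -1 if the queue is full. */
static int
evq_post(struct event_queue *q, enum xfer_event_type type, int cid,
    size_t nbytes)
{
    struct xfer_event *e;

    if (q->tail - q->head == EVQ_SIZE)
        return -1;
    e = &q->ev[q->tail % EVQ_SIZE];
    e->type = type;
    e->cid = cid;
    e->nbytes = nbytes;
    q->tail++;
    return 0;
}

/* Remove the oldest event into *out; returns 0, or -1 if the queue is empty. */
static int
evq_get(struct event_queue *q, struct xfer_event *out)
{
    if (q->head == q->tail)
        return -1;
    *out = q->ev[q->head % EVQ_SIZE];
    q->head++;
    return 0;
}
```

The sender would post XFER_START when a transfer begins and XFER_FINISH when
the last chunk completes, and the consumer drains events in order with
evq_get().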

I wonder if it would be possible to get cids into kqueue notifications?

I believe the Portals API even specifies some non-contiguous message
passing via an iovec-like mechanism in its latest specification.

For interested parties.

I've used this API extensively in the last 3 years and it's not too shabby :).

It's also being used as the underlying layer of the Lustre cluster file
system and is implemented in kernel space, IIRC, in Linux on top of a Network
Abstraction Layer.

Pretty neat stuff.  It was designed with scalability in mind, as you don't
require a socket to address other processes... you just have to find a
process's pid and nid [process id and node id] to talk to it.  In other words
you don't need file descriptors for sockets to do communication.

I wouldn't mind taking a few passes at the CAPS stuff, perhaps sometime this
week, and seeing what I come up with.  I am relocating all next week but will
hopefully have my stuff back from the moving company within 8 days and would
be able to resume then.

