You could do worse than Mach ports
Matthew Dillon
dillon at apollo.backplane.com
Fri Jul 18 11:12:43 PDT 2003
: I am REALLY interested in the user->user case... In fact I've had some
: ideas that have to do with Mach-like port permissions and exposed regions
: of memory based on the MPI-2 One-Sided [RDMA] semantics. However I
: haven't gotten very far with this idea on paper to even decide if it's
: doable...
:
: I want to expose a region of memory for outside access to a known group
: of processes or threads for a certain amount of time, at which point that
: memory could be thought of as "don't touch" for the duration of the
: epoch... All accesses to that memory could be considered "remote" during
: the epoch, using only "put" and "get" requests, with the VM layer doing
: the writes and reads to the memory in the exposed buffer... even for the
: local process.
:
: Does this sound feasible? Like I said, I haven't gotten very far... but
: this API is present, more or less, on high-speed InfiniBand hardware
: drivers as well as Myrinet GM, where there is hardware to do the DMA
: accesses needed to avoid interrupting the CPUs of remote nodes so they
: can continue crunching data while messages flow through a cluster. It's
: quite beautiful and elegant in that context.
:
: In user<->user messaging it would just be a natural extension, I think,
: of this idea. However, I have not accounted for the context switches and
: other things that may need to occur in a BSD/Unix-like kernel that might
: make this design horrible.
Well, any time you have to play with VM mappings you incur a horrible
cost versus not touching them at all and just making a direct call. I
guess it depends on how much data the user process winds up manipulating
outside of the don't-touch periods.
In an implementation of the above scheme it might be easier simply to
make the memory don't-touch all the time, rather than just during
the epoch, and rely on "put" and "get" to do the right thing.
If the user process is manipulating a *LOT* of data then a double-buffered
approach might be the way to go, where the user process always has access
to a kernel-provided writable buffer, and when it stages the buffer into
the kernel the kernel replaces the user VM page with a new one (leaving
the old page solely in the domain of the kernel).
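A rough sketch of that double-buffered hand-off, just to make the idea
concrete; the helpers alloc_fresh_page() and vm_replace_user_page() are
made-up stand-ins for whatever pmap/VM-object surgery a real kernel would
have to do, not DragonFly or FreeBSD interfaces.

/*
 * Hypothetical double-buffer staging: the user always has a writable
 * buffer; staging swaps a fresh page in under the same user VA and the
 * old page becomes kernel-only.
 */
struct staged_buf {
	void	*user_page;	/* page currently mapped writable in the user */
};

void	*alloc_fresh_page(void);				/* hypothetical */
void	 vm_replace_user_page(struct staged_buf *, void *);	/* hypothetical */

static void *
stage_into_kernel(struct staged_buf *b)
{
	void *old = b->user_page;
	void *fresh = alloc_fresh_page();

	vm_replace_user_page(b, fresh);	/* user keeps writing at the same VA */
	b->user_page = fresh;
	return (old);			/* old page is now kernel-only */
}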
I'm not sure whether these schemes would apply to DragonFly. There are
DMA issues in all UNIXes, which I believe only NetBSD has solved with
UVM so far. In FreeBSD, vfs_busy_pages() is responsible for preventing
user writes to pages undergoing write I/O. In DragonFly we will
eventually replace that with a COW scheme like UVM has.
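For what it's worth, the COW replacement amounts to something like the
following; the page helpers here are hypothetical placeholders, not the
UVM, FreeBSD, or DragonFly functions.

/*
 * Sketch of copy-on-write protection for a page undergoing write I/O.
 * All helpers are hypothetical placeholders.
 */
struct page;

void		 page_write_protect(struct page *);		/* hypothetical */
struct page	*page_copy(struct page *);			/* hypothetical */
void		 page_remap_user(struct page *, struct page *);	/* hypothetical */

/* Before queueing the write I/O, revoke the user's write permission. */
static void
start_write_io(struct page *pg)
{
	page_write_protect(pg);
	/* ... hand pg to the driver ... */
}

/* If the user writes while the I/O is in flight, give it a private copy. */
static void
write_fault_during_io(struct page *pg)
{
	struct page *copy = page_copy(pg);

	page_remap_user(pg, copy);	/* user writes the copy; I/O sees pg */
}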
:
: Queueing occurs only in the synchronous case then? I need to see that
: AmigaOS :).
Queueing only occurs in the asynchronous case. In the synchronous case
the port agent completes processing of the message and returns a
synchronous error code (which is anything other than EASYNC).
Of course, the port agent is free to do whatever it wants... it could
very well use queueing internally and spin on the result, then return
a synchronous result. It is 'opaque', though of course the intent is
for it to queue and return EASYNC rather than block in that case.
I can see cases where a port agent might occasionally block... under
exceptional circumstances that are outside the critical path, such as
critical low-memory situations.
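Either way the caller-visible contract is simple. Here is a sketch of the
caller side; port_sendmsg()/port_waitmsg() are hypothetical names used for
illustration rather than the final API, and the EASYNC value is a
placeholder.

#ifndef EASYNC
#define EASYNC	0x0400		/* placeholder value for this sketch */
#endif

struct port_msg;

int	port_sendmsg(struct port_msg *);	/* dispatch via the port agent (hypothetical) */
int	port_waitmsg(struct port_msg *);	/* wait for a queued reply (hypothetical) */

static int
do_request(struct port_msg *msg)
{
	int error = port_sendmsg(msg);

	if (error != EASYNC)
		return (error);		/* agent completed it synchronously */

	/* The agent queued the message; wait (or poll) for the async reply. */
	return (port_waitmsg(msg));
}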
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>