Reviving userland LWKT
dillon at apollo.backplane.com
Wed Jul 12 13:53:52 PDT 2006
:I'm interested in bringing back to life the port of LWKT to userland
:(the old "libcaps") - I know a lot of this stuff hasn't been a high
:priority for quite some time, but it was a cool unique Dragonfly feature
:and I'd hate to have it go to bit-rot as the rest of the system
:continues to mature.
:So, anyone have any ideas, suggestions or requests going out? I realize
:the sysport/sendsys stuff has been removed and I am prepared to deal
:with that (I'm not sure it ever worked anyway, did it?) My plan of
:attack focuses on usermode<->usermode intraprocess and interprocess
:messaging first, leaving the whole issue of what to do with the kernel
:interfaces for last.
Instead of doing that, perhaps you would be interested in something
even MORE interesting, but very similar -- userland VFS!
We need a stream based command/response mechanism that basically
works similarly to the current filesystem journaling protocol but
allows bi-directional command initiation, parallel commands, etc etc.
Initially I just want the protocol to operate over a pipe or socket,
and not via shared memory, but the protocol also has to eventually
be extensible to a shared memory FIFO implementation which creates
We need this protocol for both userland VFS and for data links
between machines in a clustered environment. The idea is fairly
simple in concept, but there are a lot of fine details involved, in
particular in dealing with link breakages and reconnects.
Each command would be constructed as an encapsulated message using
a recursive structure, like this:
    linkid      (64 bits)   (specifies the communications end point)
    msgid       (32 bits)   (allows parallel commands to be issued)
    command     (16 bits)   (bit 15 indicates a response)
    length      (16 bits)
        itemid  (16 bits)   (bit 15 indicates item recursion)
                            (bit 14 indicates ref'd data)
        itemlen (16 bits)
        data                (recursive item if item recursion)
Messages would be limited to 65535 bytes (so, basically, a message
embedding a data buffer would be limited to a 32 KB data buffer).
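As a rough sketch only, the header and item layout above might map onto C structures like the following. The struct names, macro names, and the msg_make_reply() helper are hypothetical, and the sketch assumes host byte order and that a reply reuses the request's linkid and msgid with bit 15 of the command set. None of that is settled here.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as listed in the layout; the macro names are made up. */
#define MSG_CMDF_RESPONSE 0x8000u   /* command bit 15: this is a response */
#define MSG_ITEMF_RECURSE 0x8000u   /* itemid bit 15: item contains items */
#define MSG_ITEMF_REFDATA 0x4000u   /* itemid bit 14: ref'd data */
#define MSG_MAXLEN        65535u    /* limit imposed by the 16-bit length */

/* Fixed message header: 8 + 4 + 2 + 2 = 16 bytes on common ABIs. */
struct msg_hdr {
    uint64_t linkid;    /* communications end point */
    uint32_t msgid;     /* lets several commands be in flight at once */
    uint16_t command;   /* bit 15 set means response */
    uint16_t length;    /* logical length of the message in bytes */
};

/* Item header; the item data (or nested items) follows it. */
struct msg_item {
    uint16_t itemid;    /* bits 15 and 14 are the flags above */
    uint16_t itemlen;   /* logical length of the item in bytes */
};

static int
msg_is_response(const struct msg_hdr *h)
{
    return (h->command & MSG_CMDF_RESPONSE) != 0;
}

/*
 * Build a reply header for a request (assumption: the reply echoes
 * linkid and msgid so the sender can match it to the command).
 */
static struct msg_hdr
msg_make_reply(const struct msg_hdr *req, uint16_t length)
{
    struct msg_hdr rep;

    assert(!msg_is_response(req));  /* only commands get replies */
    rep.linkid = req->linkid;
    rep.msgid = req->msgid;
    rep.command = (uint16_t)(req->command | MSG_CMDF_RESPONSE);
    rep.length = length;
    return rep;
}
```

Matching replies to outstanding commands by (linkid, msgid) is what would let several commands be issued in parallel over one stream.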
Messages representing large I/O's (such as someone doing a 2GB read()
through to a userland VFS) would transfer the potentially large amount
of data using a separate linkid that represents a 'data cache object'.
To support a future shared memory FIFO API messages will require specific
alignments so as not to create cases that either overflow a shared memory
FIFO or cross the boundary from the end of the FIFO's buffer back to
the beginning. This part is fairly simple. The 'length' field simply
represents the logical length of the message (e.g. like 27 bytes), but
the actual formatting of the message within the stream would always be
8-byte aligned (e.g. nextoffset = currentoffset + ((msg->length + 7) & ~7)).
Same with the recursive items.
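To make the alignment rule concrete, here is a minimal sketch of the offset calculation. The helper names msg_align() and msg_next_offset() are hypothetical. Note the parentheses around the mask: in C, + binds tighter than &, so the rounding has to be grouped explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_ALIGN 8u    /* stream alignment in bytes */

/* Round a logical length up to the next multiple of 8. */
static size_t
msg_align(size_t len)
{
    return (len + (MSG_ALIGN - 1)) & ~(size_t)(MSG_ALIGN - 1);
}

/*
 * Offset of the next message in the stream, given the offset of the
 * current one and its logical length.
 */
static size_t
msg_next_offset(size_t currentoffset, uint16_t length)
{
    assert(currentoffset % MSG_ALIGN == 0); /* messages start aligned */
    return currentoffset + msg_align(length);
}
```

A message with a logical length of 27 bytes therefore occupies 32 bytes of stream space, and the same rounding would apply to each item inside it.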
This would be a layered protocol, with the lowest layer simply
encapsulating the core messaging protocol. A layer on top of that
would maintain and/or route end-points via the linkid. A layer on top
of that would implement disconnection/reconnection/connection-failure
handling. Then command/response protocols would be layered on top
of that to implement infrastructure elements such as 'cache' objects
(an object that represents a chunk of data), VFS commands, and so on
and so forth.
In order to be able to pass new linkid's as data the core protocol
would have to reserve a bit in the itemid to indicate that the item
or item recursion contains linkid's (in order to maintain reference
counts on active connections and cache objects and such).
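Here is one way a routing layer might find the linkids embedded in a message so it can adjust reference counts. This is only a sketch under several assumptions that are not settled above: the reserved bit is taken to be bit 13 (ITEMF_LINKID is a made-up name), itemlen is taken to include the 4-byte item header, nested items are 8-byte aligned relative to the start of the enclosing item's data, and a linkid item carries an array of 64-bit linkids. The put_item() helper exists only to build sample buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ITEMF_RECURSE 0x8000u   /* bit 15: item contains nested items */
#define ITEMF_REFDATA 0x4000u   /* bit 14: ref'd data */
#define ITEMF_LINKID  0x2000u   /* hypothetical bit 13: holds linkids */
#define ITEM_HDR_SIZE 4u        /* itemid + itemlen */

static size_t
item_align(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Append one item at 'off' in a zeroed buffer; returns the next offset. */
static size_t
put_item(uint8_t *buf, size_t off, uint16_t itemid,
         const void *payload, uint16_t paylen)
{
    uint16_t itemlen = (uint16_t)(ITEM_HDR_SIZE + paylen);

    memcpy(buf + off, &itemid, 2);
    memcpy(buf + off + 2, &itemlen, 2);
    memcpy(buf + off + ITEM_HDR_SIZE, payload, paylen);
    return off + item_align(itemlen);
}

/*
 * Count the linkids carried in an item area, descending into
 * recursive items.  Returns -1 if the area is malformed.
 */
static int
count_linkids(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int total = 0;

    assert(buf != NULL);
    while (off + ITEM_HDR_SIZE <= len) {
        uint16_t itemid, itemlen;
        size_t datalen;

        memcpy(&itemid, buf + off, 2);
        memcpy(&itemlen, buf + off + 2, 2);
        if (itemlen < ITEM_HDR_SIZE || itemlen > len - off)
            return -1;          /* item runs past the end */
        datalen = itemlen - ITEM_HDR_SIZE;

        if (itemid & ITEMF_RECURSE) {
            int n = count_linkids(buf + off + ITEM_HDR_SIZE, datalen);
            if (n < 0)
                return -1;
            total += n;
        } else if (itemid & ITEMF_LINKID) {
            if (datalen % 8 != 0)
                return -1;      /* must be whole 64-bit linkids */
            total += (int)(datalen / 8);
        }
        off += item_align(itemlen);     /* items are 8-byte aligned */
    }
    return total;
}
```

A real implementation would presumably collect the linkids rather than just count them, and might skip recursive items that do not have the linkid bit set instead of descending into every one.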
Ok, that may be somewhat confusing, but hopefully you can see what I
am getting at. This is a big ticket need for DragonFly... we need
it for a userland VFS implementation, and we need it for inter-machine
links in a cluster.