HAMMER2 progress report - 07-Aug-2012
Matthew Dillon
dillon at apollo.backplane.com
Tue Aug 7 22:14:21 PDT 2012
Hammer2 continues to progress. I've been working on the userland
spanning tree protocol.
* The socket/messaging system now connects, handshakes with public
key encryption, and negotiates AES keys for the session data stream
(a rough sketch of this setup appears just after these bullets).
* The low level transactional messaging subsystem is pretty solid now.
* The initial spanning tree protocol implementation is propagating
node information across the cluster and is handling connect/disconnect
events properly. So far I've only tested two hosts x 10 mounts,
but pretty soon now I will start using vkernels to create much
larger topologies for testing purposes.
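To make the session setup from the first bullet a little more concrete
before getting back to the topology, here is a minimal sketch. It assumes
each side contributes a random half of the eventual AES session key, sent
under the peer's public key; the structure, the helper combine_halves(),
and the XOR derivation are all illustrative stand-ins, not the actual
hammer2 code, and the public key encryption itself is elided.

    /*
     * Illustrative sketch only -- not the actual hammer2 handshake code.
     * Assumption: each side sends a random half of the session key,
     * encrypted with the peer's public key, and the AES session key is
     * derived from both halves.  The crypto itself is elided here;
     * combine_halves() only shows the shape of the agreement.
     */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define SESSION_KEY_LEN 32                      /* e.g. AES-256 */

    struct session {
        uint8_t local_half[SESSION_KEY_LEN];    /* sent under peer's pubkey */
        uint8_t remote_half[SESSION_KEY_LEN];   /* decrypted with our key */
        uint8_t aes_key[SESSION_KEY_LEN];       /* negotiated session key */
    };

    static void
    combine_halves(struct session *sp)
    {
        int i;

        /* stand-in derivation: XOR the two halves together */
        for (i = 0; i < SESSION_KEY_LEN; ++i)
            sp->aes_key[i] = sp->local_half[i] ^ sp->remote_half[i];
    }

    int
    main(void)
    {
        struct session s;

        memset(&s, 0, sizeof(s));
        memset(s.local_half, 0xAA, sizeof(s.local_half));   /* pretend random */
        memset(s.remote_half, 0x55, sizeof(s.remote_half)); /* pretend random */
        combine_halves(&s);
        printf("session key byte 0 = 0x%02x\n", s.aes_key[0]);
        return (0);
    }

In the real exchange the halves would travel public-key encrypted and the
key derivation would be a proper one; the sketch only fixes the overall shape.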
Essentially the topology is (ascii art):

             Host #1                Any cluster (graph) topology
       ______________________________  ______________________________
      /                               \/                             \

    PFS mount ------\
    PFS mount ------\\                  /---(TCP)-- Host #2 --(TCP)----\
    PFS mount ------ hammer2 service -----------(TCP)------------- Host #3
    PFS mount ------//                  \---(TCP)-- Host #4 --(TCP)----/
    PFS mount ------/

It is a full graph spanning tree protocol, so there can be loops, multiple
ways to get to the same target, and so on and so forth. The SPANs propagate
the best N (one or two) paths for each mount.
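To illustrate the 'best N paths' idea, here is a minimal sketch assuming
each SPAN advertises a (neighbor, hop count) pair per mount and only the
best two are retained. The names (span_entry, span_update) are hypothetical,
not hammer2's own:

    /*
     * Hypothetical sketch: keep only the best SPAN_MAXPATHS paths per
     * mount, replacing the worst (or an empty slot) when a better path
     * is learned.  Duplicate-path handling and aging are omitted.
     */
    #include <stdio.h>

    #define SPAN_MAXPATHS   2       /* "best N (one or two) paths" */

    struct span_path {
        int via_node;               /* neighbor the path goes through */
        int distance;               /* hops to the mount; 0 = unused slot */
    };

    struct span_entry {
        int mount_id;
        struct span_path paths[SPAN_MAXPATHS];
    };

    /*
     * Offer a newly learned path.  Returns 1 if the table changed, in
     * which case the improvement would be re-advertised to the other
     * neighbors as a new SPAN.
     */
    static int
    span_update(struct span_entry *se, int via_node, int distance)
    {
        int worst = 0;
        int i;

        for (i = 1; i < SPAN_MAXPATHS; ++i) {
            if (se->paths[i].distance == 0 ||
                (se->paths[worst].distance != 0 &&
                 se->paths[i].distance > se->paths[worst].distance))
                worst = i;
        }
        if (se->paths[worst].distance == 0 ||
            distance < se->paths[worst].distance) {
            se->paths[worst].via_node = via_node;
            se->paths[worst].distance = distance;
            return (1);
        }
        return (0);
    }

    int
    main(void)
    {
        struct span_entry se = { .mount_id = 1 };
        int i;

        span_update(&se, 2, 3);         /* path via neighbor 2, 3 hops */
        span_update(&se, 3, 1);         /* better path via neighbor 3 */
        span_update(&se, 4, 5);         /* worse path, dropped */
        for (i = 0; i < SPAN_MAXPATHS; ++i) {
            printf("mount %d: kept path via %d (%d hops)\n",
                se.mount_id, se.paths[i].via_node, se.paths[i].distance);
        }
        return (0);
    }

The point is that only a bounded number of paths per mount ever has to be
remembered or re-advertised, no matter how tangled the graph gets.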
Any given mount is just a HAMMER2 PFS, so there will be immense
flexibility in what a 'mount' means. Is it a master node? Is it a
slave? Is it a cache-only node? Maybe it's a diskless client-only
node (no persistent storage at all), etc.
Because each node is a PFS, and PFS's can be trivially created
(any single physical HAMMER2 filesystem can contain any number
of PFS's), people will have a great deal of freedom in how they
construct their clusters.
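Purely as an illustration of what a 'mount' might mean, the role of a
mounted PFS could be described with something like the following. The
enum and the sample label are made up for the example, not hammer2's
actual definitions:

    /*
     * Hypothetical illustration of per-PFS roles; not hammer2's actual
     * definitions.  A single physical volume could carry any number of
     * PFSs, each mounted with a different role.
     */
    #include <stdio.h>

    enum pfs_role {
        PFS_ROLE_MASTER,        /* a master node */
        PFS_ROLE_SLAVE,         /* a slave */
        PFS_ROLE_CACHE,         /* cache-only node */
        PFS_ROLE_CLIENT         /* diskless client-only, no storage */
    };

    struct pfs_mount {
        const char      *label;         /* PFS label on the volume */
        enum pfs_role   role;
    };

    int
    main(void)
    {
        struct pfs_mount m = { "HOME.master0", PFS_ROLE_MASTER };

        printf("%s mounted with role %d\n", m.label, m.role);
        return (0);
    }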
* The low level messaging subsystem is solid. Message relaying is next
on my TODO list (using the spanning tree to relay messages). After
that I'll have to get automatic reconnection working properly.
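A sketch of what the relaying step might look like once it exists,
assuming the spanning tree leaves behind a per-target next-hop table and
a message not addressed to the local node is simply forwarded on the
connection leading toward its target. The names and layout are
illustrative only:

    /*
     * Illustrative only -- not the actual hammer2 relay code.  Assumes
     * the spanning tree maintains a next-hop table mapping each remote
     * target to the connection (descriptor) leading toward it.
     */
    #include <stdio.h>

    struct route {
        int target;             /* remote node/mount id */
        int next_hop_fd;        /* connection toward it */
    };

    /* toy table that the spanning tree code would maintain */
    static struct route route_table[] = {
        { 30, 5 },              /* target 30 via the connection on fd 5 */
        { 40, 7 },
    };

    static int
    relay_lookup(int target)
    {
        unsigned i;

        for (i = 0; i < sizeof(route_table) / sizeof(route_table[0]); ++i) {
            if (route_table[i].target == target)
                return (route_table[i].next_hop_fd);
        }
        return (-1);            /* unknown target, cannot relay */
    }

    int
    main(void)
    {
        printf("msg for 30 -> relay on fd %d\n", relay_lookup(30));
        printf("msg for 99 -> relay on fd %d\n", relay_lookup(99));
        return (0);
    }

The table itself would presumably be maintained by the SPAN updates as
connections come and go.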
Once the low level messaging subsystem is solid I will be able to start
working on the higher-level protocols, which is the fun part. There is
still a very long way to go.
Ultimately the feature set is going to be huge, which is one reason why
there is so much work left to do. For example, we want millions of
diskless or cache-only clients to be able to connect into a cluster and
have it actually work... which means that the topology would have to
support 'satellite' hosts that aggregate the clients and implement a
proxy protocol to the core of the topology, without having to propagate
millions of spanning tree nodes. The topology has to allow for proxy
operation, otherwise the spanning tree overhead becomes uncontrollable.
This would also make it possible to have internet-facing hosts without
compromising the cluster's core.
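A toy illustration of the aggregation idea, under the assumption that a
satellite registers its clients in a local table and only ever advertises
itself into the spanning tree; everything here is hypothetical:

    /*
     * Hypothetical sketch of client aggregation at a satellite host.
     * Clients attach to the satellite's local state; the cluster core
     * only ever sees the satellite itself as a spanning tree node.
     */
    #include <stdio.h>

    struct satellite {
        int nclients;           /* locally attached clients */
        int span_advertised;    /* 1 = we appear in the spanning tree */
    };

    /* attach a client; only the satellite itself is ever advertised */
    static void
    client_attach(struct satellite *sat)
    {
        ++sat->nclients;
        sat->span_advertised = 1;
    }

    int
    main(void)
    {
        struct satellite sat = { 0, 0 };
        int i;

        for (i = 0; i < 1000000; ++i)
            client_attach(&sat);
        printf("%d clients behind 1 advertised node\n", sat.nclients);
        return (0);
    }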
Note that dealing with multiple physical disks and failures will also
be part of the equation. The cluster mechanic described above is an
abstraction for having multiple copies of the same filesystem in
different places, each holding varying amounts of the data, and thus
gaining redundancy.
But we ALSO want a SINGLE copy of the filesystem (homed at a particular
machine) to be able to use the SAME mechanism to glue together all of
its physical storage into a single entity (plus a copies mechanic for
redundancy), and then allow that filesystem to take part in the
multi-master cluster as one of the masters.
All of these vastly different feature sets will use the same underlying
transactional messaging protocol.
x bazillion more features and that's my goal.
-Matt