HAMMER2 progress report - 07-Aug-2012

Matthew Dillon dillon at apollo.backplane.com
Tue Aug 7 22:14:21 PDT 2012


    Hammer2 continues to progress.  I've been working on the userland
    spanning tree protocol.

    * The socket/messaging system now connects, handshakes with public
      key encryption, and negotiates AES keys for the session data stream.
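
      Roughly, the key negotiation boils down to sealing a random session
      key with the peer's public key, something like this (illustrative
      sketch only, not the actual hammer2 code; assumes OpenSSL and that
      the peer's RSA public key has already been loaded):

      #include <openssl/rsa.h>
      #include <openssl/rand.h>

      #define SESSION_KEY_BYTES   32      /* AES-256 session key */

      /*
       * Generate a random session key and seal it with the peer's RSA
       * public key.  'blob' must hold at least RSA_size(peer_pub) bytes;
       * the sealed blob goes over the wire during the handshake and the
       * raw key then encrypts the session data stream.
       * Returns the sealed length, or -1 on error.
       */
      int
      seal_session_key(RSA *peer_pub, unsigned char *key, unsigned char *blob)
      {
          if (RAND_bytes(key, SESSION_KEY_BYTES) != 1)
              return(-1);
          return(RSA_public_encrypt(SESSION_KEY_BYTES, key, blob, peer_pub,
                                    RSA_PKCS1_OAEP_PADDING));
      }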

    * The low level transactional messaging subsystem is pretty solid now.
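
      For reference, the transactional framing amounts to a fixed header
      in front of every message, along these lines (field names invented
      for illustration, not the real structure):

      #include <stdint.h>

      /*
       * Illustrative message header.  Every message belongs to a
       * transaction identified by msgid; CREATE opens the transaction,
       * DELETE closes it, and REPLY marks traffic flowing back toward
       * the originator.
       */
      struct h2_msg_hdr {
          uint64_t    msgid;      /* transaction id */
          uint64_t    source;     /* originating node (for relaying) */
          uint64_t    target;     /* destination node */
          uint32_t    cmd;        /* command code | flags below */
          uint32_t    aux_bytes;  /* size of auxiliary payload */
      };

      #define MSGF_CREATE     0x80000000U     /* opens the transaction */
      #define MSGF_DELETE     0x40000000U     /* closes the transaction */
      #define MSGF_REPLY      0x20000000U     /* flows back to originator */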

    * The initial spanning tree protocol implementation is propagating
      node information across the cluster and is handling connect/disconnect
      events properly.  So far I've only tested two hosts x 10 mounts,
      but pretty soon now I will start using vkernels to create much
      larger topologies for testing purposes.

      Essentially the topology is (ascii art):

                   Host #1                Any cluster (graph) topology
       ______________________________  ______________________________
      /                              \/                              \
      PFS mount ------\
      PFS mount ------\\             /---(TCP)-- Host #2 --(TCP)----\
      PFS mount ------ hammer2 service -----------(TCP)------------- Host #3
      PFS mount ------//             \---(TCP)-- Host #4 --(TCP)----/
      PFS mount ------/

      It is a full graph spanning tree protocol, so the topology can
      contain loops, multiple routes to the same target, and so on.  The
      SPANs propagate the best N (one or two) paths for each mount; a
      rough sketch of that path selection follows below.

      Any given mount is just a HAMMER2 PFS, so there will be immense
      flexibility in what a 'mount' means.  i.e. is it a master node?  Is
      it a slave?  Is it a cache-only node?  Maybe it's a diskless
      client-only node (no persistent storage at all), etc.

      Because each node is a PFS, and PFS's can be trivially created
      (any single physical HAMMER2 filesystem can contain any number
      of PFS's), people will have a great deal of freedom in how they
      construct their clusters.
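
      Going back to the SPANs, the best-path bookkeeping amounts to
      something like the sketch below (illustrative only; the names and
      structures are made up):

      #include <stdint.h>
      #include <stddef.h>

      #define SPAN_MAXPATHS   2           /* best N paths retained */

      struct peer;                        /* one TCP connection */

      struct span_path {
          struct peer     *via;           /* connection the SPAN arrived on */
          uint32_t        dist;           /* hop count / metric */
      };

      struct span_node {
          uint64_t        pfs_id;         /* the advertised PFS/mount */
          struct span_path paths[SPAN_MAXPATHS];
          int             npaths;
      };

      /*
       * Merge a received SPAN into the table.  Returns non-zero if the
       * retained paths changed, meaning the SPAN should be re-advertised
       * to the other peers with dist + 1.
       */
      static int
      span_update(struct span_node *node, struct peer *via, uint32_t dist)
      {
          struct span_path *worst = NULL;
          int i;

          for (i = 0; i < node->npaths; ++i) {
              if (node->paths[i].via == via) {
                  if (node->paths[i].dist == dist)
                      return(0);
                  node->paths[i].dist = dist;
                  return(1);
              }
              if (worst == NULL || node->paths[i].dist > worst->dist)
                  worst = &node->paths[i];
          }
          if (node->npaths < SPAN_MAXPATHS) {
              node->paths[node->npaths].via = via;
              node->paths[node->npaths].dist = dist;
              ++node->npaths;
              return(1);
          }
          if (worst != NULL && dist < worst->dist) {
              worst->via = via;
              worst->dist = dist;
              return(1);
          }
          return(0);
      }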

    * With the low level messaging subsystem solid, message relaying is
      next on my TODO list (using the spanning tree to relay messages).
      After that I'll have to get automatic reconnection working
      properly.
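
      The relay step itself is then just a lookup in that table, along
      these lines (continuing the invented structures from the SPAN
      sketch above):

      /*
       * Pick the connection to forward a non-local message on: the
       * lowest-metric path the spanning tree currently knows for the
       * target.  Returns NULL if there is no route, in which case the
       * transaction is aborted back toward the sender.
       */
      static struct peer *
      msg_relay_route(const struct span_node *node)
      {
          const struct span_path *best = NULL;
          int i;

          if (node == NULL)
              return(NULL);
          for (i = 0; i < node->npaths; ++i) {
              if (best == NULL || node->paths[i].dist < best->dist)
                  best = &node->paths[i];
          }
          return(best != NULL ? best->via : NULL);
      }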

    Once that low level work is complete I will be able to start working
    on the higher-level protocols, which is the fun part.  There is still
    a very long way to go.

    Ultimately the feature set is going to be huge, which is one reason
    why there is so much work left to do.  For example, we want millions
    of diskless or cache-only clients to be able to connect into a
    cluster and have it actually work... which means that the topology
    has to support 'satellite' hosts to aggregate the clients and
    implement a proxy protocol to the core of the topology, without
    having to propagate millions of spanning tree nodes.  The topology
    has to allow for proxy operation, otherwise the spanning tree
    overhead becomes uncontrollable.  This would also make it possible
    to have internet-facing hosts without compromising the cluster's
    core.
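
    In rough terms the proxy idea is just a filtering rule on SPAN
    relaying, something like this (illustrative only):

      /* which side of a satellite a connection faces (illustrative) */
      enum peer_class { PEER_CORE, PEER_LEAF };

      /*
       * A satellite re-advertises SPANs learned from the core to its
       * leaf clients, but SPANs learned from leaves are terminated
       * locally and proxied, so the core never sees millions of
       * individual client nodes.
       */
      static int
      span_may_relay(enum peer_class learned_from, enum peer_class relay_to)
      {
          if (learned_from == PEER_LEAF && relay_to == PEER_CORE)
              return(0);
          return(1);
      }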

    Note that dealing with multiple physical disks and failures will also
    be part of the equation.  The cluster mechanic described above is an
    abstraction for having multiple copies of the same filesystem in
    different places, each holding varying amounts of the data and thus
    providing redundancy.

    But we ALSO want a SINGLE copy of the filesystem (homed at a
    particular machine) to be able to use the SAME mechanism to glue
    together all of its physical storage into a single entity (plus a
    copies mechanic for redundancy), and then allow that filesystem to
    take part in the multi-master cluster as one of the masters.

    All of these vastly different feature sets will use the same underlying
    transactional messaging protocol.

    x bazillion more features and that's my goal.

						-Matt




