cvs commit: src/lib/libc/sys syslink.2
Matthew Dillon
dillon at apollo.backplane.com
Tue Apr 3 10:46:46 PDT 2007
:Hey, that's exacly my area of interest! So maybe I can contribute a bit =
:to the design...
:
:What do you mean with "optimal physical route"? There are multiple metri=
:cs I could imagine:
:
:1. shortest hop
:2. maximize minimum bandwidth
:3. lowest latency
:4. node/link availability of the routing nodes
All of the above. What I want to do is have a multi-layered approach
to finding a good route between two nodes. I want a solid algorithm
to serve as a fail-safe (such as forging a path by making a broadcast
to the entire mesh)... it doesn't matter how inefficient it is, it
just has to work regardless of the state of the network. Then I want
to add additional layers which provide for more optimal path discovery.
For example, lets say we limit the mesh to 50 broadcasts a second
across any given link. We could implement a heuristic that eats up,
say, 10 broadcasts a second to implement an algorithm that is
constantly trying to find better paths for existing connections.
What is important here is that the mesh be able to 'settle down' into
using more optimal paths within a reasonable period of time after a
major change to the topology.
:for efficiently designing the routing overlay, we also should have an est=
:imate of the maximum number of nodes joining one cluster. are we talking=
: 100, 1000 or 1000000?
The limitation is the number of hops and the complexity (number
of switch elements) per hop. So, for example, if you want to support
256 machines on an ethernet switch you would need 8 bits to represent
the number of elements on that etherswitch.
A 'physical' address is always relative. Each individual hop can use
a different number of bits, but for arguments sake lets say that each
hop needed 4 bits. A physical address like this:
0xA1451CE3114762AE
Would thus be broken down into blocks of 4 bits. The originator
sends the packet to its route node (everyone has to connect to a
route node somewhere, where the route node is represented by the
syslink descriptor returned from the syslink system call). This
route node needs to know how to direct the packet. It would pull
off the low 4 bits, i.e. the 'E', and use that as an index to forward
the packet to one of the 16 entities connecting to it in this example,
and it would rewrite the target address as 0x0A1451CE3114762A. That
is, it would 'eat' the bits.
The next entity would pull off the bits it needs, say another 4 bits,
then forward the packet and rewrite the address again.
This would continue until there are no bits left... well, in reality,
until the address becomes '0' (since we are shifting 0 bits into the
MSB).
For any given link, all 0's always addresses the route node
itself, and all 1's represents a broadcast.
It is important to note that the 'physical' address is a relative
route, not an absolute address. The physical address for node X
will be different for a packet sent from node A verses a packet
sent from node B. This is the basic core needed to create a
self-identifying network.
Forging a route will be the function of the routing node (the syslink
entity), and NOT usually a leaf on the network. If you have 14
entities connected to a syslink node, that node can forge a route
to a target that can then be cached and supplied to all 14 of those
entities. We wouldn't have to forge 14 separate routes... that is,
the trivial collapse cases would be collapsed.
Also, it needs to be noted that the resources that are being routed
are fairly coarse. i.e. we might route to a filesystem, not to an
individual file, which means that if there are 5000 processes talking
to that filesystem, there is still only one route (well, maybe a
few in order to support multi-pathing or multi-homed targets as a
means of improving the bandwidth to said targets)... but not 5000.
:there are a number of possible self-structuring methods (preferably incre=
:mental, dynamic) to produce an effective mesh. also, should we optimize =
:on writer vs readers, or are all nodes considered writers and readers? o=
:ooh, that sounds exciting :)
:
:cheers
: simon
All nodes are considered to be full duplex. Any given transaction of
course has an originator and a target, but the major data flow can be in
both directions (e.g. a read request verses a write request).
The return direction need not use the same route that the sending
direction used. In fact, we would probably want the two directions
to be separately routed in order to be able to make use of known
or heuristically detected bandwidth delay limitations. If someone
connects to two parts of the mesh via a cable modem at home we do not
want to try to route 50 MBits/sec of traffic over that link to get
from one side of the mesh to the other! heh.
Even though the return direction does not need to use the same route
in reverse, the message structure will construct a reverse route as
the message is being forwarded (the syslink_msg structure has two
physical fields... one for the source address, one for the target.
The target field is deconstructed as the message is routed while the
source field is constructed as the message is routed).
-Matt
Matthew Dillon
<dillon at backplane.com>
More information about the Commits
mailing list