DragonFlyBSD clustered architecture/design

Alex Burke alexjeffburke at gmail.com
Thu Oct 2 08:26:33 PDT 2008


Hi,

I know that in reality there is likely much work to do before this  
overall clustering goal is complete, but I wondered if I might ask  
about the high level cluster architecture that is envisioned.

I recently started a course on parallel algorithms, and we were doing  
some basics on parallel architectures. This led to some musings, and  
I wanted to see whether I now understand the DFly approach/concept better.

First, I remember conversations suggesting that eventually one would  
opt to donate a certain set of resources of a local machine to a  
cluster. Will that cluster itself in essence be a large pool of shared  
resources, such as shared filesystems and shared memory?

In terms of the actual sharing of resources, I remember much  
description about the necessity of the cache coherency layer.  
Presumably this is precisely to support the sharing of files, making  
files on discrete machines available to all members of the cluster?

Would clusters have their own shared address space (effectively shared  
memory) allowing applications to transparently run in the cluster  
without really changing them - i.e. will the cluster appear as just  
another normal machine? I am guessing this is the eventual goal and  
nature of kernel level cluster support. I also guess that chunks of  
this address space will actually map to physical memory on many  
different systems?

Finally, I wanted to ask a question about some of the algorithms used  
in the kernel. I remember that the approach to, for example, the  
network stack was to spawn multiple threads and dispatch work to them,  
thereby isolating data to specific CPUs and allowing you to get away  
without locking. I recall there is also a message passing layer. Was  
this approach chosen because, with very little work, that code could  
pass messages over the cluster instead of locally, and the *very same*  
algorithms would still work? I guess this takes advantage of the  
interesting similarity between multiple CPUs in the same computer and  
distributed computing, where you simply have multiple CPUs that happen  
to be connected by a network rather than a hardware bus.

Again, apologies for the slightly searching nature of the questions,  
but I am truly fascinated about these approaches. I've been pondering  
these things ever since that lecture a couple of days ago!

Thanks in advance, Alex J Burke.




