DragonFlyBSD Thread on osnews

Jonathan Weeks jbweeks
Fri Feb 2 16:35:58 PST 2007


FYI -- there was a DragonFlyBSD 1.8 announcement on osnews, with a 
thread discussing Linux scalability vs DragonFlyBSD, which might bear 
an educated response:

http://www.osnews.com/comment.php?news_id=17114&offset=30&rows=34&threshold=-1

I admit I'm not the most experienced kernel programmer in the world, 
but I have a few years of Linux and AIX kernel programming experience. 
Maybe you are more qualified, I don't know.

You say Linux scales up to 2048 CPUs, but on what kind of system?

The top end of the SGI Altix line of Linux supercomputers runs 4096 
CPUs, and IBM validated Linux on a 2048-CPU System P. Linux scales to 
1024 CPUs without any serious lock contention. At 2048 it shows some 
contention for root and /usr inode locks, but no serious performance 
impact. Directory traversal will be the first to suffer as we move 
toward 4096 CPUs and higher, so that's where the current work is 
focused.

Is this the same kernel I get on RHEL? Can I use this same kernel on a 
4 CPU system? What Linux version allows you to mix any number of 
computers with any number of CPUs and treats them all as one logical 
computer while being able to scale linearly?

Choose the latest SMP kernel image from Red Hat. The feature that 
allows this massive scaling is called scheduler domains, introduced by 
Nick Piggin in version 2.6.7 (2004). There is no special kernel config 
flag or recompilation required to activate this feature, but there are 
some tunables you need to set (via a userspace interface) to reflect 
the topology of your supercomputer (i.e. grouping CPUs in a tree of 
domains).
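
If you're curious what those tunables look like, here is a minimal 
sketch (my own illustration, not from the thread): kernels built with 
CONFIG_SCHED_DEBUG expose a per-CPU tree of scheduler-domain tunables 
under /proc/sys/kernel/sched_domain/. The exact layout and file names 
vary by kernel version and config, so treat this purely as a way to 
see the cpuN/domainN grouping, not as a stable interface.

/*
 * Walk /proc/sys/kernel/sched_domain and print every tunable with its
 * current value.  If nothing is printed, the kernel probably was not
 * built with CONFIG_SCHED_DEBUG.
 */
#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>

static void dump(const char *path, int depth)
{
    DIR *d = opendir(path);
    struct dirent *e;

    if (d == NULL)
        return;

    while ((e = readdir(d)) != NULL) {
        char child[4096];
        struct stat st;

        if (e->d_name[0] == '.')
            continue;
        snprintf(child, sizeof(child), "%s/%s", path, e->d_name);
        if (stat(child, &st) != 0)
            continue;

        if (S_ISDIR(st.st_mode)) {
            /* cpu0, cpu1, ... with nested domain0, domain1, ... */
            printf("%*s%s/\n", depth * 2, "", e->d_name);
            dump(child, depth + 1);
        } else {
            /* A tunable: print its name and current value. */
            char val[128] = "";
            FILE *f = fopen(child, "r");
            if (f != NULL) {
                if (fgets(val, sizeof(val), f))
                    val[strcspn(val, "\n")] = '\0';
                fclose(f);
            }
            printf("%*s%s = %s\n", depth * 2, "", e->d_name, val);
        }
    }
    closedir(d);
}

int main(void)
{
    dump("/proc/sys/kernel/sched_domain", 0);
    return 0;
}

On an SMP box this prints one subtree per CPU, typically with one 
nested domain level per tier of the machine's topology.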

Usually massive supercomputers are installed, configured, and tuned by 
the vendor. They'd probably compile a custom kernel instead of using 
the default RHEL image. But it could work out of the box if you really 
wanted it to.

...rather than rely on locking, spinning, threading processes to 
infinity, it will assign processes to CPUs and then allow the processes 
to communicate with each other through messages.

That's fine. It's just that nobody has proven that message passing is 
more efficient than fine-grained locking. It's my understanding 
(correct me if I'm wrong) that DF requires that, in order to modify the 
hardware page table, a process must send a message to all other CPUs 
and block waiting for responses from all of them. In addition, an 
interrupted process is guaranteed to resume on the same processor after 
return from interrupt, even if the interrupt modified the local 
runqueue.

The result is that minor page faults (page is resident in memory but 
not in the hardware page table) become blocking operations. Plus, you 
have interrupts returning to threads that have become blocked by the 
interrupt (and must immediately yield), and the latency for waking up 
the highest priority thread on a CPU can be as high as one whole 
timeslice.
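
To make the cost concrete, here is a rough sketch (my own illustration 
in plain pthreads, not DragonFly code) of the "send a message to every 
other CPU and block until all of them acknowledge" pattern described 
above. The requester cannot proceed until the slowest responder has 
answered, which is why a minor fault handled this way turns into a 
blocking operation, and the window only grows with the CPU count.

#include <pthread.h>
#include <stdio.h>

#define NCPUS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  all_acked = PTHREAD_COND_INITIALIZER;
static int pending = NCPUS;     /* acknowledgements still outstanding */

/* Each "CPU" handles the broadcast message and then acknowledges it. */
static void *cpu_thread(void *arg)
{
    long id = (long)arg;

    /* ...per-CPU work (e.g. dropping a stale mapping) would go here... */
    pthread_mutex_lock(&lock);
    if (--pending == 0)
        pthread_cond_signal(&all_acked);
    pthread_mutex_unlock(&lock);
    printf("cpu%ld: acked\n", id);
    return NULL;
}

int main(void)
{
    pthread_t cpus[NCPUS];
    long i;

    for (i = 0; i < NCPUS; i++)
        pthread_create(&cpus[i], NULL, cpu_thread, (void *)i);

    /* The requester blocks here until every other CPU has answered. */
    pthread_mutex_lock(&lock);
    while (pending > 0)
        pthread_cond_wait(&all_acked, &lock);
    pthread_mutex_unlock(&lock);
    printf("requester: all CPUs acked, safe to change the page table\n");

    for (i = 0; i < NCPUS; i++)
        pthread_join(cpus[i], NULL);
    return 0;
}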

DF has serialization resources, but they are called tokens instead of 
locks. I'm not quite sure what the difference is. There also seems to 
be a highly-touted locking system that allows multiple writers to write 
to different parts of a file, which is interesting because Linux, 
FreeBSD, and even SVR4 have extent-based file locks that do the same 
thing. What's different about this method?
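
For reference, one familiar form of that idea is the POSIX byte-range 
(record) locking interface, which is what SVR4's filock structures 
back. A minimal sketch, assuming that is the kind of extent lock meant 
here (ordinary fcntl(), nothing DragonFly-specific):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Lock only bytes 0..4095 for writing; another process can hold a
     * write lock on bytes 4096..8191 of the same file at the same time. */
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 4096,
    };
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl(F_SETLKW)");
        return 1;
    }

    /* Write within the locked extent. */
    const char msg[] = "region 0 update\n";
    pwrite(fd, msg, strlen(msg), 0);

    /* Release the range lock. */
    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}

A second process can hold a write lock on a different byte range of 
the same file simultaneously, which is the "multiple writers writing 
to different parts of a file" behavior.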

I hope I've addressed your questions adequately. Locks are evil, I 
know, but they seem to be doing quite well at the moment. Maybe by the 
time DF is ready for production use there will be machines that push 
other UNIX implementations beyond their capabilities. But for now, 
Linux is a free kernel for over a dozen architectures that scales 
better than some proprietary UNIX kernels do on their target 
architecture. That says a lot about the success of its design.





