Benchmarking

Wed Apr 7 18:27:46 PDT 2004

Specifically on NetBSD's performance: it looks like it actually has 
better disk driver performance, but worse buffer+cache performance.

The only problem with the benchmark is that the same environment should 
be used for every system being compared, and the compiler is one part of 
that environment. Realistically, however, this is not possible: each of 
these systems uses a different version of gcc as its default, and one 
person cannot patch them all to build with one particular version. 
Unfortunately, gcc doesn't seem to be very backward compatible either, or 
he could simply have used the oldest version that they all support.

Thor Lancelot Simon wrote:

On Thu, Apr 08, 2004 at 12:12:05AM +0100, goteki wrote:

On Wed, 07 Apr 2004 21:52:39 +0200
Ivan Voras <ivoras at xxxxxx> wrote:

I've finished the article on benchmarking FreeBSD, NetBSD, DragonflyBSD and 
Linux, it is available at:

http://alfredo.cc.fer.hr/

Why didn't you benchmark netbsd-current?

Presumably because it is not a released version of the operating system;
though, in that context, benchmarking "DragonflyBSD" seems rather odd,
to say the least.

What is of much more concern to me, as someone who relies on high-quality
benchmark numbers to guide his role in OS development, is the poor
methodology of this study, particularly when compared to other recent
studies such as Felix von Leitner's (http://bulk.fefe.de/scalability).  To 
me, honestly, this benchmark is not so good, for a number of reasons.
Here are three of the most obvious ones:

1) The non-repeatability of results for some tests is merely mentioned
   in passing, rather than investigated and explained.
2) Of particular concern is the omission of rows from large tables of 
   results because they were "too big" or "too small" to be interpreted 
   meaningfully.  The willingness to accept such results, to me, means 
   that I should seriously question whether any attention was given to 
   appropriately sizing _any_ of the components of the benchmark so as 
   to actually measure what is purported to be measured.

3) The inclusion of tests which are intended to measure attributes of
   *the underlying hardware* in what purports to be an OS benchmark
   is indicative of poor benchmark design and analysis.  In particular,
   synthetic benchmarks that measure "CPU speed" or "memory bandwidth"
   are wholly inappropriate in this context; the difference in results
   indicates both the poor quality of those benchmarks for their actual
   design purposes (though this is by now well-understood WRT many of
   the tests in question) and that, in general, this benchmark suite as
   a whole fails to adequately control (or even acknowledge) a number of
   variables which may cause what it _actually_ measures to not be what
   it _purports to measure_.  Notable here are compiler, system state
   at start of test and during test, and the general "entropy" which
   results from performing even good tests at too small a size (iteration
   count, memory footprint, etc.).

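To make points 1) and 2) more concrete, here is a minimal Python sketch of a timing harness that addresses both. It is not taken from the study or from this thread; the function names, the 0.2-second minimum run length and the 5% variation threshold are hypothetical choices. It first doubles the iteration count until a single timed run is long enough to swamp timer resolution, then repeats the run several times and flags the result as non-repeatable if the spread is too large, instead of silently dropping it.

```python
import statistics
import time
from typing import Callable


def timed_run(work: Callable[[], None], n: int) -> float:
    """Return the wall-clock seconds taken by n calls of work()."""
    start = time.perf_counter()
    for _ in range(n):
        work()
    return time.perf_counter() - start


def calibrate_iterations(work: Callable[[], None], min_seconds: float = 0.2) -> int:
    """Double the iteration count until one timed run lasts at least min_seconds.

    Runs much shorter than this are dominated by timer resolution and
    scheduling noise, i.e. the test is sized too small.
    """
    n = 1
    while timed_run(work, n) < min_seconds:
        n *= 2
    return n


def measure(work: Callable[[], None], repeats: int = 5, warmup: int = 1,
            min_seconds: float = 0.2, max_cv: float = 0.05) -> dict:
    """Time `repeats` calibrated runs of `work` and report their spread.

    max_cv is a hypothetical threshold on the coefficient of variation
    (stdev / mean); above it the result is flagged as non-repeatable so
    that it gets investigated rather than dropped.
    """
    n = calibrate_iterations(work, min_seconds)
    # Untimed warm-up runs, intended to let caches reach a steady state.
    for _ in range(warmup):
        timed_run(work, n)
    per_iteration = [timed_run(work, n) / n for _ in range(repeats)]
    mean = statistics.mean(per_iteration)
    cv = statistics.stdev(per_iteration) / mean  # stdev needs repeats >= 2
    return {"iterations": n, "mean_s": mean, "cv": cv, "repeatable": cv <= max_cv}
```

For example, `measure(lambda: sum(range(1000)))` returns a dict with the calibrated iteration count, the mean time per iteration, the coefficient of variation and a repeatability flag. Sizing by elapsed time rather than by a fixed iteration count is one way to keep the same test meaningful on both fast and slow systems.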
In general, though the effort is good, I think overall this "benchmark"
shows more about how to not design an OS benchmark than it does about
the performance of _any_ of the underlying operating systems.  Do note
that, actually, NetBSD did somewhat better on this test than we initially
did on Felix's -- I'm not slagging this test because we did poorly; in
fact, I'm not entirely displeased with how we did.  The problem is that,
like so many other benchmarks, this one doesn't actually measure what it
claims to measure; and so, as an OS developer, it's not very useful to me.

So where are the 'good' OS benchmarks then? I would especially like to 
know where to find relevant disk benchmarks that can show more than 
sequential access performance (i.e., real-world performance).
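One common way to look beyond sequential throughput is to time reads at random, block-aligned offsets and compare them with reads of consecutive blocks from the same file. The following minimal Python sketch assumes a POSIX system (it uses `os.pread`) and an existing test file; the function name and its parameters are hypothetical, not an established tool. For the numbers to say anything about the disk rather than the buffer cache, the file has to be much larger than physical memory, which is exactly the kind of sizing concern raised above.

```python
import os
import random
import time


def read_ops_per_second(path: str, pattern: str = "random", block_size: int = 4096,
                        reads: int = 1000, seed: int = 0) -> float:
    """Return block reads per second from `path` using the given access pattern.

    pattern is "random" (uniformly chosen block-aligned offsets) or
    "sequential" (consecutive blocks, wrapping at end of file).
    """
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    # Offsets are computed before timing so RNG cost is not measured.
    if pattern == "random":
        rng = random.Random(seed)  # fixed seed: every run reads the same offsets
        offsets = [rng.randrange(blocks) * block_size for _ in range(reads)]
    elif pattern == "sequential":
        offsets = [(i % blocks) * block_size for i in range(reads)]
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for offset in offsets:
            os.pread(fd, block_size, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return reads / elapsed
```

Running both patterns against the same large file gives a rough random-to-sequential ratio; the fixed seed keeps the random offsets identical across systems, so differences come from the OS and hardware rather than from the access sequence.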

Thor