rc and smf

Chris Pressey cpressey at catseye.mine.nu
Thu Feb 24 12:20:34 PST 2005


On Thu, 24 Feb 2005 14:12:46 -0500
Dan Melomedman <dan at xxxxxxxxxxxxxxxx> wrote:

> You don't see the point. It takes a long time to fix the fault. BSD
> has nothing to do with this. The real world does. You don't want a
> nuclear reactor to explode because it took an admin five minutes to
> notice the fault, and restart the service.
> 
> Another example: a telecom can't afford to lose service in some of the
> systems even for mere seconds. They lose thousands of dollars. This is
> exactly why Erlang, the language originally designed with telecom
> requirements in mind has supervision in its feature set! When you make
> a call in the UK, it runs through an Ericsson switch running Erlang
> that supervises its processes, and restarts them if they fail. Again,
> supervision may be new to some people on this list, but it isn't
> anything new or detrimental.

There are some distinctions to be made here, though:

- A strictly fault-tolerant system either needs to be provably reliable
(in a mathematical sense), or it needs a supervisor (which itself must
be provably reliable).

- Not everyone needs a fault-tolerant system.  Or rather, different
people need different degrees of fault-tolerance.  Most people don't
need telecom-level reliability.

- Many daemons implement some form of supervision themselves.  Much of
the 'djb regime' is not actually new; it just tries to commodify
concepts such as supervision and daemonization at the operating system
level, rather than having every program do it itself.

- Erlang's concurrency is typically much more fine-grained; most Erlang
processes are not daemons in the usual sense (they only ever service
each other rather than the outside world).  The programming paradigm in
this case is also different; because supervision guarantees have already
been made, failure is "acceptable", and many processes are written in a
"let it crash" style.  This simplifies error handling immensely in many
cases, BUT it's most practical when working with lightweight processes
(basically threads).  It's not nearly as effective a programming style
when working with operating system processes.
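
To make the operating-system flavour of supervision concrete, here is a
minimal sketch in Python.  It is only an illustration, not a description
of how rc, smf, or any existing supervisor is implemented.  The function
name supervise and its max_restarts and delay parameters are made up for
this example, and it assumes that a zero exit status means "finished
cleanly, don't restart":

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.0):
    """Run argv as a child process and restart it when it exits non-zero.

    Returns the number of restarts that were needed before a clean exit.
    Raises RuntimeError once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        status = subprocess.call(argv)  # blocks until the child exits
        if status == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                "giving up on %r after %d restarts" % (argv, restarts))
        restarts += 1
        time.sleep(delay)  # crude pause so a broken child can't spin


if __name__ == "__main__":
    # Hypothetical child that always crashes with exit status 1.
    crasher = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        supervise(crasher, max_restarts=2)
    except RuntimeError as exc:
        print(exc)
```

Note that every restart here costs a full process spawn, which is one
reason "let it crash" is less attractive at this level than it is with
lightweight processes.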

-Chris




