rc and smf

Bill Hacker wbh at conducive.org
Thu Feb 24 11:03:20 PST 2005


Dan Melomedman wrote:

> Joerg Sonnenberger wrote:
>
>> Actually, this is exactly one of the situations where I don't want
>> automatic, silent restarts. It hides problems, which is in my position
>> even more problematic. "Magic restart" doesn't solve every problem.
>> Joerg


> Nothing solves every problem. Supervision solves the 'Oops, something
> crashed, and needs to be restarted' problem. If my nearby nuclear power
> plant's reactor monitoring software running on a Unix box gets killed
> due to a memory leak, I want it restarted immediately, not wait for the
> administrator to find out by the time the reactor melts down.

No you do not.

What you DO want, when *any* fault occurs of that nature, is for a 
totally separate system - usually a 'state machine' - or even *gravity* 
to take over and 'safe' that plant until the real cause is scrutinized 
by a team of experts.  Too much is at stake to blindly restart a daemon 
OR the OS.

Unix has no more business running nuke power plants than Windows.  That
is the territory of specialized real-time operating systems.  Or of state
machines monitored by specialized computers. Or both.

> All fault tolerant systems have some kind of supervision in software.

All seriously critical ones have hardware / firmware fall-backs and
manual overrides as well.

All failures, be they in an oil refinery, chemical plant, power plant, or
web and mail servers, *should* be brought to human attention, examined,
and attended to by folks with brains.  That way we can fix them, not be
victims of them.

Bill
