rc and smf

Bill Hacker wbh at conducive.org
Thu Feb 24 12:13:44 PST 2005


Joerg Sonnenberger wrote:

On Thu, Feb 24, 2005 at 11:39:36AM -0800, Matthew Dillon wrote:

   But anyhow, back to service failures... service failures do not always
   end in a crash.  Take BIND for example.  It is far more likely that
   BIND's cache will become corrupted than for BIND to actually crash.  A
   simple 'detect that it died and restart' monitor doesn't help you there.
   What you have to do is have a program which actually goes in and uses
   the service for real.  e.g. for a web server a program which connects
   to it every minute and retrieves the most complex CGI'd page it
   serves out.  That's the sort of monitoring we need... not this simple
   it-dies-and-we-restart stuff.  Service corruption is the far more likely
   scenario these days.


I completely agree. IBM has a nice, extensible monitoring facility for AIX,
basically a combination of sensors and trigger rules. The concept alone
is pretty simple, but that does provide mighty tools.
I'd love to have such a daemon written in a modular way for DragonFly/BSD.
It would be something like SNMP with intelligence.
Joerg

'checkservice' - in the ports tree for some years, lets us keep an eye
on our server's 'public facing' daemons from other servers (or locally, 
but 'Quis Custodiet' etc.).

Not perfect, but extensible by 'plugins'.  Can do realistic tests, not 
just check that the port is active.
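
To make the difference concrete, here is a rough sketch of the two kinds of
test. This is not checkservice's code or its plugin interface; the function
names, the URL and the marker string are all made up for illustration, and
the proxy bypass is my own assumption so that the probe talks to the server
itself.

```python
import http.client
import socket
import urllib.error
import urllib.request


def port_is_open(host, port, timeout=5.0):
    """Shallow test: true as soon as something accepts the TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def page_is_healthy(url, expected_text, timeout=10.0):
    """Deeper test: fetch a real page and look for a known marker in the body.

    expected_text is a hypothetical marker string that a correctly working
    page is assumed to contain.
    """
    # Assumption: ignore any configured proxy so we test the server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected_text in body
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, HTTP errors and malformed replies
        # all count as "not healthy".
        return False
```

A daemon that is still listening but serving garbage passes port_is_open()
and fails page_is_healthy(), which is the case Matt describes. Run from cron
once a minute against the most complex page the server has, something like
this would probably cover the simple cases.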

Maybe a start for something more general-purpose?

Bill





