checkpoint/restart
Matthew Dillon
dillon at apollo.backplane.com
Thu Oct 9 01:45:57 PDT 2003
:> sigacts has the ps_sigact[] array which I believe contains
:> 128 signals by default. Since only 32 some odd signals are defined,
:
:I clearly didn't look closely enough at it.
:
:> there are plenty of signals available for your checkpoint and restore
:> vectors (though I still think there is only a need for a checkpoint
:> signal, since the restore can simply resume the signal handler from
:
:If there is extra space that would clearly be cleaner than having my own
:upcall function. I can then use the underlying kernel machinery to do the
:work. That will also remove the need to make any underlying changes to the
:kernel. This may also answer my question about the upcall. I may just be
:able to queue up a checkpoint signal.
I think the best approach is a suspend/resume model if the program
is not checkpoint-aware, or a signal-to-suspend and direct resume
if the program is checkpoint-aware. If the program is checkpoint-aware,
it will have been suspended in the (user) checkpoint signal function
itself, so no special upcall would be needed to resume. i.e.
    checkptsignal()
    {
        /* do suspend stuff */
        freeze();       /* system call */
        /* resume point is right here */
        /* do resume stuff */
    }
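To make the control flow of the outline above concrete, here is a rough user-space sketch in C. It is only an illustration under stated assumptions: SIGUSR1 stands in for a dedicated checkpoint signal, and freeze() is a hypothetical stub that returns immediately, since no such system call exists yet. The names ckpt_state, install_checkpoint_handler() and checkpoint_demo() are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* ASSUMPTION: SIGUSR1 stands in for a dedicated checkpoint signal. */
#define CKPT_SIGNAL SIGUSR1

enum { RUNNING = 0, SUSPENDING = 1, RESUMED = 2 };

/* Only a sig_atomic_t flag is touched from the handler. */
static volatile sig_atomic_t ckpt_state = RUNNING;

/*
 * HYPOTHETICAL stand-in for the freeze() system call discussed here.
 * A real one would save the process image and return only after a
 * restore. This stub returns at once so the flow can be exercised.
 */
static int freeze(void)
{
    return 0;
}

/* Checkpoint-aware handler: suspend work, freeze, then resume work. */
static void checkptsignal(int signo)
{
    (void)signo;
    ckpt_state = SUSPENDING;      /* do suspend stuff */
    if (freeze() == 0)
        ckpt_state = RESUMED;     /* resume point; do resume stuff */
}

/* Install the handler with plain sigaction(); returns 0 on success. */
int install_checkpoint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = checkptsignal;
    sigemptyset(&sa.sa_mask);
    return sigaction(CKPT_SIGNAL, &sa, NULL);
}

/* Send the checkpoint signal to ourselves and report the final state. */
int checkpoint_demo(void)
{
    if (install_checkpoint_handler() != 0)
        return -1;
    raise(CKPT_SIGNAL);           /* handler has run by the time this returns */
    assert(ckpt_state == RESUMED);
    return (int)ckpt_state;
}
```

Calling checkpoint_demo() from main() runs the whole suspend/freeze/resume path once. Execution continues on the line after freeze(), inside the handler, which is why no separate restore upcall is needed.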
If the program is not checkpoint-aware, the checkpoint signal would be
handled in the kernel and suspend the program directly, again with no
upcall on resume.
This is predicated on the full memory image being checkpointed, with
file descriptor references for shared regions.
:>
:> We need to add a stty signal similar to how ^\ core's a program, to
:> checkpoint a program. That would be so cool.
:
:That would be very easy.
I'm seeing stars. Cool, easy, and a *very* powerful mechanism.
-Matt