checkpoint/restart

Matthew Dillon dillon at apollo.backplane.com
Thu Oct 9 01:45:57 PDT 2003


:>     sigacts has the ps_sigact[] array which I believe contains
:>     128 signals by default.  Since only 32 some odd signals are defined,
:
:I clearly didn't look closely enough at it.
:
:>     there are plenty of signals available for your checkpoint and restore
:>     vectors (though I still think there is only a need for a checkpoint
:>     signal, since the restore can simply resume the signal handler from
:
:If there is extra space that would clearly be cleaner than having my own
:upcall function. I can then use the underlying kernel machinery to do the
:work. That will also remove the need to make any underlying changes to the
:kernel. This may also answer my question about the upcall. I may just be
:able to queue up a checkpoint signal.

    I think the best approach is a suspend/resume model if the program
    is not checkpoint-aware, or a signal-to-suspend and direct resume
    if the program is checkpoint-aware.  A checkpoint-aware program
    will have been suspended inside its own (user) checkpoint signal
    function, so no special upcall is needed to resume.  For example:

    checkptsignal()
    {
	/* do suspend stuff */
	freeze();	/* system call */
	/* resume point is right here */
	/* do resume stuff */
    }
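
    The handler outlined above could be sketched in C roughly as
    follows.  This is only an illustration under stated assumptions:
    SIGUSR1 stands in for a dedicated checkpoint signal, and
    freeze_stub() is a hypothetical placeholder for the proposed
    freeze() system call, which does not exist here.  The stub returns
    immediately, so the "resume" half runs right away.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Assumption: SIGUSR1 stands in for a dedicated checkpoint signal. */
#define CKPT_SIGNAL SIGUSR1

/* Flags recording which half of the handler has run. */
static volatile sig_atomic_t suspend_done = 0;
static volatile sig_atomic_t resume_done = 0;

/*
 * Hypothetical placeholder for the proposed freeze() system call.
 * A real implementation would have the kernel save the process image
 * here and return only when the image is resumed.
 */
void freeze_stub(void)
{
}

/* Checkpoint-aware handler: suspend work, freeze, then resume work. */
void checkpt_signal(int signo)
{
    (void)signo;
    suspend_done = 1;   /* "do suspend stuff" */
    freeze_stub();      /* resume point is right after this call */
    resume_done = 1;    /* "do resume stuff" */
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_checkpoint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = checkpt_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(CKPT_SIGNAL, &sa, NULL);
}
```

    A program would call install_checkpoint_handler() once at startup.
    Because the resume point is inside the handler, returning from the
    handler continues the interrupted code with no extra upcall.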

    If the program is not checkpoint-aware, the checkpoint signal would be
    handled in the kernel and suspend the program directly, again with no
    upcall on resume.
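
    The choice between the two cases comes down to whether the program
    installed a handler for the checkpoint signal.  A userland analogue
    of that decision, using only the standard sigaction() query, might
    look like the sketch below.  The enum, the function names and the
    treatment of an ignored signal are assumptions made for
    illustration; the actual decision would be made inside the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Illustrative names only; none of these come from the kernel. */
enum ckpt_path {
    CKPT_KERNEL_SUSPEND,  /* default disposition: not checkpoint-aware */
    CKPT_USER_HANDLER,    /* handler installed: checkpoint-aware */
    CKPT_UNDECIDED        /* ignored signal or failed query */
};

/* Placeholder handler, used only to demonstrate the aware case. */
void ckpt_noop_handler(int signo)
{
    (void)signo;
}

/* Query the current disposition of signo without changing it. */
enum ckpt_path classify_checkpoint_path(int signo)
{
    struct sigaction cur;

    if (sigaction(signo, NULL, &cur) != 0)
        return CKPT_UNDECIDED;          /* e.g. invalid signal number */
    if (cur.sa_flags & SA_SIGINFO)
        return CKPT_USER_HANDLER;       /* sa_sigaction-style handler */
    if (cur.sa_handler == SIG_DFL)
        return CKPT_KERNEL_SUSPEND;
    if (cur.sa_handler == SIG_IGN)
        return CKPT_UNDECIDED;          /* assumption: left open here */
    return CKPT_USER_HANDLER;
}
```

    For example, classify_checkpoint_path(SIGUSR2) reports
    CKPT_KERNEL_SUSPEND while the signal has its default disposition
    and CKPT_USER_HANDLER once a handler has been installed.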

    This is predicated on the full memory image being checkpointed, with
    file descriptor references for shared regions.

:>
:>     We need to add a stty signal, similar to how ^\ cores a program, to
:>     checkpoint a program.  That would be so cool.
:
:That would be very easy.

    I'm seeing stars.  Cool, easy, and a *very* powerful mechanism.

						-Matt




