Trigger checkpointing from within the application
Matthew Dillon
dillon at apollo.backplane.com
Mon Nov 22 20:41:48 PST 2004
The current checkpoint implementation has two signals, SIGCKPT and
SIGCKPTEXIT (signals 33 and 34). SIGCKPT means 'checkpoint and continue'
and that is the one that the TTY will generate when you hit ^E.
A process that is checkpoint-aware should be able to set a signal
handler for SIGCKPT. This will prevent the automatic checkpoint from
occurring so your program can then control when the checkpoint is to
occur.
Unfortunately that's where the work ended. Theoretically one can make
the checkpoint system call but we have not incorporated the system call
into the master syscall list yet (it only exists as a module and the
checkpt program is kinda hacked to generate the syscall to restore).
It seems that there is more than a passing interest in the checkpointing
code so I will finish up the interface and make the system call available.
At that point you will be able to set a signal handler to catch the
request and then call the checkpointing code when convenient. The
resume would not be another signal; it would simply resume with a
different return code from the system call (so you can tell the difference
between the initial call that dumps the checkpoint and the
resume-from-checkpoint by looking at the return code of the system call).
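
To make the return-code idea concrete, here is a rough sketch of how a
caller might tell the two cases apart. The system call interface is not
final, so everything here is hypothetical: checkpoint_fn stands in for
the eventual call, and the convention (0 after the dump, greater than 0
after a resume, negative on failure) is borrowed from the request quoted
below rather than from any finished API. The fake_* helpers exist only so
the logic can be exercised without kernel support.

```c
#include <stddef.h>

/* Hypothetical stand-in for the checkpoint system call.
 * Assumed return convention (not a confirmed API):
 *    0  checkpoint written, original process continues
 *   >0  process is running again after a restore
 *   <0  checkpoint failed
 */
typedef int (*checkpoint_fn)(const char *path);

/* Hypothetical hook for work needed after a restore,
 * for example reopening sockets. */
typedef void (*reinit_fn)(void);

enum ckpt_outcome { OUTCOME_SAVED, OUTCOME_RESUMED, OUTCOME_FAILED };

/* Make the checkpoint call and branch on its return code.
 * The reinit hook runs only on the resume path. */
enum ckpt_outcome checkpoint_and_classify(checkpoint_fn ckpt,
                                          const char *path,
                                          reinit_fn reinit)
{
    int rc = ckpt(path);

    if (rc < 0)
        return OUTCOME_FAILED;
    if (rc > 0) {
        if (reinit != NULL)
            reinit();
        return OUTCOME_RESUMED;
    }
    return OUTCOME_SAVED;
}

/* Fakes for trying the logic out: fake_rc selects the simulated
 * return code, reinit_calls counts how often the hook ran. */
int fake_rc = 0;
int reinit_calls = 0;

int fake_checkpoint(const char *path)
{
    (void)path;
    return fake_rc;
}

void fake_reinit(void)
{
    reinit_calls++;
}
```

The point of passing the hook in is that the socket setup happens in
exactly one place, and only when the process really came back from a
checkpoint file.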
In any case, that sounds like a bit of fun and now that I've fixed
chmod/chown I need to have a bit of fun, so I'll get it done tonight.
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>
:Michael Neumann wrote:
:
:> Hi,
:>
:> I'd like to checkpoint my application without the user having to type
:> ^E. It's a server application. What I'm dreaming of is a syscall like this:
:>
:> if (checkpt("filename.ckpt")) {
:> // checkpoint was restored
:> }
:>
:> checkpt() will checkpoint the running application as ^E does. The checkpt
:> syscall returns 0 for the running application and != 0 when the
:> checkpointed application is restored.
:>
:> Or instead of a syscall, how about a special signal, that is called
:> after the checkpoint was restored?
:>
:> I have to run some procedures to setup sockets etc. after a checkpoint
:> has been restored.
:>
:> Regards,
:>
:> Michael