checkpoint/restart

Fri Oct 10 00:01:04 PDT 2003

>
>     I think the best approach is a suspend/resume model if the program
>     is not checkpoint-aware, or a signal-to-suspend and direct resume
>     if the program is checkpoint aware.  If the program is checkpoint aware
>     it will have been suspended in the (user) checkpoint signal function
>     itself, so no special upcall would be needed to resume.  i.e.
>
>     checkptsignal()
>     {
> 	do suspend stuff
> 	freeze(); (system call)
> 	(resume point is right here)
> 	do resume stuff
>     }

I've removed all the upcall code and changed the values returned by
ckpt_checkpoint/ckpt_restore to correspond to setjmp/longjmp. The test
code now uses the semantics suggested above. Instead of calling
ckpt_checkpoint directly, the code registers handle_ckpt for SIGCKPT
(arbitrarily defined as 42) and then sends itself SIGCKPT. The function
handle_ckpt is defined as follows:

static void
handle_ckpt(int signum, siginfo_t *info, void *ctx) {
        int retval;

        if (signum == SIGCKPT)
                printf("signum is expected value\n");
        else
                printf("signum is not expected value: %d\n", signum);

        /*
         * Like setjmp: 0 means the checkpoint has just been written, while
         * a later ckpt_restore(freezefd, THAW_RETURN) resumes execution
         * here with retval == THAW_RETURN.
         */
        if ((retval = ckpt_checkpoint(freezefd)) == 0) {
                printf("successful checkpoint - exiting\n");
                exit(0);
        } else if (retval < 0) {
                /* perror() appends ": <error string>" itself */
                perror("checkpoint failed");
        } else if (retval == THAW_RETURN) {
                printf("we've successfully returned from restore\n");
        } else {
                printf("unexpected return from checkpoint %d\n", retval);
                exit(1);
        }
}

where ckpt_restore is called as ckpt_restore(freezefd, THAW_RETURN); see
the test code for further details.
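The registration step itself is not shown above. A minimal sketch of it, assuming a POSIX system: SIGUSR1 stands in for SIGCKPT (signal 42 is specific to this patch), demo_handle_ckpt and request_checkpoint are hypothetical names, and the handler only records the signal number where the real one would call ckpt_checkpoint(freezefd). raise() sends the signal to ourselves; kill(getpid(), ...) would serve equally well in a single-threaded program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in: the patch defines SIGCKPT as 42 on DragonFly;
 * SIGUSR1 is used here so the sketch builds on any POSIX system. */
#define SIGCKPT_DEMO SIGUSR1

/* Written only by the handler; volatile sig_atomic_t is the
 * async-signal-safe way to report back to normal code. */
static volatile sig_atomic_t ckpt_signum_seen = 0;

/* Same three-argument SA_SIGINFO shape as handle_ckpt. A real handler
 * would call ckpt_checkpoint(freezefd) here; this one only records the
 * signal number. */
static void
demo_handle_ckpt(int signum, siginfo_t *info, void *ctx)
{
	(void)info;
	(void)ctx;
	ckpt_signum_seen = signum;
}

/* Register the handler, then send ourselves the signal. raise() does not
 * return until the handler has run, so the checkpoint request has been
 * handled by the time this function returns. */
static int
request_checkpoint(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = demo_handle_ckpt;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCKPT_DEMO, &sa, NULL) == -1)
		return -1;
	return raise(SIGCKPT_DEMO);
}
```

Because the handler runs on the program's own stack, the resume point ends up inside the handler, which is what removes the need for a separate resume upcall.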

Moving from upcalls to this signal-based model is actually a very minor
change to the code, but the semantics are, I believe, much cleaner.
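The two-way return that handle_ckpt relies on is the same one setjmp/longjmp provide. The sketch below uses the real setjmp/longjmp to show it without touching the kernel; demo_restore, demo_checkpoint_cycle and DEMO_THAW_RETURN are hypothetical stand-ins for ckpt_restore, handle_ckpt and THAW_RETURN. The stand-in value must be non-zero, since longjmp(env, 0) makes setjmp return 1.

```c
#include <setjmp.h>

/* Hypothetical stand-in for THAW_RETURN; non-zero, as longjmp requires. */
#define DEMO_THAW_RETURN 2

static jmp_buf demo_env;

/* Plays the role of ckpt_restore: resumes at the saved point and makes
 * the saved call return `val` instead of 0. */
static void
demo_restore(int val)
{
	longjmp(demo_env, val);
}

/* Plays the role of handle_ckpt. setjmp is used as a switch controlling
 * expression, one of the contexts the C standard permits. */
static int
demo_checkpoint_cycle(void)
{
	switch (setjmp(demo_env)) {
	case 0:
		/* First return: state saved. The real code exits here and a
		 * later ckpt_restore brings the process back. */
		demo_restore(DEMO_THAW_RETURN);
		return -1;	/* not reached: demo_restore does not return */
	case DEMO_THAW_RETURN:
		/* Second return, via longjmp: we have been "restored". */
		return DEMO_THAW_RETURN;
	default:
		return -1;	/* unexpected value */
	}
}
```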

The latest source is available at:
http://www.fsmware.com/ckpt6.tgz

The following steps are required to make the mechanism complete, not all
of which I necessarily intend to do in the near future.

1) set a default disposition for SIGCKPT and SIGCKPTEXIT for non-
checkpoint-aware applications, thus allowing them to be checkpointed/
migrated

2) write out the inode and dev_t for the application itself

3) add a new version of the ckpt_restore system call that will exec the file

4) at checkpoint, iterate through the file descriptor table and write out
the index, inode and dev_t for each vnode right after the point where the
signal state is stored in the checkpoint file

5) reopen files at the appropriate indexes from the inode+dev_t on restore

6) re-factor elf_coredump to take a struct file * so that one can write
checkpoint state to a socket

7) re-factor the new ckpt_restore function to ignore offsets so that it can
read from a socket

8) write a simple daemon to accept connections and pass the descriptor
to the new version of ckpt_restore

9) add support for multi-threaded core dumps to DragonFly (a 5-line change).
The only reason I put this last is that 95% of the work is in downloading
the LinuxThreads library and writing a test application.
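Steps 4 and 5 can be approximated in userland to show the shape of the saved records. This sketch assumes a hypothetical struct fd_record and uses fstat() where the kernel code would walk the file descriptor table directly. Because userland cannot open a file by inode and dev_t, the hypothetical reopen_at_index takes a path and only checks that it resolves to the saved inode before moving it to the saved index with dup2().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical checkpoint record for step 4: one per open descriptor. */
struct fd_record {
	int   index;	/* descriptor number in the fd table */
	ino_t ino;	/* inode number of the underlying file */
	dev_t dev;	/* device the inode lives on */
};

/* Userland approximation of step 4: probe descriptors 0..maxfd-1 and
 * record index, inode and dev_t for each open one. Returns the count. */
static int
collect_fd_records(int maxfd, struct fd_record *out, int cap)
{
	struct stat st;
	int n = 0;

	for (int fd = 0; fd < maxfd && n < cap; fd++) {
		if (fstat(fd, &st) == -1)
			continue;	/* descriptor not open */
		out[n].index = fd;
		out[n].ino = st.st_ino;
		out[n].dev = st.st_dev;
		n++;
	}
	return n;
}

/* Userland approximation of step 5: open `path`, verify it is the saved
 * inode on the saved device, then place it at the saved index. Returns
 * the index on success, -1 on failure. */
static int
reopen_at_index(const struct fd_record *rec, const char *path)
{
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1 || st.st_ino != rec->ino ||
	    st.st_dev != rec->dev) {
		close(fd);
		return -1;
	}
	if (fd != rec->index) {
		if (dup2(fd, rec->index) == -1) {
			close(fd);
			return -1;
		}
		close(fd);	/* the saved index now holds the file */
	}
	return rec->index;
}
```

Saving the index alongside the inode matters because a restored program still refers to its files by descriptor number.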

At this point DragonFly will have support for process migration of
multi-threaded processes. If someone wants to, adding a unified pid space
(bproc) would not be hard. The above-mentioned process migration support
provides *substantially* more functionality than bproc's vmadump.
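Migration via steps 6 through 8 comes down to treating the checkpoint image as a plain byte stream. A minimal sketch, assuming a hypothetical length-prefixed framing in host byte order (so sender and receiver must share an architecture): every transfer uses read()/write() loops that tolerate short counts, with no lseek() or pread(), so the same code works on a file, a pipe or a socket. The helper names are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf using only sequential write() calls. */
static int
write_full(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Read exactly len bytes, looping over the short reads a socket may
 * return. Returns -1 on error or premature end of stream. */
static int
read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return -1;	/* peer closed early */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical framing: a 32-bit length header, then the section bytes. */
static int
send_section(int fd, const void *data, uint32_t len)
{
	if (write_full(fd, &len, sizeof(len)) == -1)
		return -1;
	return write_full(fd, data, len);
}

/* Receive one section into data (capacity cap); *len gets its size. */
static int
recv_section(int fd, void *data, uint32_t cap, uint32_t *len)
{
	if (read_full(fd, len, sizeof(*len)) == -1 || *len > cap)
		return -1;
	return read_full(fd, data, *len);
}
```

A daemon as in step 8 would accept() a connection and hand the resulting descriptor to a reader built this way, in place of a checkpoint file opened from disk.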

If someone else wants to chip in I'd be happy to provide guidance. For me
all the fun is in figuring out how to do something. At this point the
remainder of the work is a SMOP :-).

				-Kip