cvs commit: src/sys/kern src/sys/sys src/sys/checkpt
Matthew Dillon
dillon at crater.dragonflybsd.org
Thu Nov 18 05:10:25 PST 2004
dillon 2004/11/18 05:09:55 PST
DragonFly src repository
Modified files:
sys/kern imgact_elf.c kern_descrip.c
sys/sys kern_syscall.h vnode.h
sys/checkpt checkpt.c
Log:
Lots of bug fixes to the checkpointing code. The big fix is that you can
now checkpoint a program that you have checkpoint-restored. i.e. you run
program X, you checkpoint it, you checkpoint-restore X from the checkpoint,
and then you checkpoint it again. The issue here is that when a checkpointed
program is restored the checkpoint file is used to map portions of the image
of the restored program. If you then tried to checkpoint the restored image
the system would overwrite or destroy the original checkpoint file and
the new checkpoint file would have references to the old (now
non-existent) file. Any attempt to restore the recursed checkpoint would
result in a seg-fault. That is now fixed.
* Remove the previous checkpoint file before saving the new one. If the
program we are checkpointing happens to be a checkpoint restore from the
same file then overwriting the file would wind up corrupting the
image set we are trying to save.
* When checkpointing a program that has been checkpoint-restored do not
attempt to save the file handles for the vnode representing the
checkpoint-restored program's own checkpoint file (which is a good chunk
of its backing store), because this vnode is likely to be destroyed the
moment we close the handle, since we are likely replacing the previous
checkpoint file. Instead, the backing store representing the old
checkpoint file is copied to the new one.
* Re-checkpointing a program (hitting ^E multiple times) now properly
replaces the checkpoint file.
* Properly close any file descriptors from the checkpt(1) program itself
when restoring a checkpointed program, and properly replace any file descriptors
that need replacing.
* Properly replace p_comm[] when restoring a checkpoint file, so checkpointing
again saves under the same program name. 'ps' output is still wrong,
though.
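The first item above (remove the old file, then save the new one) can be
illustrated from userland. This is only a minimal sketch, not the kernel code
from this commit: replace_file() is a hypothetical helper, and it relies solely
on the standard POSIX behavior that unlink(2) removes the name while existing
opens and mappings keep the old inode alive. It does not show the second item,
where the kernel copies the old backing store into the new checkpoint file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical userland helper (not part of the commit): write a fresh
 * file at `path` without modifying whatever inode currently lives there.
 * Returns 0 on success, -1 on error with errno set.
 */
int
replace_file(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd;

	assert(path != NULL);

	/* Drop the old name; anything still using the old inode keeps it. */
	if (unlink(path) == -1 && errno != ENOENT)
		return -1;

	/* O_EXCL ensures a brand new inode is created, never an overwrite. */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			int saved = errno;
			close(fd);
			errno = saved;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return close(fd);
}
```

Opening the same path with O_TRUNC instead would reuse the existing inode, so
a process still mapping the old file would see its backing store change
underneath it, which is the kind of corruption the log describes.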
TODO LIST:
* Add an iterator to the checkpoint file, accessible via kern.ckptfile,
so successive checkpoints save to a blah.ckpt.1, blah.ckpt.2, etc,
rather than always overwriting blah.ckpt (the iterator could be saved
in the proc structure).
* Add back as a 'feature' the ability for the new checkpoint file to
reference the old one. That is, each new checkpoint file would represent
a delta relative to the old one. This might be useful when checkpointing
programs with ever-growing data sets, so as not to have to copy the
entire contents of the program to the checkpoint file each time you want
to make a new checkpoint. It would be hell on the VM system, but it
would work.
* Add an option to checkpt(1) so you can checkpoint-restore-enter-gdb all
in one go, to be able to debug a checkpointed file more easily.
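The iterator idea in the first TODO item could look roughly like the sketch
below. Everything here is an assumption for illustration: ckpt_iter_name() is a
hypothetical function, the "base.ckpt.N" layout is taken only from the
blah.ckpt.1 / blah.ckpt.2 example above, and nothing is implied about how
kern.ckptfile would actually expose the counter.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<base>.ckpt.<iter>" into `out`.
 * Returns the name length on success, or -1 if it does not fit.
 */
int
ckpt_iter_name(char *out, size_t outlen, const char *base, unsigned iter)
{
	int n;

	assert(out != NULL && base != NULL);

	n = snprintf(out, outlen, "%s.ckpt.%u", base, iter);
	if (n < 0 || (size_t)n >= outlen)
		return -1;	/* encoding error or truncated name */
	return n;
}
```

A caller would bump the per-process counter after each successful save, so
that successive checkpoints never land on a file that a restored image may
still be mapping.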
Inspired by: Brooks Davis's HPC presentation. He expressed an interest in
possibly porting the checkpoint code so I figure I ought to
fix it up.
Revision  Changes    Path
1.24      +21 -2     src/sys/kern/imgact_elf.c
1.32      +6 -1      src/sys/kern/kern_descrip.c
1.21      +1 -0      src/sys/sys/kern_syscall.h
1.27      +1 -1      src/sys/sys/vnode.h
1.7       +67 -12    src/sys/checkpt/checkpt.c
http://www.dragonflybsd.org/cvsweb/src/sys/kern/imgact_elf.c.diff?r1=1.23&r2=1.24&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/kern/kern_descrip.c.diff?r1=1.31&r2=1.32&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/sys/kern_syscall.h.diff?r1=1.20&r2=1.21&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/sys/vnode.h.diff?r1=1.26&r2=1.27&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/checkpt/checkpt.c.diff?r1=1.6&r2=1.7&f=u