just curious
Hiten Pandya
hmp at FreeBSD.ORG
Tue Jul 22 21:18:36 PDT 2003
On Mon, Jul 21, 2003 at 11:35:12AM -0700, Matthew Dillon wrote:
>
> :
> :> copyfile(int sfd, int dfd, off_t soffset, off_t doffset, off_t bytes)
> :
> :I assume this operation would end up being implemented by changing the
> :state of VM buffers, yes?
>
> "As optimally as possible", whatever that turns out to be. This type
> of function would have extreme flexibility. It could copy the data
> as a catch-all, but consider these cases:
>
> * The target descriptor/object could potentially share the source's
> pages until they are otherwise modified by either side (kinda like how
> fork() works).
>
> The shared data can be unshared when the target flushes the data to
> backing store, or left shared, depending on what we want to accomplish.
>
> * In a clustered operating system the backing store would not even have
> to *EXIST* on the machine doing the operation.
>
> machine 1: cp -r directory1 directory2
> (directory1 resides on machine2)
> (directory2 resides on machine3)
>
> The cp program could potentially issue normal read() and write() calls.
> machine 2 would provide handles backing the buffer mappings, but if
> the cp program does not actually touch the buffer then the write() could
> simply pass the handle to the target (machine 3), which would then
> talk DIRECTLY to machine 2 to obtain the data.
>
> result: The backing store is copied directly to the machine that needs
> it, from machine 2 to machine 3.
>
> * In a clustered operating system consider the case where the two
> directories reside on machine 2:
>
> machine 1: cp -r directory1 directory2
> (directory1 and directory2 reside on machine2)
>
> In this case the result is that we can potentially optimize the operation
> to a direct SCSI->SCSI DMA op, with no file data touched by any machine's
> cpu.
>
> * A sophisticated filesystem could be made aware of possible file data
> page sharing. e.g. when you copy data from one file to another there
> is no reason why both files cannot share those data pages which are
> the same.
>
> In that case the cp -r operation could 'copy' gigabytes of data without
> actually copying anything other than the meta-data.
>
> Can you say "Complete filesystem copy in each jail"? I knew you could!
>
> All of this has MAJOR implications for clustering systems because, when
> fully implemented years down the road, it means that you can do a lot of
> work local to a box which does not necessarily have a fast network
> connection, but which nevertheless is able to cause massive amounts of
> data to be moved about the cluster.
>
> Of course this stuff can get quite complex. The base implementation will
> simply be to copy, but having the flexibility to do this sort of thing
> is important and passing around VM objects makes it all possible.
>
> (And now I think Hiten will see why we shouldn't bother porting over
> the zero-copy socket code from 5.x, because we will be able to do it
> trivially once all of these features are put in place).
Crystal Clear! :-)
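
For reference, the "simply copy" base case described above might look
roughly like this from userland. This is only a sketch under stated
assumptions, not the proposed kernel implementation: the name
copyfile_fallback is hypothetical, the off_t return value is assumed
(the quoted prototype gives no return type), and the 64KB buffer size
is arbitrary. It uses only POSIX pread() and pwrite(), so neither
descriptor's file position is disturbed.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userland stand-in for the proposed copyfile() call.
 * Copies up to `bytes` bytes from sfd (starting at soffset) to dfd
 * (starting at doffset) by plain read/write through a bounce buffer.
 * Returns the number of bytes copied, which is less than `bytes` if
 * the source hits EOF early, or -1 on error with errno set.
 */
off_t
copyfile_fallback(int sfd, int dfd, off_t soffset, off_t doffset, off_t bytes)
{
	char buf[65536];	/* arbitrary bounce buffer size */
	off_t done = 0;

	while (done < bytes) {
		size_t want = sizeof(buf);
		if ((off_t)want > bytes - done)
			want = (size_t)(bytes - done);

		ssize_t n = pread(sfd, buf, want, soffset + done);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return -1;
		}
		if (n == 0)
			break;			/* source ended early */

		/* pwrite() may be partial, so loop until the chunk is out. */
		ssize_t off = 0;
		while (off < n) {
			ssize_t w = pwrite(dfd, buf + off, (size_t)(n - off),
			    doffset + done + off);
			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		done += n;
	}
	return done;
}
```

Every byte here passes through the calling process, which is exactly
the cost the page-sharing and handle-passing cases in the quoted list
are meant to avoid.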
--
Hiten Pandya BSD UNIX/DragonFly Enthusiast
hmp at xxxxxxxx FreeBSD Team Member.
Visit: http://rtp.freebsd.org/~hmp/