just curious

Hiten Pandya hmp at FreeBSD.ORG
Tue Jul 22 21:18:36 PDT 2003


On Mon, Jul 21, 2003 at 11:35:12AM -0700, Matthew Dillon wrote:
> 
> :
> :> 	copyfile(int sfd, int dfd, off_t soffset, off_t doffset, off_t bytes)
> :
> :I assume this operation would end up being implemented by changing the 
> :state of VM buffers, yes?
> 
>     "As optimally as possible", whatever that turns out to be.  This type
>     of function would have extreme flexibility.  It could copy the data
>     as a catch-all, but consider these cases:
> 
>     * The target descriptor/object could potentially share the source's
>       pages until they are otherwise modified by either side (kinda like how
>       fork() works).
> 
>       The shared data can be unshared when the target flushes the data to
>       backing store, or left shared, depending on what we want to accomplish.
> 
>     * In a clustered operating system the backing store would not even have
>       to *EXIST* on the machine doing the operation.
> 
>       machine 1: cp -r directory1 directory2
> 	 (directory1 resides on machine2)
> 	 (directory2 resides on machine3)
> 
>       The cp program could potentially issue normal read() and write() calls.
>       machine 2 would provide handles backing the buffer mappings, but if
>       the cp program does not actually touch the buffer then the write() could
>       simply pass the handle to the target (machine 3), which would then
>       talk DIRECTLY to machine 2 to obtain the data.
> 
>       result: The backing store is copied directly to the machine that needs
> 	      it, from machine 2 to machine 3.
> 
>     * In a clustered operating system consider the case where the two 
>       directories reside on machine 2:
> 
>       machine 1: cp -r directory1 directory2
> 	 (directory1 and directory2 reside on machine2)
> 
>       In this case the result is that we can potentially optimize the operation
>       to a direct SCSI->SCSI DMA op, with no file data touched by any machine's
>       cpu.
> 
>     * A sophisticated filesystem could be made aware of possible file data
>       page sharing.  e.g. when you copy data from one file to another there
>       is no reason why both files cannot share those data pages which are
>       the same.
> 
>       In that case the cp -r operation could 'copy' gigabytes of data without
>       actually copying anything other than the meta-data.
> 
>       Can you say "Complete filesystem copy in each jail"?  I knew you could!
> 
>     All of this has MAJOR implications for clustering systems because, when
>     fully implemented years down the road, it means that you can do a lot of
>     work local to a box which does not necessarily have a fast network 
>     connection, but which nevertheless is able to cause massive amounts of
>     data to be moved about the cluster.
> 
>     Of course this stuff can get quite complex.  The base implementation will
>     simply be to copy, but having the flexibility to do this sort of thing
>     is important and passing around VM objects makes it all possible.
> 
>     (And now I think Hiten will see why we shouldn't bother porting over
>     the zero-copy socket code from 5.x, because we will be able to do it
>     trivially once all of these features are put in place).


	Crystal Clear! :-)
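
	For reference, the base "simply copy" case mentioned above might look
	roughly like the following from userland. This is only a sketch: it
	assumes plain pread()/pwrite(), and the name copyfile_fallback and the
	64KB buffer size are made up for illustration. It is not actual
	DragonFly code and does none of the page sharing or handle passing
	described in the quoted mail.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical fallback for the proposed copyfile() interface.
 * Copies up to `bytes` bytes from sfd at soffset to dfd at doffset
 * using plain pread()/pwrite().  Neither descriptor's file position
 * is changed.  Returns the number of bytes copied (which is short if
 * the source ends early), or -1 on error with errno set.
 */
off_t
copyfile_fallback(int sfd, int dfd, off_t soffset, off_t doffset, off_t bytes)
{
	char buf[65536];	/* assumed buffer size, not tuned */
	off_t done = 0;

	while (done < bytes) {
		size_t want = sizeof(buf);
		ssize_t nr, nw, off;

		if ((off_t)want > bytes - done)
			want = (size_t)(bytes - done);

		nr = pread(sfd, buf, want, soffset + done);
		if (nr < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return (-1);
		}
		if (nr == 0)
			break;			/* source ended early */

		/* pwrite() may write less than asked; loop until done. */
		off = 0;
		while (off < nr) {
			nw = pwrite(dfd, buf + off, (size_t)(nr - off),
			    doffset + done + off);
			if (nw < 0) {
				if (errno == EINTR)
					continue;
				return (-1);
			}
			off += nw;
		}
		done += nr;
	}
	return (done);
}
```

	A caller would pass two open descriptors, e.g.
	copyfile_fallback(sfd, dfd, 0, 0, st.st_size). Everything more
	interesting (sharing pages, passing handles between machines) would
	have to live behind the same signature in the kernel.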

-- 
Hiten Pandya			BSD UNIX/DragonFly Enthusiast
hmp at xxxxxxxx			FreeBSD Team Member.
Visit:				http://rtp.freebsd.org/~hmp/




