just curious
Matthew Dillon
dillon at apollo.backplane.com
Mon Jul 21 11:36:16 PDT 2003
:
:> copyfile(int sfd, int dfd, off_t soffset, off_t doffset, off_t bytes)
:
:I assume this operation would end up being implemented by changing the
:state of VM buffers, yes?
"As optimally as possible", whatever that turns out to be. This type
of function would have extreme flexibility. It could copy the data
as a catch-all, but consider these cases:
* The target descriptor/object could potentially share the source's
pages until they are otherwise modified by either side (kinda like how
fork() works).
The shared data can be unshared when the target flushes the data to
backing store, or left shared, depending on what we want to accomplish.
* In a clustered operating system the backing store would not even have
to *EXIST* on the machine doing the operation.
    machine 1: cp -r directory1 directory2
               (directory1 resides on machine2)
               (directory2 resides on machine3)
The cp program could potentially issue normal read() and write() calls.
machine 2 would provide handles backing the buffer mappings, but if
the cp program does not actually touch the buffer then the write() could
simply pass the handle to the target (machine 3), which would then
talk DIRECTLY to machine 2 to obtain the data.
result: The backing store is copied directly to the machine that needs
it, from machine 2 to machine 3.
* In a clustered operating system consider the case where the two
directories reside on machine 2:
    machine 1: cp -r directory1 directory2
               (directory1 and directory2 reside on machine2)
In this case the result is that we can potentially optimize the operation
to a direct SCSI->SCSI DMA op, with no file data touched by any machine's
CPU.
* A sophisticated filesystem could be made aware of possible file data
page sharing. e.g. when you copy data from one file to another there
is no reason why both files cannot share those data pages which are
the same.
In that case the cp -r operation could 'copy' gigabytes of data without
actually copying anything other than the meta-data.
Can you say "Complete filesystem copy in each jail"? I knew you could!
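To make the "share until modified" idea from the first and last cases concrete, here is a rough userland analogy in C. It is only a sketch: the names (struct page, struct object, object_share, object_write, object_release, SKETCH_PAGE_SIZE) are made up for illustration and are not kernel interfaces, and a real VM system would do this per page with fault handling rather than explicit calls.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* illustrative size, not a kernel constant */

/* One reference-counted data page (hypothetical structure). */
struct page {
    int refs;
    unsigned char data[SKETCH_PAGE_SIZE];
};

/* A one-page stand-in for a file or VM object (hypothetical). */
struct object {
    struct page *pg;
};

/* Allocate a zeroed page holding a single reference. */
struct page *page_alloc(void)
{
    struct page *p = calloc(1, sizeof(*p));
    if (p != NULL)
        p->refs = 1;
    return p;
}

/* "Copy" src into an empty dst (dst->pg == NULL) by sharing the page. */
void object_share(struct object *dst, const struct object *src)
{
    src->pg->refs++;
    dst->pg = src->pg;
}

/* Write into an object, giving it a private page first if it is shared. */
int object_write(struct object *o, size_t off, const void *buf, size_t len)
{
    if (off > SKETCH_PAGE_SIZE || len > SKETCH_PAGE_SIZE - off)
        return -1;
    if (o->pg->refs > 1) {
        struct page *np = malloc(sizeof(*np));
        if (np == NULL)
            return -1;
        memcpy(np->data, o->pg->data, SKETCH_PAGE_SIZE);
        np->refs = 1;
        o->pg->refs--;          /* drop our reference to the shared page */
        o->pg = np;
    }
    memcpy(o->pg->data + off, buf, len);
    return 0;
}

/* Drop the object's reference; free the page when nobody uses it. */
void object_release(struct object *o)
{
    if (o->pg != NULL && --o->pg->refs == 0)
        free(o->pg);
    o->pg = NULL;
}
```

With this, object_share() moves no data at all, and the copy cost is paid only by whichever side writes first.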
All of this has MAJOR implications for clustering systems because, when
fully implemented years down the road, it means that you can do a lot of
work local to a box which does not necessarily have a fast network
connection, but which nevertheless is able to cause massive amounts of
data to be moved about the cluster.
Of course this stuff can get quite complex. The base implementation will
simply be to copy, but having the flexibility to do this sort of thing
is important and passing around VM objects makes it all possible.
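For reference, the "simply copy" base case might look something like the following in userland. This is a sketch under assumptions, not the planned kernel code: copyfile_fallback is a hypothetical name, the argument list just mirrors the quoted prototype, and the return convention (bytes copied, or -1 with errno set) is my own choice. It uses only the standard pread() and pwrite() calls.

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical fallback: copy up to `bytes` bytes from sfd at soffset
 * to dfd at doffset through a bounce buffer.
 * Returns the number of bytes copied (short if the source hits EOF),
 * or -1 on error with errno set by pread()/pwrite().
 */
off_t
copyfile_fallback(int sfd, int dfd, off_t soffset, off_t doffset, off_t bytes)
{
    char buf[65536];
    off_t done = 0;

    while (done < bytes) {
        size_t want = sizeof(buf);
        ssize_t n, off;

        if ((off_t)want > bytes - done)
            want = (size_t)(bytes - done);

        n = pread(sfd, buf, want, soffset + done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* source EOF: return a short count */

        /* pwrite() may write less than asked, so loop over the chunk. */
        off = 0;
        while (off < n) {
            ssize_t w = pwrite(dfd, buf + off, (size_t)(n - off),
                doffset + done + off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        done += n;
    }
    return done;
}
```

Every byte here passes through the CPU twice, which is exactly the cost the cases above are trying to avoid; the point of a single call taking both descriptors is that the implementation underneath can be swapped without changing callers.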
(And now I think Hiten will see why we shouldn't bother porting over
the zero-copy socket code from 5.x, because we will be able to do it
trivially once all of these features are put in place).
-Matt
Matthew Dillon
<dillon at xxxxxxxxxxxxx>