device cloning on open

Toby karyadi toby.karyadi at
Thu Apr 5 14:07:02 PDT 2007

Matthew Dillon wrote:

> :Have there been any changes to device cloning support in DFBSD,
> :specifically related to this email:
> :
> :
> :I'm porting the kqemu module based on the FreeBSD version of it, and it
> :works now, but only one qemu process at a time can use the /dev/kqemu
> :device. I guess I can just keep a list, managed in the module, that
> :associates each process with the module/device private data for its
> :open(). It's not optimal, but it will work until I figure out the best way
> :to keep per-open data.
> :
> :In FreeBSD >= 5 the per-open data is associated with a clone of the device
> :via the si_drv1 field. A clone event handler also needs to be registered
> :to do the cloning; it's kind of convoluted.
> :
> :In Linux and NetBSD, if I read the code right, the file pointer is passed
> :in to the various ops, read, ioctl, write, etc of the driver. I think
> :this makes more sense semantically. In Linux the file* is passed in,
> :while in NetBSD the module can decide to call falloc and fdclone.
> :
> :I'm not sure if I want to / can take on the task of adding the per-open
> :vnode as suggested by Matt, but I'd be interested in hearing what people
> :have to say about this.
> :
> :Thanks for all of the great work so far.
> :
> :Cheers,
> :Toby
>     At the moment DragonFly passes the file pointer to the VOP open
>     function, allowing the VOP open function to change elements of
>     the file pointer (for example, to install a different vnode).  The
>     file functions typically are left pointing at the specfs VFS which
>     does the translation between a vnode operation and a device operation.
>     The file pointer's fp->f_type field is left indicating a 'vnode' type.
>     This means that I/O operations running through specfs run through
>     the uncloned vnode at the moment and the actual device is picked
>     out of the vnode structure.  The vnode represents the filesystem
>     rendezvous (typically /dev/<blahblah>).  That is why I originally
>     suggested cloning the vnode in order to support a cloned device.
>     There is another way we could clone the device, and that would be
>     to NOT retain the vnode type in the file pointer but instead to
>     create a wholly new file type and a wholly new f_ops operations
>     set for the file pointer, then point f_data at a cloning structure
>     of some sort:
>     struct dev_data {
>         cdev_t        dd_dev;       /* possibly cloned device */
>         struct vnode *dd_orig_vp;   /* original uncloned vnode */
>     };
>     fp->f_ops = &dev_ops;
>     fp->f_type = DTYPE_DEV;
>     fp->f_data = dd;    /* dd: the newly allocated struct dev_data */
>     In other words, create a completely new device abstraction that
>     bypasses the original vnode entirely and provides storage
>     (dd_dev) for us to clone the device.  No more specfs, no more
>     indirection through the vnode operations vector.  A far more
>     direct device access mechanism that also happens to make cloning
>     trivial.
>     The dd_orig_vp field would remain only to give the new device ops
>     the ability to update the vnode's access and modified timestamps
>     (if we even care about doing that for cloned entities).
>     --
>     If you or someone would like to take on this task, I think it would
>     be an excellent (and clean) solution to a long standing problem.
> -Matt
> Matthew Dillon
> <dillon at>
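To make the contrast in Matt's description concrete, here is a minimal userspace model in C. Everything in it is a simplified stand-in assumed for illustration: the structs keep only the fields mentioned above, `MODEL_DTYPE_VNODE` and `MODEL_DTYPE_DEV` stand in for the vnode file type and the proposed DTYPE_DEV, and `vnode_file_dev()`, `dev_read()` and `dev_file_setup()` are hypothetical helpers, not DragonFly kernel functions. It shows why, on the current path, every open of the same /dev node ends up at the same device, and how a per-open dev_data lets each open point at its own (possibly cloned) device.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical models of the structures discussed above;
 * these are NOT the real DragonFly definitions. */
struct cdev  { int si_unit; void *si_drv1; };
struct vnode { struct cdev *v_rdev; };      /* the /dev/<blahblah> rendezvous */

#define MODEL_DTYPE_VNODE 1                  /* stand-in for the vnode file type */
#define MODEL_DTYPE_DEV   2                  /* stand-in for the proposed DTYPE_DEV */

struct file;
struct fileops { int (*fo_read)(struct file *fp); };

struct file {
    const struct fileops *f_ops;
    int                   f_type;
    void                 *f_data;
};

struct dev_data {
    struct cdev  *dd_dev;       /* possibly cloned device */
    struct vnode *dd_orig_vp;   /* original uncloned vnode */
};

/* Current path: f_data is the shared vnode, so every open of the same
 * /dev node resolves to the same cdev (and therefore the same si_drv1). */
static struct cdev *vnode_file_dev(struct file *fp)
{
    assert(fp->f_type == MODEL_DTYPE_VNODE);
    return ((struct vnode *)fp->f_data)->v_rdev;
}

/* Proposed path: f_data is a per-open dev_data, so each open can carry
 * its own (possibly cloned) cdev. This model "read" just returns the
 * unit number to show which device the call reached. */
static int dev_read(struct file *fp)
{
    struct dev_data *dd = fp->f_data;
    assert(fp->f_type == MODEL_DTYPE_DEV);
    return dd->dd_dev->si_unit;
}

static const struct fileops dev_ops = { dev_read };

/* Install the device-backed fileops on an open file, bypassing the vnode. */
static int dev_file_setup(struct file *fp, struct vnode *vp, struct cdev *dev)
{
    struct dev_data *dd = malloc(sizeof(*dd));
    if (dd == NULL)
        return -1;
    dd->dd_dev = dev;
    dd->dd_orig_vp = vp;        /* kept only for timestamp updates */
    fp->f_ops = &dev_ops;
    fp->f_type = MODEL_DTYPE_DEV;
    fp->f_data = dd;
    return 0;
}
```

With two files set up on the same vnode, one pointing at the original device and one at a clone, `fo_read` reaches a different device for each, while both still keep `dd_orig_vp` for the access/modified timestamps.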

Do you suppose adding a new D_MAKECLONE bit in the si_flags makes sense?
That way devices can be explicit about whether they can/want to be cloned. 

Unless VOP_OPEN is changed we'll still hit specfs, but only for spec_open().
spec_open() should be modified so that, when an fp is passed in, fp->f_data
is set up with the dev_data struct. If this is an initial open (hmm, how do
I figure that out?), then dd_orig_vp->v_rdev is copied to dd_dev. Otherwise,
create a clone of dd_orig_vp->v_rdev (using make_sub_dev() maybe?).
spec_open() should also set up fp->f_ops to a new set of fileops that
uses the dev_d*() functions directly. Am I correct so far?
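The open-time decision described above can be sketched as a small userspace model. It assumes several things the thread leaves open: `MODEL_D_MAKECLONE` models the proposed D_MAKECLONE flag, an "initial open" is detected with a hypothetical `si_opencount` counter (one possible answer to the question above, not an established mechanism), and `model_clone_dev()` is a hypothetical stand-in for whatever cloning routine ends up being used (make_sub_dev() or otherwise); its real semantics are not modeled.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_D_MAKECLONE 0x0001   /* hypothetical model of the proposed flag */

/* Simplified, hypothetical device and vnode models. */
struct cdev {
    int          si_flags;
    int          si_opencount;     /* hypothetical: opens seen so far */
    int          si_unit;
    struct cdev *si_parent;        /* NULL for the original device */
};

struct vnode { struct cdev *v_rdev; };

struct dev_data {
    struct cdev  *dd_dev;          /* possibly cloned device */
    struct vnode *dd_orig_vp;      /* original uncloned vnode */
};

/* Hypothetical stand-in for the real cloning routine. */
static struct cdev *model_clone_dev(struct cdev *orig, int unit)
{
    struct cdev *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->si_flags = orig->si_flags;
    c->si_unit = unit;
    c->si_parent = orig;
    return c;
}

/* Model of the spec_open() decision: the first open, or any open of a
 * device without the clone flag, uses v_rdev directly; later opens of a
 * cloneable device each get a fresh clone. */
static int model_spec_open(struct vnode *vp, struct dev_data *dd)
{
    struct cdev *dev = vp->v_rdev;

    assert(dev != NULL);
    dd->dd_orig_vp = vp;
    if (!(dev->si_flags & MODEL_D_MAKECLONE) || dev->si_opencount == 0) {
        dd->dd_dev = dev;
    } else {
        dd->dd_dev = model_clone_dev(dev, dev->si_opencount);
        if (dd->dd_dev == NULL)
            return -1;
    }
    dev->si_opencount++;
    return 0;
}
```

The model only handles opens; it never decrements `si_opencount` or frees clones, which a real close path would have to do.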

Now, this might be a stupid question, but does a higher-level VFS layer,
such as a filesystem, call VOP_OPEN on the layer below it, e.g. on the disk
partition device? I'm just trying to figure out potential problems downstream.

Well, let me take a stab at it and see how that goes. Hmm, maybe I can use
the vkernel...

