cvs commit: src/sys/kern vfs_cache.c vfs_syscalls.c vfs_vnops.c vfs_vopops.c src/sys/sys namecache.h stat.h

Joerg Sonnenberger joerg at britannica.bec.de
Thu Aug 25 13:43:42 PDT 2005


On Thu, Aug 25, 2005 at 01:17:55PM -0700, Matthew Dillon wrote:
>     I don't know about imon under Linux, but kqueue on the BSDs doesn't
>     even come *close* to providing the functionality needed, let alone
>     providing us with a way to monitor changes across distinct invocations.
>     Using kqueue for that sort of thing is a terrible idea.

Trying to reliably detect what changed between two invocations based on
any kind of transaction ID is a very bad idea, because it adds a lot
of *very* nasty issues. I'll comment on some of them later.

>     I'm not sure I understand what you mean about not reaching all
>     parent directories.  Perhaps you did not read the patch set.  It
>     most certainly DOES reach all parent directories, whether they've been
>     read or not.  That's the whole point.  It goes all the way to '/'.

It only handles vnode changes for entries which are already in the name
cache. So it is incomplete. It can't behave otherwise without keeping
the whole directory tree in memory, but that doesn't solve the problem.

>     And as far as searching directories goes... the whole point is to
>     reduce the number of directories that have to be searched to JUST the
>     portions of the hierarchy containing the modifications.  If one is
>     trying to synchronize a huge filesystem, such as many people now have,
>     it is extremely important to be able to restrict such synchronization
>     to just the elements that have changed, and to do so without having to
>     constantly monitor the entire filesystem.

Monitoring filesystems is not always a good idea. Allowing any
user/program to detect activity in a subtree of the system can help to
detect or circumvent security measures.

>     The methodology behind the transaction id assignments can make this
>     a 100% reliable operation on a *RUNNING*, *LIVE* system.  Detecting
>     in-flight changes is utterly trivial.

On a running system, it is enough to either get notification when a
certain vnode changed (kqueue model) or when a vnode changed (imon /
dnotify model). Trying to detect in-flight changes is *not* utterly
trivial for any model, since even accurate atime is already difficult to
achieve for mmapped files. Believing that you can *reliably* back up a
system based on VOP transactions alone is therefore a dream.

>     Nesting overhead is an issue, but not a big one.  It's a very solvable
>     problem and certainly should not hold up an implementation.  The only
>     real issue occurs when someone does a write() vs someone else stat()ing
>     a directory along the parent path.  Again, very solvable and certainly
>     not a show stopper in any way.

It is a big issue, because it is not controllable. With kqueue,
imon and dnotify it can be done *selectively* for filesystems where it
is needed and wanted. Even my somewhat small filesystems already have
over a million inodes. Just trying to read them would already create a
lot more (memory) IO just to update the various atimes.

>     Not sure what you mean by no filesystem allowing it.  It's an almost
>     trivial matter to add it to UFS.  It certainly isn't difficult.  It
>     is certainly entirely possible to correctly implement the desired
>     behavior.

As soon as you try to make it persistent you add the problem of how
applications should behave after a reboot. Since you mentioned
backups, let's just discuss that. The backup program reads a change
just after it was made by a program, but before it has hit the disk. The
FSMID it sends to the backup server is therefore not recorded anywhere
on disk (and doing that would involve quite a lot of performance
penalties). Now the machine is "suddenly" restarting. Can you ensure
that the same FSMID is not reused, in which case the state of the
filesystem and the state of the backup are inconsistent? Sure, the
program can try to detect it, but that would make the entire concept of
FSMID useless.
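
To make the reboot scenario concrete, here is a minimal sketch in
Python. It is a toy model, not the actual implementation: it assumes the
FSMID is a simple counter that lives in memory and only reaches the disk
on an explicit flush. The names ToyFS, modify, flush and
crash_and_reboot are hypothetical and exist only for this illustration.

```python
class ToyFS:
    """Toy model: a change counter that is persisted only on flush()."""

    def __init__(self):
        self.on_disk_fsmid = 0    # last counter value written to disk
        self.in_memory_fsmid = 0  # counter as seen by running programs

    def modify(self):
        # Each change gets the next id, but only in memory.
        self.in_memory_fsmid += 1
        return self.in_memory_fsmid

    def flush(self):
        # Persist the current counter value.
        self.on_disk_fsmid = self.in_memory_fsmid

    def crash_and_reboot(self):
        # Unflushed state is lost; the counter restarts from the disk value.
        self.in_memory_fsmid = self.on_disk_fsmid


fs = ToyFS()
fs.modify()
fs.flush()                    # the first id is safely on disk
seen_by_backup = fs.modify()  # read by the backup before any flush
fs.crash_and_reboot()         # that id never reached the disk
after_reboot = fs.modify()    # a different change gets the same id again
reused = (seen_by_backup == after_reboot)
```

Under these assumptions the backup server and the filesystem now attach
the same id to two different changes. Avoiding that means writing the id
out before anybody can read it, which is the performance penalty
mentioned above.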

Joerg




