cvs commit: src/sys/conf files src/sys/kern kern_ccms.c kern_objcache.c vfs_lock.c vfs_vnops.c src/sys/sys ccms.h kernel.h vnode.h

Tue Aug 22 23:52:40 PDT 2006

dillon      2006/08/22 23:45:40 PDT

DragonFly src repository

  Modified files:
    sys/conf             files 
    sys/kern             kern_objcache.c vfs_lock.c vfs_vnops.c 
    sys/sys              kernel.h vnode.h 
  Added files:
    sys/kern             kern_ccms.c 
    sys/sys              ccms.h 
  Log:
  Bring in the initial cut of the Cache Coherency Management System module.
  Add a sysctl kern.ccms_enable for testing.  CCMS operations are disabled by
  default.

  The comment below describes the whole enchilada.  Only basic locking has
  been implemented in this commit.

  CCMS is a dual-purpose cache management layer based around offset ranges.

  #1 - Threads on the local machine can obtain shared, exclusive, and modifying
       range locks.  These work kinda like lockf locks and the kernel will use
       them to enforce UNIX I/O atomicity rules.

  #2 - The BUF/BIO/VM system can manage the cache coherency state for offset
       ranges.  That is, Modified/Exclusive/Shared/Invalid (and two more
       advanced states).

       These cache states do not represent the state of data we have cached.
       Instead they represent the best case state of data we are allowed
       to cache within the range.

       The cache state for a single machine (i.e. no cluster), for every
       CCMS data set, would simply be 'Exclusive' or 'Modified' for the
       entire 64 bit offset range.
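
  As a rough illustration of how the two layers relate, the sketch below
  maps each layer #1 lock type to the weakest layer #2 cache state that
  would satisfy it.  All identifiers are hypothetical and are not taken
  from ccms.h, the ordering of the states is an assumption, and the two
  advanced states are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, ordered weakest to strongest (an assumption). */
enum cache_state { CS_INVALID, CS_SHARED, CS_EXCLUSIVE, CS_MODIFIED };
enum lock_type   { LT_SHARED, LT_EXCLUSIVE, LT_MODIFYING };

/* Assumed mapping: weakest cache state that satisfies each lock type. */
static enum cache_state
required_state(enum lock_type lt)
{
    switch (lt) {
    case LT_SHARED:    return CS_SHARED;
    case LT_EXCLUSIVE: return CS_EXCLUSIVE;
    case LT_MODIFYING: return CS_MODIFIED;
    }
    assert(!"unknown lock type");
    return CS_INVALID;
}

/* True if the current cache state is too weak for the requested lock. */
static bool
needs_upgrade(enum cache_state cur, enum lock_type lt)
{
    return cur < required_state(lt);
}
```

  On a single machine the state would already be Exclusive or Modified
  everywhere, so under this mapping needs_upgrade() would only ever be
  true for an Exclusive range about to be modified.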

  The way this works in general is that the locking layer is used to enforce
  UNIX I/O atomicity rules locally and to generally control access on the local
  machine.  The cache coherency layer would maintain the cache state for
  the object's entire offset range.  The local locking layer would be used
  to prevent demotion of the underlying cache state, and modifications to the
  cache state might have the side effect of communicating with other machines
  in the cluster.

  Take a typical write().  The offset range in the file would first be locked,
  then the underlying cache coherency state would be upgraded to Modified.
  If the underlying cache state is not compatible with the desired cache
  state then communication might occur with other nodes in the cluster in
  order to gain exclusive access to the cache elements in question so they
  can be upgraded to the desired state.  Once upgraded, the range lock
  prevents downgrading until the operation completes.  This of course can
  result in a deadlock between machines and deadlocks would have to be dealt
  with.
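
  The write() sequence above can be modeled in a few lines of userland C.
  This is only a sketch: the structure, the function names, and the stub
  standing in for cluster communication are all hypothetical, and it
  assumes that going from Exclusive to Modified needs no remote traffic.
  Deadlock handling is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum cache_state { CS_INVALID, CS_SHARED, CS_EXCLUSIVE, CS_MODIFIED };

/* Hypothetical per-range view: one cache state plus a local lock count. */
struct range_view {
    enum cache_state state;
    int lock_count;     /* active local range locks */
    int remote_msgs;    /* times we had to talk to the cluster */
};

/* Stub for cluster communication; it only counts the request. */
static void
remote_acquire(struct range_view *r)
{
    r->remote_msgs++;
}

/* A held range lock prevents demotion of the underlying cache state. */
static bool
try_demote(struct range_view *r, enum cache_state to)
{
    if (r->lock_count > 0)
        return false;
    if (to < r->state)
        r->state = to;
    return true;
}

/* Lock the range first, then upgrade the cache state to Modified. */
static void
write_begin(struct range_view *r)
{
    r->lock_count++;
    if (r->state < CS_EXCLUSIVE)
        remote_acquire(r);      /* gain exclusive access from other nodes */
    r->state = CS_MODIFIED;
}

/* Drop the range lock once the operation completes. */
static void
write_end(struct range_view *r)
{
    assert(r->lock_count > 0);
    r->lock_count--;
}
```

  Between write_begin() and write_end() a call to try_demote() fails, which
  is the window in which a remote node would have to wait.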

  Likewise, if a remote machine needs to upgrade its representation of
  the cache state for a particular file it might have to communicate with
  us in order to downgrade our cache state.  If a remote machine
  needs an offset range to be Shared then we have to downgrade our
  cache state for that range to Shared or Invalid.  This might have side
  effects on us such as causing any dirty buffers or VM pages to be flushed
  to disk.  If the remote machine needs to upgrade its cache state to
  Exclusive then we have to downgrade ours to Invalid, resulting in a
  flush and discard of the related buffers and VM pages.
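
  A minimal sketch of the downgrade side, again with hypothetical names.
  It assumes a request for Shared is answered by downgrading to Shared
  (the text also allows Invalid), and it models flushes and discards as
  simple counters rather than real buffer or VM page operations.

```c
#include <assert.h>
#include <stdbool.h>

enum cache_state { CS_INVALID, CS_SHARED, CS_EXCLUSIVE, CS_MODIFIED };

/* Hypothetical local bookkeeping for one offset range. */
struct local_range {
    enum cache_state state;
    bool dirty;     /* dirty buffers or VM pages exist in the range */
    bool cached;    /* any buffers or VM pages exist in the range */
    int flushes;
    int discards;
};

/* React to a remote node that needs `wanted` (Shared or Exclusive). */
static void
remote_wants(struct local_range *l, enum cache_state wanted)
{
    assert(wanted == CS_SHARED || wanted == CS_EXCLUSIVE);

    /* Either downgrade forces dirty data out to disk first. */
    if (l->dirty) {
        l->flushes++;
        l->dirty = false;
    }
    if (wanted == CS_SHARED) {
        /* Keep clean data cached, but give up write access. */
        if (l->state > CS_SHARED)
            l->state = CS_SHARED;
    } else {
        /* Remote goes Exclusive: discard our copy and go Invalid. */
        if (l->cached) {
            l->discards++;
            l->cached = false;
        }
        l->state = CS_INVALID;
    }
}
```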

  Both range locks and range-based cache state are stored using a common
  structure called a CST, in a red-black tree.  All operations are
  approximately N*LOG(N).  CCMS uses a far superior algorithm to the one
  that the POSIX locking code (lockf) has to use.
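
  To give a feel for range lookups over ordered elements, here is a sketch
  that finds the first element overlapping a requested offset range.  It
  is not the CCMS code: a sorted array with a binary search stands in for
  the red-black tree, and the structure and function names are made up
  for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element covering the inclusive offset range [beg, end]. */
struct cst_range {
    uint64_t beg;
    uint64_t end;
};

/*
 * Return the index of the first element overlapping [beg, end], or -1.
 * v[] must be sorted by offset and its elements must not overlap.
 */
static long
first_overlap(const struct cst_range *v, size_t n, uint64_t beg, uint64_t end)
{
    size_t lo = 0, hi = n;

    assert(beg <= end);
    /* Binary search for the first element whose end is >= beg. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (v[mid].end < beg)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* That element overlaps only if it starts at or before end. */
    if (lo < n && v[lo].beg <= end)
        return (long)lo;
    return -1;
}
```

  A caller would walk forward from the returned index while elements
  still start at or before the end of the requested range.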

  It is important to note that layer #2 cache state is fairly persistent
  while layer #1 locks tend to be ephemeral.  To prevent too much
  fragmentation of the data space the cache state for adjacent elements
  may have to be actively merged (either upgraded or downgraded to match).
  The buffer cache and VM page caches are naturally fragmentary, but we
  really do not want the CCMS representation to be too fragmented.  This
  also gives us the opportunity to predispose our CCMS cache state so
  I/O operations done on the local machine are not likely to require
  communication with other hosts in the cluster.  The cache state as
  stored in CCMS is a superset of the actual buffers and VM pages cached
  on the local machine.
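
  The simplest form of defragmentation is to coalesce adjacent elements
  that already carry the same state.  The sketch below does that over a
  sorted array; it is an illustration only, with hypothetical names, and
  it does not cover the more active case where a neighbor is upgraded or
  downgraded first so that the states match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element: inclusive offset range plus a cache state. */
struct cst_elem {
    uint64_t beg;
    uint64_t end;
    int state;
};

/*
 * Merge neighbors that are contiguous and share a state.  v[] must be
 * sorted by offset and non-overlapping.  Returns the new element count.
 */
static size_t
coalesce(struct cst_elem *v, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if (v[i].state == v[out].state &&
            v[out].end != UINT64_MAX &&
            v[out].end + 1 == v[i].beg) {
            v[out].end = v[i].end;  /* extend the previous element */
        } else {
            v[++out] = v[i];        /* keep as a separate element */
        }
    }
    return out + 1;
}
```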

  Revision  Changes    Path
  1.136     +1 -0      src/sys/conf/files
  1.8       +3 -0      src/sys/kern/kern_objcache.c
  1.23      +1 -0      src/sys/kern/vfs_lock.c
  1.45      +9 -1      src/sys/kern/vfs_vnops.c
  1.22      +1 -0      src/sys/sys/kernel.h
  1.66      +4 -0      src/sys/sys/vnode.h

http://www.dragonflybsd.org/cvsweb/src/sys/conf/files.diff?r1=1.135&r2=1.136&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/kern/kern_objcache.c.diff?r1=1.7&r2=1.8&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/kern/vfs_lock.c.diff?r1=1.22&r2=1.23&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/kern/vfs_vnops.c.diff?r1=1.44&r2=1.45&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/sys/kernel.h.diff?r1=1.21&r2=1.22&f=u
http://www.dragonflybsd.org/cvsweb/src/sys/sys/vnode.h.diff?r1=1.65&r2=1.66&f=u