Thu Apr 1 09:58:51 PST 2004

  Enhance the pmap_kenter*() API and friends, separating entries which
  only need invalidation on the local cpu from entries which need invalidation
  across the entire system, and provide a synchronization abstraction.
  Enhance sf_buf_alloc() and friends to allow the caller to specify whether the
  sf_buf's kernel mapping is going to be used on just the current cpu or
  whether it needs to be valid across all cpus.  This is done by maintaining
  a cpumask of known-synchronized cpus in the struct sf_buf.
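
  The cpumask idea above can be illustrated with a small userland sketch.
  This is not the kernel code: struct sf_buf_sketch, the sfb_*() helpers,
  and the two counters are hypothetical names invented for illustration, and
  the counters merely stand in for a local cpu_invlpg() and for a cross-cpu
  (IPI-based) invalidation.  The assumed rule is that a cpu whose bit is
  already set in the mask needs no further invalidation.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's actual definitions. */
typedef uint32_t cpumask_t;

struct sf_buf_sketch {
    uintptr_t kva;      /* kernel virtual address of the mapping */
    cpumask_t cpumask;  /* cpus known to be synchronized with the mapping */
};

static int local_invalidations;   /* stands in for local cpu_invlpg() calls */
static int global_invalidations;  /* stands in for IPI-based invalidations */

/* The mapping was changed: no cpu is known to be synchronized any more. */
static void sfb_remap(struct sf_buf_sketch *sf)
{
    sf->cpumask = 0;
}

/* Caller only uses the mapping on 'cpu': invalidate locally if needed. */
static void sfb_sync_local(struct sf_buf_sketch *sf, int cpu)
{
    cpumask_t bit = (cpumask_t)1 << cpu;

    if ((sf->cpumask & bit) == 0) {
        ++local_invalidations;
        sf->cpumask |= bit;
    }
}

/* Caller needs the mapping valid on every cpu in 'all'. */
static void sfb_sync_all(struct sf_buf_sketch *sf, cpumask_t all)
{
    if ((sf->cpumask & all) != all) {
        ++global_invalidations;
        sf->cpumask |= all;
    }
}
```

  In this sketch a caller that stays on one cpu pays for at most one local
  invalidation per remap, and the expensive system-wide path is taken only
  when some cpu in the requested set is missing from the mask.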
  Optimize sf_buf_alloc() and friends by removing both TAILQ operations in the
  critical path.  TAILQ operations to remove the sf_buf from the free queue
  are now done in a lazy fashion.  Most sf_buf operations allocate a buf,
  work on it, and free it, so why waste time moving the sf_buf off the free list
  if we are only going to move it back onto the free list a microsecond later?
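
  A minimal userland sketch of lazy free-list removal follows, using the
  TAILQ macros from <sys/queue.h>.  The names struct lazy_buf, lazy_ref(),
  lazy_free(), lazy_recycle(), and the tailq_ops counter are hypothetical,
  and the real sf_buf code may differ in detail.  The assumption is that a
  buffer which is re-referenced stays linked on the free list, and that
  in-use entries are only unlinked when the list is scanned for a buffer
  to recycle.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical buffer type for illustration; not the kernel's sf_buf. */
struct lazy_buf {
    TAILQ_ENTRY(lazy_buf) free_entry;
    int refcnt;       /* active references */
    int on_freelist;  /* still linked on the free list? */
};
TAILQ_HEAD(lazy_freelist, lazy_buf);

static int tailq_ops;  /* counts TAILQ insert/remove operations */

/* Re-reference a buffer: no TAILQ work, it may stay on the free list. */
static void lazy_ref(struct lazy_buf *b)
{
    ++b->refcnt;
}

/* Drop a reference: link the buffer only if it is not linked already. */
static void lazy_free(struct lazy_freelist *fl, struct lazy_buf *b)
{
    if (--b->refcnt == 0 && !b->on_freelist) {
        TAILQ_INSERT_TAIL(fl, b, free_entry);
        b->on_freelist = 1;
        ++tailq_ops;
    }
}

/* Find a buffer to reuse, lazily unlinking entries that are in use. */
static struct lazy_buf *lazy_recycle(struct lazy_freelist *fl)
{
    struct lazy_buf *b;

    while ((b = TAILQ_FIRST(fl)) != NULL) {
        TAILQ_REMOVE(fl, b, free_entry);
        b->on_freelist = 0;
        ++tailq_ops;
        if (b->refcnt == 0) {
            b->refcnt = 1;   /* hand it to the caller */
            return b;
        }
        /* In use: it was left linked lazily; it is unlinked now. */
    }
    return NULL;
}
```

  In the common allocate, use, free cycle sketched here, lazy_ref() and
  lazy_free() perform no TAILQ operations at all; the unlinking cost is
  deferred to lazy_recycle(), which runs only when a buffer must be reused.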
  Fix a bug in sf_buf_alloc() code as it was being used by the PIPE code.
  sf_buf_alloc() was unconditionally using PCATCH in its tsleep() call, which
  is only correct when called from the sendfile() interface.
  Optimize the PIPE code to require only local cpu_invlpg()'s when mapping
  sf_buf's, greatly reducing the number of IPIs required.  On a DELL-2550,
  a pipe test which explicitly blows out the sf_buf caching by using huge
  buffers improves from 350 to 550 MBytes/sec.  However, note that buildworld
  times were not found to have changed.
  Replace the PIPE code's custom 'struct pipemapping' structure with a
  struct xio and use the XIO API functions rather than its own.
