My syncing disks problem
Matthew Dillon
dillon at apollo.backplane.com
Sat Aug 7 23:44:15 PDT 2004
Ok, I've looked at the crash dump. The stuck buffer is marked as
having an I/O in progress, but there is some weirdness that doesn't
make sense, such as b_resid being 0. The buffer also has two softupdates
dependencies, both inode dependencies, which are causing the buffer to
remain marked dirty. There should not be any inode dependencies, not
after 20 loops and syncs!
None of it makes much sense to me, because as far as I know the disk
subsystem is still intact. If I assume that the in-progress I/O is
simply in-progress because that was what it was doing when Jean-Marc
ctl-alt-esc'd, then we are left with the softupdates inode dependencies
that should not be there.
The only thing I can think of that would cause these dependencies to
remain intact through 20 attempts to write them out is if the shutdown
code is not syncing the disks the same way that the syncer syncs them
(the syncer thread has already been shut down by this time, so it is no
longer syncing the filesystems).
The only difference between the syncer thread and the sync() calls
made by the shutdown code is that the latter are asynchronous. And, in
fact, there is a softupdates-related call that is not made for
asynchronous filesystem syncs.
So, here is a patch to try. I only give it a 30% chance of working
due to the huge number of guesses I have made above.
Another thing I would like you to try, Jean-Marc, if the above does not
work, is to manually sync the system by typing 'sync' at a root shell
prompt. Do it five or six times, then shut down and see if the
problem still occurs.
-Matt
Index: vfs/ufs/ffs_vnops.c
===================================================================
RCS file: /cvs/src/sys/vfs/ufs/ffs_vnops.c,v
retrieving revision 1.8
diff -u -r1.8 ffs_vnops.c
--- vfs/ufs/ffs_vnops.c 19 May 2004 22:53:06 -0000 1.8
+++ vfs/ufs/ffs_vnops.c 8 Aug 2004 06:35:00 -0000
@@ -267,6 +267,15 @@
 			vprint("ffs_fsync: dirty", vp);
 #endif
 		}
+	} else {
+		/*
+		 * Try to write out any filesystem metadata associated
+		 * with the vnode.
+		 */
+		splx(s);
+		if ((error = softdep_sync_metadata(ap)) != 0)
+			return (error);
+		s = splbio();
 	}
 	splx(s);
 	return (UFS_UPDATE(vp, wait));
--------------------------------------------------------------------------
The buffer's flags field, b_flags, is 0x21021024, which means:
B_VMIO|B_WRITEINPROG|B_CLUSTEROK|B_SCANNED|B_CACHE|B_ASYNC
The inode dependencies (not shown) each have a flags value of 0x8209,
which means: ONWORKLIST, IOSTARTED, DEPCOMPLETE, ATTACHED
(kgdb) print *(struct buf *)0xc0e53e80
$33 = {
b_hash = {
le_next = 0xc0df5e20,
le_prev = 0xc0e6af94
},
b_vnbufs = {
tqe_next = 0x0,
tqe_prev = 0xc0e0848c
},
b_freelist = {
tqe_next = 0x0,
tqe_prev = 0xc0394f78
},
b_act = {
tqe_next = 0x0,
tqe_prev = 0xc96696dc
},
b_flags = 0x21021024,
b_qindex = 0x0,
b_xflags = 0x2,
b_lock = {
lk_interlock = {
t_cpu = 0xff800000,
t_reqcpu = 0xff800000,
t_unused01 = 0x0
},
lk_flags = 0x400,
lk_sharecount = 0x0,
lk_waitcount = 0x0,
lk_exclusivecount = 0x1,
lk_prio = 0x0,
lk_wmesg = 0xc0340005 "bufwait",
lk_timo = 0x0,
lk_lockholder = 0xfffffffe
},
b_error = 0x0,
b_bufsize = 0x2000,
b_runningbufspace = 0x2000,
b_bcount = 0x2000,
b_resid = 0x0,
b_dev = 0xc0cb4938,
b_data = 0xc20c4000 "´\201\001",
b_kvabase = 0xc20c4000 "´\201\001",
b_kvasize = 0x4000,
b_lblkno = 0x258c0,
b_blkno = 0x258c0,
b_offset = 0x4b18000,
b_iodone = 0,
b_iodone_chain = 0x0,
b_vp = 0xc9684ec0,
b_dirtyoff = 0x0,
b_dirtyend = 0x0,
b_pblkno = 0x1423e99,
b_saveaddr = 0x0,
b_driver1 = 0x0,
b_driver2 = 0x0,
b_caller1 = 0x0,
b_caller2 = 0x0,
b_pager = {
pg_spc = 0x0,
pg_reqpage = 0x0
},
b_cluster = {
cluster_head = {
tqh_first = 0x0,
tqh_last = 0x0
},
cluster_entry = {
tqe_next = 0x0,
tqe_prev = 0x0
}
},
b_pages = {0xc09f1ac0, 0xc0a05b00, 0x0 <repeats 30 times>},
b_npages = 0x2,
b_dep = {
lh_first = 0xc855da50
},
b_chain = {
parent = 0x0,
count = 0x0
}
}