find stuck in state "clock"
dillon at apollo.backplane.com
Sun Aug 28 18:10:23 PDT 2005
:Simon 'corecode' Schubert wrote:
:> Ok, I tried to play a bit Sherlock Holmes (on the live system).
:tcsh (holding the vnode lock) is stuck in:
:#2 0xc017b7e0 in tsleep (ident=0xc7b127b4, flags=256, wmesg=0xc02a1a6a
: at /usr/src/sys/kern/kern_synch.c:425
:#3 0xc01930e7 in ttysleep (tp=0xc7b12768, chan=0xc7b127b4, slpflags=256,
: wmesg=0xc02a1a6a "ttywai", timo=0) at /usr/src/sys/kern/tty.c:2571
:#4 0xc0191951 in ttywait (tp=0xc7b12768) at /usr/src/sys/kern/tty.c:1278
:#5 0xc01919b2 in ttywflush (tp=0xc7b12768) at /usr/src/sys/kern/tty.c:1304
:#6 0xc0191bab in ttylclose (tp=0xc7b12768, flag=3) at
:#7 0xc0194622 in ptsclose (dev=0xc7b20530, flag=3, mode=8192,
: at /usr/src/sys/kern/tty_pty.c:246
:#8 0xc0163ad5 in cdevsw_putport (port=<incomplete type>, lmsg=0xc8dab940)
: at /usr/src/sys/kern/kern_device.c:108
:#9 0xc01795f5 in lwkt_domsg (port=0xc02ff720, msg=0xc8dab940)
: at /usr/src/sys/sys/msgport2.h:86
:#10 0xc0163c34 in dev_dclose (dev=0xc7b20530, fflag=3, devtype=8192,
:Question: why can't I kill this sleep? It's got PCATCH!
:I wonder where in this code the cache entry is actually being held
:locked... Maybe it didn't unlock the entry at some other point?
It's probably waking up the tsleep, but looping in the close function.
In any case, I see exactly what the problem is. Basically what is going
on is a known problem with running devices through the vnode abstraction.
Most operations on vnodes lock the vnode, execute the operation, and then
unlock the vnode. For example, when you write() to a file the vnode is
exclusively locked, the VOP_WRITE() is called, and the vnode is then
unlocked. This guarantees write atomicity (to a point, anyway). The vnode
abstraction was never designed to handle operations that might block
for indefinitely-long periods of time.
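
As a rough illustration of the lock / operate / unlock pattern described
above, here is a minimal userland sketch. It is not kernel code: the
toy_vnode structure and the toy_*() functions are hypothetical stand-ins
for a vnode, its exclusive lock, and VOP_WRITE().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a vnode: a lock flag and a byte counter (hypothetical). */
struct toy_vnode {
    int    locked;   /* 1 while "exclusively locked" */
    size_t bytes;    /* total bytes "written" so far */
};

static void toy_lock(struct toy_vnode *vp)
{
    assert(!vp->locked);    /* a second locker would have to wait here */
    vp->locked = 1;
}

static void toy_unlock(struct toy_vnode *vp)
{
    assert(vp->locked);
    vp->locked = 0;
}

/* Stand-in for VOP_WRITE(): must be entered with the vnode locked. */
static size_t toy_vop_write(struct toy_vnode *vp, size_t len)
{
    assert(vp->locked);
    vp->bytes += len;
    return len;
}

/* The pattern itself: lock, run the operation, unlock. */
size_t toy_write(struct toy_vnode *vp, size_t len)
{
    size_t n;

    toy_lock(vp);
    n = toy_vop_write(vp, len);
    toy_unlock(vp);
    return n;
}
```

The lock is held for the full duration of the operation, so if the
operation sleeps indefinitely, everything else that needs the vnode
waits just as long.
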
But it turns out that the vnode abstraction kinda took over from the
file descriptor vector, to the point where devices opened through
filesystems wound up going through the related vnodes in the
filesystem (e.g. the vnode for "/dev/ttypX") for I/O operations such
as open, close, read, and write.
To deal with the vnode<->device interface, SPECFS was created. SPECFS
unlocks the vnode, executes the device operation, and then relocks the
vnode (because the VFS subsystem that called it expects the vnode to
be locked on return). In particular, examine the
/usr/src/sys/vfs/specfs/spec_vnops.c file. If you take a look
at that file and search for instances of VOP_UNLOCK it should become
immediately clear to you both the terrible hack that has been in the BSDs
for so long, and why the close() in your traces (which is stuck waiting
for the TTY to finish flushing out) is deadlocking the find.
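
The unlock / device operation / relock sequence can be sketched in the
same toy style. This is only a userland model, not the code in
spec_vnops.c: demo_vnode, demo_dev_op() and the two demo_spec_op_*()
wrappers are hypothetical names. The device operation records whether
the vnode was still locked while it ran, which is the condition that
leaves other users of the vnode stuck if the operation sleeps.

```c
#include <assert.h>

/* Toy stand-in for a vnode that only tracks its lock state (hypothetical). */
struct demo_vnode {
    int locked;
};

/* A device operation; it receives the vnode so it can observe the lock. */
typedef int (*demo_devop_t)(struct demo_vnode *vp);

/* Set by demo_dev_op(): was the vnode locked while the device op ran? */
int demo_lock_seen_held;

/* Stand-in for a device close; a real tty close could sleep here. */
int demo_dev_op(struct demo_vnode *vp)
{
    demo_lock_seen_held = vp->locked;
    return 0;
}

/* Device op called with the vnode lock still held by the caller. */
int demo_spec_op_locked(struct demo_vnode *vp, demo_devop_t op)
{
    assert(vp->locked);
    return op(vp);
}

/* Unlock, run the device op, relock before returning to the caller. */
int demo_spec_op_unlocked(struct demo_vnode *vp, demo_devop_t op)
{
    int error;

    assert(vp->locked);     /* the caller hands us a locked vnode */
    vp->locked = 0;         /* others may take the lock while we block */
    error = op(vp);
    vp->locked = 1;         /* the caller expects it locked on return */
    return error;
}
```

With the first wrapper the device operation runs with the lock held;
with the second the lock is free for the whole time the device
operation runs, and the caller still gets the vnode back locked.
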
The solution is probably to put an unlock/lock around spec_close(),
just like is done in most of the other spec_*() functions,
but I'm going to have to look at it carefully to make sure that there
will be no unexpected side effects.
<dillon at xxxxxxxxxxxxx>