postfix causes hangs
Rob Schmuloff
rrschoolie at yahoo.com
Sat Jun 26 20:03:47 PDT 2004
Hello Matt,
First, apologies for the poor formatting of this
email.
I have some interesting information regarding the
postfix problem. It seems to affect a few of the
postfix daemons: smtpd, local, and cleanup. The
debugging output seems to indicate a race condition
between two processes calling flock() and requesting an
exclusive lock on the same file. When the machine
wedges, the network stack still seems to be working
(i.e. TCP connects and pings), but the console is
locked up and nothing seems to be running in userland.
I can reproduce this problem by receiving email from
the freebsd-current mailing list (I'm not sure why, other
than the high volume of mail and the fact that their
server is running postfix too).
The debugging output from the lockf code (hand
copied):
pid 5831 (cleanup) lf_destroy_range: -2401050962867404578..-2401050962867404578
pid 5831 (cleanup) lf_create_range: 0..9223372036854775807
pid 5832 (cleanup) lf_destroy_range: 0..9223372036854775807
pid 5832 (cleanup) lockf 0xd7c22bd4 0..9223372036854775807 type exclusive owned by -1
blocked locks 9223372036854775807 type exclusive waiting on 0xc0861ec0
pid 5832 (cleanup) lf_destroy_range: -2401050962867404578..-2401050962867404578
pid 5832 (cleanup) lf_create_range: 0..9223372036854775807
pid 5831 (cleanup) lf_destroy_range: 0..9223372036854775807
pid 5831 (cleanup) lockf 0xd7c22bd4 0..9223372036854775807 type exclusive owned by -1
blocked locks 9223372036854775807 type exclusive waiting on 0xc0861ec0
I wrote a quick program that seems to replicate the
problem if run with stdout/stderr redirected to
/dev/null (or with the printf calls commented out):
/*---------------------------------------------------*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>

int main(void)
{
    int fd, status, i;
    /* ops[] and opstr[] must list the operations in the same order. */
    int ops[4] = { LOCK_SH, LOCK_EX, LOCK_SH|LOCK_NB,
                   LOCK_EX|LOCK_NB };
    char *opstr[4] = { "LOCK_SH", "LOCK_EX",
                       "LOCK_SH|LOCK_NB", "LOCK_EX|LOCK_NB" };
    pid_t pid;

    /* Two forks give four processes contending for the same lock. */
    fork();
    fork();
    pid = getpid();
    /* i = pid % 2; */
    i = 1;
    printf("pid %d i = %d\n", (int)pid, i);
    for (;;) {
        /* O_CREAT requires a mode argument. */
        fd = open("testfile", O_RDWR|O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            exit(1);
        }
        /* Retry until flock() succeeds (returns 0). */
        while ((status = flock(fd, ops[i])) != 0)
            usleep(rand() / 20000);
        printf("PID %d got %s lock -- %d\n", (int)pid, opstr[i], status);
        usleep(rand() / 20000);
        status = flock(fd, LOCK_UN);
        printf("PID %d released lock -- %d\n", (int)pid, status);
        usleep(rand() / 20000);
        close(fd);
    }
}
--- Matthew Dillon <dillon at xxxxxxxxxxxxxxxxxxxx>
wrote:
> Rob, I have committed a change to
> kern/kern_lockf.c that puts a very
> short wait in the lockf retry loop in an attempt
> to prevent the system
> from locking up when it gets into this livelock.
>
> I don't think this will actually fix the problem
> (or at least I hope
> it doesn't). I am instead hoping that it will
> be possible to ktrace
> the processes involved and/or otherwise track
> the problem down when
> it occurs without the whole machine going down.
>
> -Matt
>
>
> :Hello,
> :
> : I've experienced periodic hangs on my system
> :since mid-May. Lately, my terminal is unresponsive
> :as soon as I start postfix and receive incoming mail.
> :When I drop into DDB, the stack trace shows:
> :
> :scgetc()
> :sckbdevent()
> :atkbd_intr()
> :atkbd_isa_intr()
> :intr_mux()
> :ithread_handler()
> :(Perhaps this is ctl-alt-esc trace)
> :
> :This is with a kernel/world built June 23rd. I
> :*think* the problem is somewhere in the lockf code
> :because I glanced at 'ps' from DDB. Also, I'm using
> :procmail for local delivery, so that's also a
> :possibility. Sorry for the sparse information. I'm
> :trying to get more data for you..
> :
> :Thanks,
> :
> :Rob
>
Attachment: tt.c
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bin00000.bin
Type: application/octet-stream
Size: 763 bytes
Desc: "Description: tt.c"
URL: <http://lists.dragonflybsd.org/pipermail/kernel/attachments/20040626/87a4fdc5/attachment-0020.bin>